Jiang Qiqi created SPARK-17237:
----------------------------------

             Summary: DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
                 Key: SPARK-17237
                 URL: https://issues.apache.org/jira/browse/SPARK-17237
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Jiang Qiqi


I am trying to run a pivot transformation that worked on a Spark 1.6 cluster, namely:

scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
res1: org.apache.spark.sql.DataFrame = [a: int, b: int, c: int]

scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
res2: org.apache.spark.sql.DataFrame = [a: int, 3_count(c): bigint, 3_avg(c): double, 4_count(c): bigint, 4_avg(c): double]

scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0).show
+---+----------+--------+----------+--------+
|  a|3_count(c)|3_avg(c)|4_count(c)|4_avg(c)|
+---+----------+--------+----------+--------+
|  2|         1|     4.0|         0|     0.0|
|  3|         0|     0.0|         1|     5.0|
+---+----------+--------+----------+--------+

After upgrading the environment to Spark 2.0, I get an error when executing the .na.fill method:

scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
res3: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]

scala> res3.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`c`)`;
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:103)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:113)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:168)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:218)
  at org.apache.spark.sql.Dataset.col(Dataset.scala:921)
  at org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:411)
  at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:162)
  at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:159)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:159)
  at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:149)
  at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
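As a possible workaround until this is fixed (a sketch only, not verified against this exact build): since the failure comes from parseAttributeName choking on the "(" , ")" and backtick characters that pivot puts into the generated column names, renaming those columns to plain identifiers before calling na.fill should avoid the parse step entirely. The `pivoted`/`sanitized` names below are my own.

```scala
// Workaround sketch, assuming a spark-shell session with sqlContext implicits
// and org.apache.spark.sql.functions._ in scope (as in the repro above).
val pivoted = sc.parallelize(Seq((2, 3, 4), (3, 4, 5))).toDF("a", "b", "c")
  .groupBy("a").pivot("b").agg(count("c"), avg("c"))

// Strip the characters that parseAttributeName rejects from each column name,
// e.g. "3_count(c)" becomes "3_count_c_".
val sanitized = pivoted.columns.foldLeft(pivoted) { (df, name) =>
  df.withColumnRenamed(name, name.replaceAll("[`()]", "_"))
}

// na.fill now resolves every column without hitting the AnalysisException.
sanitized.na.fill(0).show()
```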

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
