[ 
https://issues.apache.org/jira/browse/SPARK-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927979#comment-15927979
 ] 

Hyukjin Kwon commented on SPARK-19519:
--------------------------------------

Do you mind if I ask for a self-contained reproducer? The provided details seem 
to depend heavily on the original data. I am willing to help and verify.

BTW, it does not look like a {{Blocker}}. 
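
To illustrate what I mean by a reproducer, here is a minimal sketch. It builds a 
tiny in-memory table and groups it by two columns with a max aggregate. The 
class name, column names and sample rows are all hypothetical, and it assumes 
the Spark 2.x Java API (on 1.5, DataFrame and SQLContext would take the place of 
Dataset<Row> and SparkSession). My guess, not verified against the original 
data, is that the select(...) mixing plain columns with max(...) before 
groupBy(...) is what triggers the AnalysisException; the sketch below shows the 
groupBy-then-agg shape I would expect to work.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.max;

// Hypothetical reproducer skeleton; none of these names come from the report.
public class GroupByReproducer {

    // Groups a small hand-written table by two columns and takes the max date.
    static List<Row> run(SparkSession spark) {
        StructType schema = new StructType()
            .add("modCode", DataTypes.StringType)
            .add("ctypeCode", DataTypes.StringType)
            .add("updatedDate", DataTypes.StringType);

        // Made-up rows standing in for the joined data in the report.
        List<Row> rows = Arrays.asList(
            RowFactory.create("M1", "C1", "2017-01-01"),
            RowFactory.create("M1", "C1", "2017-02-01"),
            RowFactory.create("M2", "C1", "2017-01-15"));

        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Grouping columns go to groupBy(); only aggregates go to agg().
        // The grouping columns are included in the result automatically.
        return df.groupBy(col("modCode"), col("ctypeCode"))
                 .agg(max(col("updatedDate")).as("lastChanged"))
                 .orderBy(col("modCode"))
                 .collectAsList();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("SPARK-19519-repro")
            .getOrCreate();
        try {
            for (Row r : run(spark)) {
                System.out.println(r);
            }
        } finally {
            spark.stop();
        }
    }
}
```

If a snippet of roughly this size, adapted to the failing query, still throws 
the exception, it would be much easier for others to verify.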

> Groupby for multiple columns not working
> ----------------------------------------
>
>                 Key: SPARK-19519
>                 URL: https://issues.apache.org/jira/browse/SPARK-19519
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.5.0
>            Reporter: Faisal
>            Priority: Blocker
>
> Please look at the join between multiple DataFrames below. Applying groupBy 
> on multiple columns with the max aggregate does not yield results; instead 
> it fails with the exception: User class threw exception: 
> org.apache.spark.sql.AnalysisException: expression 'propVal' is neither 
> present in the group by, nor is it an aggregate function. Add to group by or 
> wrap in first() if you don't care which value you get.
> DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
>     .join(moduleCodeDf.as("mc"),
>         moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
>     .join(dictDfCharCode.as("dc"),
>         dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
>     .join(dictDfIsAChar,
>         dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));
>
> joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
>         col("dc.propVal").as("mcaCtypeCode"),
>         max(col("mod.updatedDate")).as("mcaLastChangedDate"),
>         coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
>             max(when(col("mndtryInd").equalTo("N"), "N")),
>             max(col("mndtryInd"))).as("mcaMandatoryFlg"),
>         lit("N").as("mcaLockedFlg"),
>         coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
>             max(when(col("fldColInd").equalTo("N"), "I")),
>             max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
>     .groupBy(col("mc.propVal"), col("dc.propVal"))
>     .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

