[
https://issues.apache.org/jira/browse/SPARK-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927979#comment-15927979
]
Hyukjin Kwon commented on SPARK-19519:
--------------------------------------
Would you mind providing a self-contained reproducer? The details provided
seem to depend heavily on the original data. I am willing to help and verify.
BTW, this does not look like a {{Blocker}}.
> Groupby for multiple columns not working
> ----------------------------------------
>
> Key: SPARK-19519
> URL: https://issues.apache.org/jira/browse/SPARK-19519
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 1.5.0
> Reporter: Faisal
> Priority: Blocker
>
> Please look at the join below between multiple DataFrames. Applying groupBy
> on multiple columns with the max aggregate does not yield results; instead
> it fails with: User class threw exception:
> org.apache.spark.sql.AnalysisException: expression 'propVal' is neither
> present in the group by, nor is it an aggregate function. Add to group by or
> wrap in first() if you don't care which value you get.
> DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
>     .join(moduleCodeDf.as("mc"),
>         moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
>     .join(dictDfCharCode.as("dc"),
>         dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
>     .join(dictDfIsAChar,
>         dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));
>
> joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
>         col("dc.propVal").as("mcaCtypeCode"),
>         max(col("mod.updatedDate")).as("mcaLastChangedDate"),
>         coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
>             max(when(col("mndtryInd").equalTo("N"), "N")),
>             max(col("mndtryInd"))).as("mcaMandatoryFlg"),
>         lit("N").as("mcaLockedFlg"),
>         coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
>             max(when(col("fldColInd").equalTo("N"), "I")),
>             max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
>     .groupBy(col("mc.propVal"), col("dc.propVal"))
>     .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));
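
The AnalysisException quoted above is what Spark raises when a select mixes
aggregate functions with plain columns before any grouping has been applied.
A minimal sketch of one possible rewrite follows, assuming the Spark 1.5 Java
API and the aliases mod, mc and dc from the report. It groups first and moves
every aggregate into agg. The class and method names are hypothetical, and the
sketch has not been checked against the reporter's data.

```java
import org.apache.spark.sql.DataFrame;

import static org.apache.spark.sql.functions.coalesce;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.when;

// Hypothetical helper, not part of the original report.
public final class GroupByThenAggSketch {

    private GroupByThenAggSketch() {}

    // Expects the joined DataFrame from the report, with the aliases
    // "mod", "mc" and "dc" still resolvable.
    public static DataFrame aggregate(DataFrame joinModCtypeAsgns) {
        return joinModCtypeAsgns
            // Group first; the aliased keys are kept in the output
            // (assumes the default spark.sql.retainGroupColumns=true).
            .groupBy(col("mc.propVal").as("mcaModCode"),
                     col("dc.propVal").as("mcaCtypeCode"))
            // Every non-key column is wrapped in an aggregate.
            .agg(max(col("mod.updatedDate")).as("mcaLastChangedDate"),
                 coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
                          max(when(col("mndtryInd").equalTo("N"), "N")),
                          max(col("mndtryInd"))).as("mcaMandatoryFlg"),
                 coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
                          max(when(col("fldColInd").equalTo("N"), "I")),
                          max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
            // Add the constant flag and restore the column order of the report.
            .select(col("mcaModCode"),
                    col("mcaCtypeCode"),
                    col("mcaLastChangedDate"),
                    col("mcaMandatoryFlg"),
                    lit("N").as("mcaLockedFlg"),
                    col("mcaFieldCollectionFlg"));
    }
}
```

Calling GroupByThenAggSketch.aggregate(joinModCtypeAsgns) on a small in-memory
DataFrame would also be one way to build the self-contained reproducer
requested in the comment.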
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)