[ 
https://issues.apache.org/jira/browse/SPARK-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Zhulenev closed SPARK-8645.
----------------------------------
    Resolution: Won't Fix

> Incorrect expression analysis with Hive
> ---------------------------------------
>
>                 Key: SPARK-8645
>                 URL: https://issues.apache.org/jira/browse/SPARK-8645
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>         Environment: CDH 5.4.2 1.3.0
>            Reporter: Eugene Zhulenev
>              Labels: dataframe
>
> When a DataFrame is backed by a Hive table, groupBy followed by agg can't 
> resolve the grouping columns if I pass them as Strings rather than Columns.
> This fails with: org.apache.spark.sql.AnalysisException: expression 'dt' is 
> neither present in the group by, nor is it an aggregate function.
> {code}
> val grouped = eventLogHLL
>       .groupBy(dt, ad_id, site_id).agg(
>         dt,
>         ad_id,
>         col(site_id)             as site_id,
>         sum(imp_count)           as imp_count,
>         sum(click_count)         as click_count
>       )
> {code}
> This works fine:
> {code}
> val grouped = eventLogHLL
>       .groupBy(col(dt), col(ad_id), col(site_id)).agg(
>         col(dt)                        as dt,
>         col(ad_id)                     as ad_id,
>         col(site_id)                   as site_id,
>         sum(imp_count)                 as imp_count,
>         sum(click_count)               as click_count
>       )
> {code}
> Integration tests that run with "embedded" Spark and DataFrames generated from 
> RDDs work fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
