[ 
https://issues.apache.org/jira/browse/SPARK-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630820#comment-14630820
 ] 

Joseph Batchik commented on SPARK-8007:
---------------------------------------

[~rxin] Reynold, I have started adding virtual columns to DataFrames and SQL 
queries for SPARK-8003 and SPARK-8007. My initial code is here: 
https://github.com/JDrit/spark/commit/e34d3a7eabbc9c41c2dd85b128b2bb5713039e40.

The one issue I ran into is that the catalyst package cannot access 
org.apache.spark.sql.execution.expressions, where SparkPartitionID resides. For 
prototyping purposes I copied SparkPartitionID into the catalyst package, but I 
am wondering what the best way to deal with that dependency would be.

Can you let me know what you think of my changes and what else needs to be 
done?
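
To make the dependency question concrete, here is a minimal sketch of one 
possible alternative to copying the class: the catalyst side keeps only a 
name-to-factory registry, and the execution side registers its expression 
there. This is not what the linked commit does. Every name below 
(Expression, VirtualColumnRegistry, PartitionIdExpr) is a hypothetical 
stand-in for illustration, not a real Spark class, and the sketch has no 
Spark dependency.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a catalyst expression; not Spark's real trait.
trait Expression { def name: String }

// Would live in the lower-level ("catalyst") layer. It knows only the trait
// and a map from virtual column name to a factory, never the concrete class.
object VirtualColumnRegistry {
  private val factories = mutable.Map.empty[String, () => Expression]

  // Register a factory under a virtual column name (last registration wins).
  def register(columnName: String)(factory: () => Expression): Unit =
    factories(columnName) = factory

  // Look up a virtual column; None means "not a virtual column".
  def resolve(columnName: String): Option[Expression] =
    factories.get(columnName).map(f => f())
}

// Would live in the higher-level ("execution") layer, which is allowed to
// depend on the lower layer. Hypothetical stand-in for SparkPartitionID.
case class PartitionIdExpr() extends Expression {
  val name = "SPARK__PARTITION__ID"
}

object Demo {
  def main(args: Array[String]): Unit = {
    // The higher layer registers its expression once at startup.
    VirtualColumnRegistry.register("SPARK__PARTITION__ID")(() => PartitionIdExpr())

    // The lower layer can now resolve the name without importing the class.
    println(VirtualColumnRegistry.resolve("SPARK__PARTITION__ID"))
    println(VirtualColumnRegistry.resolve("no_such_column"))
  }
}
```

The trade-off in this sketch is that resolution depends on registration 
having happened first, and the plain mutable map is not thread-safe, so a 
real version would need to address both.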

> Support resolving virtual columns in DataFrames
> -----------------------------------------------
>
>                 Key: SPARK-8007
>                 URL: https://issues.apache.org/jira/browse/SPARK-8007
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> Create the infrastructure so we can resolve df("SPARK__PARTITION__ID") to 
> SparkPartitionID expression.
> A cool use case is to understand physical data skew:
> {code}
> df.groupBy("SPARK__PARTITION__ID").count()
> {code}



