[ https://issues.apache.org/jira/browse/SPARK-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631673#comment-14631673 ]
Joseph Batchik commented on SPARK-8007:
---------------------------------------
Reynold, thanks for pointing that out. I updated the commit to use what you
suggested. This should also make it easy to add the other virtual columns
described in the parent ticket: adding one should only require updating the
resolver in the logical plan and the new virtual column rule.
https://github.com/JDrit/spark/commit/7b46e7de6f98df98480fa34c85248aa2d90bc635#diff-d74f782d414a74eee09a4b6b9994be87R34
> Support resolving virtual columns in DataFrames
> -----------------------------------------------
>
> Key: SPARK-8007
> URL: https://issues.apache.org/jira/browse/SPARK-8007
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
>
> Create the infrastructure so we can resolve df("SPARK__PARTITION__ID") to
> the SparkPartitionID expression.
> A cool use case is to understand physical data skew:
> {code}
> df.groupBy("SPARK__PARTITION__ID").count()
> {code}
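
To illustrate the idea in the description, here is a minimal, Spark-free sketch
of how a resolver might fall back to a virtual column when a name is not in the
schema. It is a toy model, not the code in the linked commit: the helper names
(VIRTUAL_COLUMNS, resolve, group_by_count), the list-of-lists partition layout,
and the choice to let physical columns win over virtual ones are all
assumptions made for illustration. Only the column name SPARK__PARTITION__ID
comes from the ticket.

```python
from collections import Counter

# Hypothetical registry of virtual columns. Each entry maps a name to a
# function of (partition_index, row). Only the name is taken from the ticket.
VIRTUAL_COLUMNS = {
    "SPARK__PARTITION__ID": lambda partition_index, row: partition_index,
}


def resolve(name, schema):
    """Resolve a column name to a function of (partition_index, row).

    Physical columns are tried first (an assumption of this sketch);
    virtual columns are used only as a fallback.
    """
    if name in schema:
        idx = schema.index(name)
        return lambda partition_index, row: row[idx]
    if name in VIRTUAL_COLUMNS:
        return VIRTUAL_COLUMNS[name]
    raise KeyError(f"cannot resolve column {name!r}")


def group_by_count(partitions, schema, name):
    """Count rows per distinct value of the resolved column."""
    expr = resolve(name, schema)
    counts = Counter()
    for partition_index, rows in enumerate(partitions):
        for row in rows:
            counts[expr(partition_index, row)] += 1
    return dict(counts)


# Toy data: three partitions with deliberately uneven sizes.
schema = ["user", "amount"]
partitions = [
    [("a", 1), ("b", 2), ("c", 3), ("d", 4)],
    [("e", 5)],
    [],
]

skew = group_by_count(partitions, schema, "SPARK__PARTITION__ID")
print(skew)  # {0: 4, 1: 1}
```

In this toy model, adding another virtual column would only mean adding an
entry to the registry. An empty partition does not appear in the result, since
it contributes no rows to count.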