[
https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009977#comment-16009977
]
Tejas Patil commented on SPARK-20703:
-------------------------------------
[~viirya] :
- Would this new operator be a physical plan node, i.e. a `SparkPlan`? One
limitation of the current approach of using `RunnableCommand` is that it does
not allow defining partitioning and sorting requirements for the child nodes. I
have a local WIP patch that changes this for Hive insertions (as per [0], I
needed it for Hive bucketing support), but it seems like your work will be a
superset of that.
- For metrics: the size of the data written out (compressed and uncompressed)
and the number of files written out would be valuable. I agree that not all
implementations would provide this data (however, the number of files seems
like low-hanging fruit).
[0]:
https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=15990618&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15990618
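To illustrate the two points above, here is a minimal toy model in plain Python. It is not Spark code and does not reflect the actual design under discussion: `PlanNode`, `Scan`, `Sort`, `WriteFiles`, `ensure_requirements`, and the metric names `numFiles` and `numOutputRows` are all hypothetical names chosen for this sketch. It shows a write operator that is an ordinary plan node, so it can declare a sort requirement on its child and carry its own metrics; a small planner pass then inserts the sort.

```python
from typing import Dict, List, Optional

Row = Dict[str, int]


class PlanNode:
    """Toy physical plan node; an illustration only, not Spark's SparkPlan."""

    def __init__(self, children: Optional[List["PlanNode"]] = None) -> None:
        self.children = children or []

    def required_child_ordering(self) -> List[Optional[str]]:
        # One entry per child; None means the node accepts any order.
        return [None] * len(self.children)

    def output_ordering(self) -> Optional[str]:
        return None

    def execute(self) -> List[Row]:
        raise NotImplementedError


class Scan(PlanNode):
    def __init__(self, rows: List[Row]) -> None:
        super().__init__()
        self.rows = rows

    def execute(self) -> List[Row]:
        return list(self.rows)


class Sort(PlanNode):
    def __init__(self, child: PlanNode, key: str) -> None:
        super().__init__([child])
        self.key = key

    def output_ordering(self) -> Optional[str]:
        return self.key

    def execute(self) -> List[Row]:
        return sorted(self.children[0].execute(), key=lambda r: r[self.key])


class WriteFiles(PlanNode):
    """Write operator: declares a sort requirement and records write metrics."""

    def __init__(self, child: PlanNode, bucket_column: str) -> None:
        super().__init__([child])
        self.bucket_column = bucket_column
        self.metrics = {"numFiles": 0, "numOutputRows": 0}

    def required_child_ordering(self) -> List[Optional[str]]:
        return [self.bucket_column]

    def execute(self) -> List[Row]:
        current = object()  # sentinel: no file is open yet
        for row in self.children[0].execute():
            if row[self.bucket_column] != current:
                # Bucket changed: pretend to close the open file and start a new one.
                current = row[self.bucket_column]
                self.metrics["numFiles"] += 1
            self.metrics["numOutputRows"] += 1
        return []


def ensure_requirements(node: PlanNode) -> PlanNode:
    """Insert a Sort wherever a child does not satisfy its parent's requirement."""
    node.children = [ensure_requirements(c) for c in node.children]
    for i, required in enumerate(node.required_child_ordering()):
        if required is not None and node.children[i].output_ordering() != required:
            node.children[i] = Sort(node.children[i], required)
    return node


rows = [{"bucket": 1}, {"bucket": 0}, {"bucket": 1}, {"bucket": 0}]
plan = ensure_requirements(WriteFiles(Scan(rows), "bucket"))
plan.execute()
print(plan.metrics)
```

In this sketch the planner pass wraps the scan in a `Sort`, so the writer opens one file per bucket (2 files for 4 rows). Running `WriteFiles` directly on the unsorted scan would open a new file on every bucket change (4 files), which is the kind of behavior a declared child requirement is meant to avoid.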
> Add an operator for writing data out
> ------------------------------------
>
> Key: SPARK-20703
> URL: https://issues.apache.org/jira/browse/SPARK-20703
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Reynold Xin
>
> We should add an operator for writing data out. Right now in the explain plan
> / UI there is no way to tell whether a query is writing data out, and also
> there is no way to associate metrics with data writes. It'd be tremendously
> valuable to do this for adding metrics and for visibility.