[ https://issues.apache.org/jira/browse/SPARK-21669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-21669.
---------------------------------
       Resolution: Fixed
         Assignee: Adrian Ionescu
    Fix Version/s: 2.3.0

> Internal API for collecting metrics/stats during FileFormatWriter jobs
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21669
>                 URL: https://issues.apache.org/jira/browse/SPARK-21669
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Adrian Ionescu
>            Assignee: Adrian Ionescu
>             Fix For: 2.3.0
>
>
> It would be useful to have some infrastructure in place for collecting custom
> metrics or statistics on the fly, as data is being written to disk.
>
> This was inspired by the work in SPARK-20703, which added simple metrics
> collection for data write operations, such as {{numFiles}},
> {{numPartitions}} and {{numRows}}. Those metrics are first collected on the
> executors and then sent to the driver, which aggregates them and posts them
> as updates to the {{SQLMetrics}} subsystem.
>
> The above can be generalized into a pluggable interface, which in the future
> could serve other purposes, e.g. automatically maintaining cost-based
> optimizer (CBO) statistics during "INSERT INTO <table> SELECT ..."
> operations. Users would then no longer need to run "ANALYZE TABLE <table>
> COMPUTE STATISTICS" afterwards, avoiding an extra full-table scan.
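The tracker pattern described in the issue can be sketched as a small, single-process Python simulation. Statistics are collected per task on the executors, merged on the driver, and new collectors plug in without changing the writer. This is a hedged illustration, not Spark's actual internal API. Every name in it (`TaskStatsTracker`, `JobStatsTracker`, `BasicJobTracker`, `MinMaxJobTracker`, `run_write_job`) is hypothetical. The "job" is a toy in which each inner list stands for one task's rows, assumed sorted by the partition column, with one file opened per partition a task sees.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple


class TaskStatsTracker:
    """Hypothetical executor-side hook; one instance per task, no-op callbacks by default."""
    def new_partition(self, values: Tuple[Any, ...]) -> None: pass
    def new_file(self, path: str) -> None: pass
    def new_row(self, row: Dict[str, Any]) -> None: pass
    def final_stats(self) -> Any: raise NotImplementedError


class JobStatsTracker:
    """Hypothetical driver-side plugin: creates task trackers, aggregates their stats."""
    def new_task_instance(self) -> TaskStatsTracker: raise NotImplementedError
    def process_stats(self, stats: Sequence[Any]) -> Any: raise NotImplementedError


class BasicTaskTracker(TaskStatsTracker):
    """Mirrors the numFiles / numPartitions / numRows metrics named in the issue."""
    def __init__(self) -> None:
        self.partitions: Set[Tuple[Any, ...]] = set()
        self.num_files = 0
        self.num_rows = 0

    def new_partition(self, values): self.partitions.add(values)
    def new_file(self, path): self.num_files += 1
    def new_row(self, row): self.num_rows += 1
    def final_stats(self): return (self.partitions, self.num_files, self.num_rows)


class BasicJobTracker(JobStatsTracker):
    def new_task_instance(self):
        return BasicTaskTracker()

    def process_stats(self, stats):
        # Several tasks may write the same partition, so union before counting.
        parts = set().union(*(s[0] for s in stats))
        return {"numPartitions": len(parts),
                "numFiles": sum(s[1] for s in stats),
                "numRows": sum(s[2] for s in stats)}


class MinMaxTaskTracker(TaskStatsTracker):
    """Min / max / null count of one column: the kind of statistic CBO relies on."""
    def __init__(self, column: str) -> None:
        self.column, self.lo, self.hi, self.nulls = column, None, None, 0

    def new_row(self, row):
        v = row.get(self.column)
        if v is None:
            self.nulls += 1
        else:
            self.lo = v if self.lo is None else min(self.lo, v)
            self.hi = v if self.hi is None else max(self.hi, v)

    def final_stats(self):
        return (self.lo, self.hi, self.nulls)


class MinMaxJobTracker(JobStatsTracker):
    def __init__(self, column: str) -> None:
        self.column = column

    def new_task_instance(self):
        return MinMaxTaskTracker(self.column)

    def process_stats(self, stats):
        return {"min": min((s[0] for s in stats if s[0] is not None), default=None),
                "max": max((s[1] for s in stats if s[1] is not None), default=None),
                "nullCount": sum(s[2] for s in stats)}


def run_write_job(tasks: List[List[Dict[str, Any]]], partition_col: str,
                  trackers: List[JobStatsTracker]) -> List[Any]:
    """Toy write job: each inner list is one task's rows, sorted by partition_col."""
    collected: List[List[Any]] = [[] for _ in trackers]
    for task_id, rows in enumerate(tasks):
        local = [t.new_task_instance() for t in trackers]  # "executor" side
        current = None
        for row in rows:
            if (row[partition_col],) != current:
                current = (row[partition_col],)
                for tt in local:
                    tt.new_partition(current)
                    tt.new_file(f"{partition_col}={current[0]}/part-{task_id:05d}")
            for tt in local:
                tt.new_row(row)
        for i, tt in enumerate(local):
            collected[i].append(tt.final_stats())  # "shipped" to the driver
    return [t.process_stats(s) for t, s in zip(trackers, collected)]


tasks = [[{"dt": "2017-08-01", "x": 3}, {"dt": "2017-08-02", "x": None}],
         [{"dt": "2017-08-02", "x": 7}, {"dt": "2017-08-02", "x": 1}]]
basic, minmax = run_write_job(tasks, "dt", [BasicJobTracker(), MinMaxJobTracker("x")])
print(basic)   # {'numPartitions': 2, 'numFiles': 3, 'numRows': 4}
print(minmax)  # {'min': 1, 'max': 7, 'nullCount': 1}
```

In this sketch, `BasicJobTracker` plays the role of the SPARK-20703 write metrics. `MinMaxJobTracker` hints at how column statistics for CBO could be gathered in the same pass over the data instead of a later `ANALYZE TABLE` scan. Keeping per-task state separate from the driver-side merge is what lets a new tracker be added without modifying `run_write_job`.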



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
