[ 
https://issues.apache.org/jira/browse/KYLIN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hujiahua updated KYLIN-5121:
----------------------------
    Description: 
At present, the row count has to be evaluated after every cube build, and 
Spark's `QueryExecution` metrics expose `numOutputRows` for this purpose. 
However, since the patch for KYLIN-4662 (Migrate from third-party Spark to 
official Apache Spark), the utility function `JobMetricsUtils.collectMetrics` 
no longer works. Each row count now requires a call to `Dataset.count()`, 
which wastes resources and lengthens the cube build time.

Here is my solution: obtain the `QueryExecution` object through a custom 
`QueryExecutionListener`, and match the corresponding `QueryExecution` by 
comparing the output path. (BTW, the output path for a cube ID is always 
unique.)
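
The idea above could be sketched roughly as follows. This is only an 
illustration, not the actual patch: the class name `OutputPathListener` and 
the method `rowCount` are hypothetical, and the sketch assumes the build 
writes through Spark's file-based (v1) writer, so that the logical plan is an 
`InsertIntoHadoopFsRelationCommand` and the top node of the executed plan 
carries the `numOutputRows` metric. Whether that holds may depend on the 
Spark version in use.

```scala
import java.util.concurrent.ConcurrentHashMap

import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical listener: remembers the QueryExecution of each file write,
// keyed by the output path of the write command.
class OutputPathListener extends QueryExecutionListener {

  private val byPath = new ConcurrentHashMap[String, QueryExecution]()

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    qe.logical match {
      case cmd: InsertIntoHadoopFsRelationCommand =>
        // Assumption: the path string is unique per build. The command's
        // path may be fully qualified, so callers may need to normalize it.
        byPath.put(cmd.outputPath.toString, qe)
      case _ => // not a file write; nothing to record
    }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()

  // Looks up (and forgets) the execution for the given path and reads
  // numOutputRows from the top node of the executed plan, if present.
  def rowCount(outputPath: String): Option[Long] =
    Option(byPath.remove(outputPath)).flatMap { qe =>
      qe.executedPlan.metrics.get("numOutputRows").map(_.value)
    }
}
```

Such a listener would be registered once with 
`spark.listenerManager.register(listener)` before the write, and 
`rowCount(path)` would be called after the write returns. A `None` result 
means no matching execution or metric was found, in which case a fallback to 
`Dataset.count()` would still be needed.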

  was:
At present, the row count has to be evaluated after every cube build, and 
Spark's `QueryExecution` metrics expose `numOutputRows` for this purpose. 
However, since the patch for KYLIN-4662 (Migrate from third-party Spark to 
official Apache Spark), the utility function `JobMetricsUtils.collectMetrics` 
no longer works. Each row count now requires a call to `Dataset.count()`, 
which wastes resources and lengthens the cube build time.



> Make JobMetricsUtils.collectMetrics be working again
> ----------------------------------------------------
>
>                 Key: KYLIN-5121
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5121
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: hujiahua
>            Priority: Major
>
> At present, the row count has to be evaluated after every cube build, and 
> Spark's `QueryExecution` metrics expose `numOutputRows` for this purpose. 
> However, since the patch for KYLIN-4662 (Migrate from third-party Spark to 
> official Apache Spark), the utility function 
> `JobMetricsUtils.collectMetrics` no longer works. Each row count now 
> requires a call to `Dataset.count()`, which wastes resources and lengthens 
> the cube build time.
> Here is my solution: obtain the `QueryExecution` object through a custom 
> `QueryExecutionListener`, and match the corresponding `QueryExecution` by 
> comparing the output path. (BTW, the output path for a cube ID is always 
> unique.)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
