[
https://issues.apache.org/jira/browse/KYLIN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442618#comment-17442618
]
ASF GitHub Bot commented on KYLIN-5121:
---------------------------------------
zhengshengjun commented on pull request #1767:
URL: https://github.com/apache/kylin/pull/1767#issuecomment-966929049
LGTM
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
> Make JobMetricsUtils.collectMetrics work again
> ----------------------------------------------
>
> Key: KYLIN-5121
> URL: https://issues.apache.org/jira/browse/KYLIN-5121
> Project: Kylin
> Issue Type: Improvement
> Reporter: hujiahua
> Priority: Major
>
> At present, the row count needs to be evaluated after every cube build, and
> Spark's `QueryExecution` exposes a `numOutputRows` metric for exactly this
> purpose. However, after patch KYLIN-4662 (Migrate from third-party Spark to
> official Apache Spark), the utility function `JobMetricsUtils.collectMetrics`
> stopped working. As a result, each row count now requires a call to
> `Dataset.count()`, which wastes resources and lengthens the cube build time.
> Proposed solution: obtain the QueryExecution object through a custom
> QueryExecutionListener, and match the corresponding QueryExecution by
> comparing the output path. (BTW, the output path of a cube id is always unique.)
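The matching idea above can be sketched roughly as follows. This is a minimal, self-contained model of the pattern, not Kylin's actual implementation: `ExecutionInfo`, `MetricsListener`, `onSuccess`, and `collectRowCount` are hypothetical stand-ins (in the real code the listener would be a Spark `QueryExecutionListener` and the row count would come from the plan's `numOutputRows` SQL metric). Because each cube segment's output path is unique, the path works as the lookup key.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for the pieces of Spark's QueryExecution we need:
// the write target and the numOutputRows metric of the completed query.
case class ExecutionInfo(outputPath: String, numOutputRows: Long)

object MetricsListener {
  // Completed executions, keyed by their (unique) output path.
  private val executions = TrieMap.empty[String, ExecutionInfo]

  // In the real code this would run inside QueryExecutionListener.onSuccess,
  // after Spark finishes writing a cube segment.
  def onSuccess(info: ExecutionInfo): Unit =
    executions.put(info.outputPath, info)

  // collectMetrics-style lookup: find the execution whose output path matches
  // the segment's path and read its row count, instead of re-running
  // Dataset.count() over the built data.
  def collectRowCount(outputPath: String): Option[Long] =
    executions.remove(outputPath).map(_.numOutputRows)
}
```

The key design point is that the listener records metrics as a side effect of the build itself, so retrieving the row count afterwards is a map lookup rather than a second scan of the output.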
--
This message was sent by Atlassian Jira
(v8.20.1#820001)