[
https://issues.apache.org/jira/browse/HUDI-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376973#comment-17376973
]
ASF GitHub Bot commented on HUDI-1425:
--------------------------------------
vinothchandar commented on pull request #2296:
URL: https://github.com/apache/hudi/pull/2296#issuecomment-876078470
I suggest the following approach here:
- Allow empty commits and make the code work even when a commit is
empty. I think that should be fine.
- Introduce a flag that skips the commit when the commit stats are empty, and
turn it on for the Spark datasource writer path as an optimization.
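A minimal sketch of the two points above, in plain Scala. All names here (`WriteStat`, `skipEmptyCommits`, `shouldCommit`) are hypothetical and are not taken from the Hudi code base; the point is only that empty commits stay legal by default and the skip is opt-in per writer path.

```scala
// Hypothetical stand-in for per-file write statistics (not a Hudi class).
final case class WriteStat(path: String, numWrites: Long)

object EmptyCommitSketch {
  /** Returns true when the caller should go ahead and write the commit. */
  def shouldCommit(stats: Seq[WriteStat], skipEmptyCommits: Boolean): Boolean =
    // Only skip when the flag is on AND there is nothing to commit.
    !(skipEmptyCommits && stats.isEmpty)

  def main(args: Array[String]): Unit = {
    // Flag off: an empty commit is allowed and must be handled downstream.
    println(shouldCommit(Seq.empty, skipEmptyCommits = false))
    // Flag on (e.g. for a datasource writer path): the empty commit is skipped.
    println(shouldCommit(Seq.empty, skipEmptyCommits = true))
    // Non-empty stats always commit, whatever the flag says.
    println(shouldCommit(Seq(WriteStat("f1.parquet", 10L)), skipEmptyCommits = true))
  }
}
```

Deciding on the commit stats rather than on the input records means the check happens after the write has already run, so it does not trigger an extra pass over the input.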
On this, @garyli1019:
> This seems like a bug introduced here...
> https://github.com/apache/hudi/pull/1121/files
> We should definitely return if the incoming record is empty. cc:
> @vinothchandar WDYT?
Agree. I think it actually does return (Scala does not need an explicit return):
if you look, the block ends by returning like this. Good to confirm, though:
https://github.com/apache/hudi/blob/16e90d30eaa14e5c1c4632ad0a90497df601c637/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L196
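To illustrate the Scala rule being relied on here: the value of a method is the value of its last expression, so an `if`/`else` in final position returns from either branch without `return`. The sketch below is a hypothetical simplification (the method names and the `String` result are made up for illustration, not the real `HoodieSparkSqlWriter` signature). It also shows why this is worth confirming: an `if` branch that is not the last expression has its value discarded.

```scala
object ImplicitReturnSketch {
  // The if/else is the last expression, so whichever branch runs
  // supplies the method's result. No `return` keyword is needed.
  def write(records: Seq[String]): String = {
    if (records.isEmpty) {
      "skipped"
    } else {
      s"committed ${records.size} records"
    }
  }

  // Here the `if` is NOT the last expression: its value is thrown away
  // and execution falls through to the final line.
  def writeWithStrayBranch(records: Seq[String]): String = {
    if (records.isEmpty) {
      "skipped" // discarded, does not end the method
    }
    s"committed ${records.size} records"
  }

  def main(args: Array[String]): Unit = {
    println(write(Nil))                // prints "skipped"
    println(writeWithStrayBranch(Nil)) // prints "committed 0 records"
  }
}
```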
> Performance loss with the additional hoodieRecords.isEmpty() in
> HoodieSparkSqlWriter#write
> ------------------------------------------------------------------------------------------
>
> Key: HUDI-1425
> URL: https://issues.apache.org/jira/browse/HUDI-1425
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Affects Versions: 0.9.0
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: 截屏2020-11-30 下午9.47.55.png
>
>
> Currently, HoodieSparkSqlWriter#write performs an _isEmpty()_ check on
> _hoodieRecords_. This can be an expensive operation when
> _hoodieRecords_ is built from a complex chain of RDD operations.
> !截屏2020-11-30 下午9.47.55.png|width=1255,height=161!
> IMO this check does nothing to improve performance, but rather
> degrades it.
>
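The cost described in the issue can be sketched without Spark. `RecomputingDataset` below is a hypothetical stand-in for an uncached RDD: each action re-runs the upstream computation from scratch. In Spark itself, `RDD.isEmpty` only needs to find one element, but any upstream stage that must finish before the first element is available (a sort or shuffle, for example) still has to run, which is the assumption modeled here.

```scala
object IsEmptyCostSketch {
  // Hypothetical stand-in for an uncached RDD: every action calls `compute` again.
  final class RecomputingDataset[A](compute: () => Iterator[A]) {
    def isEmpty: Boolean = !compute().hasNext
    def collect(): Vector[A] = compute().toVector
  }

  /** Runs a toy write; returns (times the upstream work ran, records written). */
  def run(checkEmptyFirst: Boolean): (Int, Vector[Int]) = {
    var runs = 0
    val records = new RecomputingDataset[Int](() => {
      runs += 1
      // Sorting must consume all input before yielding the first element,
      // so even a "peek" such as isEmpty pays the full upstream cost.
      Vector(3, 1, 2).sorted.iterator
    })
    val written =
      if (checkEmptyFirst && records.isEmpty) Vector.empty[Int]
      else records.collect()
    (runs, written)
  }

  def main(args: Array[String]): Unit = {
    println(run(checkEmptyFirst = true)._1)  // prints 2
    println(run(checkEmptyFirst = false)._1) // prints 1
  }
}
```

With the check, the upstream work runs once for `isEmpty` and again for the actual write; without it, it runs once.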