[
https://issues.apache.org/jira/browse/HUDI-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397686#comment-17397686
]
ASF GitHub Bot commented on HUDI-1348:
--------------------------------------
hudi-bot edited a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641
## CI report:
* a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure:
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668)
<details>
<summary>Bot commands</summary>
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
</details>
> Provide option to clean up DFS sources after each commit
> --------------------------------------------------------
>
> Key: HUDI-1348
> URL: https://issues.apache.org/jira/browse/HUDI-1348
> Project: Apache Hudi
> Issue Type: Improvement
> Components: DeltaStreamer, Utilities
> Reporter: Vu Ho
> Priority: Major
> Labels: pull-request-available, user-support-issues
>
> Since DeltaStreamer makes heavy use of file listing, a source that contains
> many tiny files can quickly become a bottleneck. We need a way
> to delete or archive files once DeltaStreamer has processed them.
> The most reliable point to clean up the source seems to be after DeltaSync
> commits the checkpoint successfully. We could add a new public method to
> Source, e.g. `postCommit()`, and invoke it after each successful commit.
> Reference:
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources]
>
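The proposal in the description could be sketched roughly as below. This is only an illustration under stated assumptions, not Hudi's actual code: `SketchSource`, `ArchivingDfsSource`, and `SketchSync` are hypothetical stand-ins for `Source`, a DFS-backed source, and `DeltaSync`. The `postCommit` signature, which takes the list of committed files, is also an assumption. The sketch uses the local filesystem through `java.nio` rather than a Hadoop `FileSystem`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a DeltaStreamer source; not Hudi's actual Source class.
abstract class SketchSource {
    /** Returns the files that make up the next batch. */
    abstract List<Path> fetchNext() throws IOException;

    /** Hook invoked only after a successful commit; does nothing by default. */
    void postCommit(List<Path> committedFiles) throws IOException {
    }
}

// Hypothetical DFS-style source that archives its input files once they are committed.
class ArchivingDfsSource extends SketchSource {
    private final Path inputDir;
    private final Path archiveDir;

    ArchivingDfsSource(Path inputDir, Path archiveDir) {
        this.inputDir = inputDir;
        this.archiveDir = archiveDir;
    }

    @Override
    List<Path> fetchNext() throws IOException {
        // This listing is the step that gets slow when many tiny files pile up.
        try (Stream<Path> files = Files.list(inputDir)) {
            return files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    @Override
    void postCommit(List<Path> committedFiles) throws IOException {
        // Move processed files out of the input directory so later listings stay small.
        Files.createDirectories(archiveDir);
        for (Path file : committedFiles) {
            Files.move(file, archiveDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}

// Hypothetical sync loop standing in for DeltaSync.
class SketchSync {
    /** Runs one round and returns true if the commit succeeded and cleanup ran. */
    static boolean syncOnce(SketchSource source, boolean commitSucceeds) throws IOException {
        List<Path> batch = source.fetchNext();
        // Placeholder for writing the batch and committing the checkpoint.
        boolean committed = commitSucceeds;
        if (committed) {
            source.postCommit(batch);
        }
        return committed;
    }
}
```

Calling the hook only on the success path means a failed commit leaves the input files in place, so the next round can pick them up again. Archiving rather than deleting is just one possible policy for the hook.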
--
This message was sent by Atlassian Jira
(v8.3.4#803005)