[
https://issues.apache.org/jira/browse/FLINK-29617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616855#comment-17616855
]
luoyuxia edited comment on FLINK-29617 at 10/13/22 9:07 AM:
------------------------------------------------------------
[~dangshazi] Thanks for raising this and for the detailed explanation. I'd much
appreciate it if you could take the ticket. If you don't have time, I can
take it.
I'm fine with both suggestions, but I prefer suggestion 2, since suggestion 1
would introduce a new option that users are unlikely to know about.
One question: have you tried either of these suggestions? If so, how much
improvement did you see?
Btw, the image uploads failed. Could you please upload them again?
> Cost too much time to start SourceCoordinator of hdfsFileSource when start
> JobMaster
> ------------------------------------------------------------------------------------
>
> Key: FLINK-29617
> URL: https://issues.apache.org/jira/browse/FLINK-29617
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem, Runtime / Coordination
> Affects Versions: 1.15.2
> Reporter: LI Mingkun
> Priority: Major
> Labels: coordination, file-system
>
> h1. Scenario:
> Our user uses a Flink batch job to compact one day's small files. Flink version:
> 1.15
> He splits the pipeline into 24 parts, one per hour, so there are 24 sources.
>
> I find that starting the SourceCoordinator of hdfsFileSource takes too long
> when the JobMaster starts,
>
> as follows:
>
> !https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.1&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9SVAoAslMUGQdVQJ_ccmEf4LxhaONYKJvS_V8nvijvT3JXw_VlyRBAEE9EQhTtWdYPa4TLCO5rxjXGrTDK2_PGHX4RZDPTQTJ0LwKXAUr4BYlMhYZsjcrY9eo&disp=emb&realattid=ii_l95bh7qy0|width=542,height=260!
>
> h1. Root Cause:
> After investigating, I found the root cause:
> # AbstractFileSource calls enumerateSplits in createEnumerator
> # NonSplittingRecursiveEnumerator needs to get the block locations of every
> file block, which is a heavy I/O operation
> !https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.3&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ8AoT071eCNMb_q3uJtcbrUmZnYbg3ucnDelMlRRPn7WLlXOBGj650srQk9vhqKyJEANvpOWoxHuH6jNHt7g6go8JkeRUZKc81yqT0yzzz7tbBciTe-YnRVQ7w&disp=emb&realattid=ii_l95bp1832|width=542,height=456!
>
> !https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.2&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9phsX1nauTsx3xWje_YJM4uUaOLXKHcXKsm7WJquPQQGC7bQTni3OhQB5HtGYVOvrD-3Kbp9LURfUj6OiIUgsZU1AImSL0vj27cnDcf7HpVpLpaqdADtpoABU&disp=emb&realattid=ii_l95bjh1g1|width=526,height=542!
>
> h1. Suggestion
> # Add an option to FileSource to disable the location fetcher
> # Move the location fetching into the IOExecutor
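
To make suggestion 2 concrete, here is a minimal plain-Java sketch of the idea: the expensive per-file location lookup runs on a separate I/O thread pool, so the caller (standing in for the coordinator start path) gets a future back immediately. This is not Flink code. The class `AsyncLocationSketch`, the method `fetchLocationsAsync`, the `locationFetcher` function, and the file paths are all hypothetical names chosen for illustration, and the real change would have to fit the FileSource enumerator and the JobMaster's I/O executor.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Flink.
final class AsyncLocationSketch {

    // Runs the (potentially slow) location lookup for every file on the
    // given I/O executor and returns at once with a future of the result.
    static CompletableFuture<Map<String, List<String>>> fetchLocationsAsync(
            List<String> files,
            Function<String, List<String>> locationFetcher,
            Executor ioExecutor) {
        return CompletableFuture.supplyAsync(() -> {
            Map<String, List<String>> hostsByFile = new LinkedHashMap<>();
            for (String file : files) {
                // Stand-in for the heavy per-file block location call.
                hostsByFile.put(file, locationFetcher.apply(file));
            }
            return hostsByFile;
        }, ioExecutor);
    }

    public static void main(String[] args) {
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<Map<String, List<String>>> pending =
                    fetchLocationsAsync(
                            List.of("hdfs:///data/hour=00/part-0",
                                    "hdfs:///data/hour=01/part-0"),
                            file -> List.of("datanode-1"), // assumed fake lookup
                            ioExecutor);
            // The calling thread was not blocked by the lookups above.
            System.out.println("start path returned without waiting");
            // Wait only at the point where the locations are really needed.
            System.out.println(pending.join());
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```

With 24 sources, each source would submit its own lookup this way, so the lookups could overlap instead of running one after another on the thread that starts the JobMaster. Whether that matches the measured bottleneck would need to be confirmed against the profiling screenshots.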
--
This message was sent by Atlassian Jira
(v8.20.10#820010)