Hi Venkat,

Thanks for joining the discussion.
Based on our understanding, there are still a significant number of
existing tasks using Hive. Indeed, many companies are now migrating their
data to the lakehouse, but due to historical reasons, a substantial amount
of data still resides in Hive.

Best,
Xia

Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2024年4月25日周四 11:52写道:

> Hi Xia,
>
> +1 on introducing dynamic parallelism inference for HiveSource.
>
> Orthogonal to this discussion, curious, how commonly HiveSource is used
> these days in the industry given the popularity of table formats/sources
> like Iceberg, Hudi and Delta lake?
>
> Thanks
> Venkat
>
> On Wed, Apr 24, 2024, 7:41 PM Xia Sun <xingbe...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > Thanks for all the feedback!
> >
> > If there are no more comments, I would like to start the vote thread,
> > thanks again!
> >
> > Best,
> > Xia
> >
> > Ahmed Hamdy <hamdy10...@gmail.com> 于2024年4月18日周四 21:31写道:
> >
> > > Hi Xia,
> > > I have read through the FLIP and discussion and the new version of the
> > FLIP
> > > looks better.
> > > +1 for the proposal.
> > > Best Regards
> > > Ahmed Hamdy
> > >
> > >
> > > On Thu, 18 Apr 2024 at 12:21, Ron Liu <ron9....@gmail.com> wrote:
> > >
> > > > Hi, Xia
> > > >
> > > > Thanks for updating, looks good to me.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > Xia Sun <xingbe...@gmail.com> 于2024年4月18日周四 19:11写道:
> > > >
> > > > > Hi Ron,
> > > > > Yes, presenting it in a table might be more intuitive. I have
> already
> > > > added
> > > > > the table in the "Public Interfaces | New Config Option" chapter of
> > > FLIP.
> > > > > PTAL~
> > > > >
> > > > > Ron Liu <ron9....@gmail.com> 于2024年4月18日周四 18:10写道:
> > > > >
> > > > > > Hi, Xia
> > > > > >
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > > That means, in terms
> > > > > > of priority, `table.exec.hive.infer-source-parallelism` >
> > > > > > `table.exec.hive.infer-source-parallelism.mode`.
> > > > > >
> > > > > > I still have some confusion, if the
> > > > > > `table.exec.hive.infer-source-parallelism`
> > > > > > >`table.exec.hive.infer-source-parallelism.mode`, currently
> > > > > > `table.exec.hive.infer-source-parallelism` default value is true,
> > > that
> > > > > > means always static parallelism inference work? Or perhaps after
> > this
> > > > > FLIP,
> > > > > > we changed the default behavior of
> > > > > > `table.exec.hive.infer-source-parallelism` to indicate dynamic
> > > > > parallelism
> > > > > > inference when enabled.
> > > > > > I think you should list the various behaviors of these two
> options
> > > that
> > > > > > coexist in FLIP by a table, only then users can know how the
> > dynamic
> > > > and
> > > > > > static parallelism inference work.
> > > > > >
> > > > > > Best,
> > > > > > Ron
> > > > > >
> > > > > > Xia Sun <xingbe...@gmail.com> 于2024年4月18日周四 16:33写道:
> > > > > >
> > > > > > > Hi Ron and Lijie,
> > > > > > > Thanks for joining the discussion and sharing your suggestions.
> > > > > > >
> > > > > > > > the InferMode class should also be introduced in the Public
> > > > > Interfaces
> > > > > > > > section!
> > > > > > >
> > > > > > >
> > > > > > > Thanks for the reminder, I have now added the InferMode class
> to
> > > the
> > > > > > Public
> > > > > > > Interfaces section as well.
> > > > > > >
> > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I
> > checked
> > > > > > through
> > > > > > > > the code that the default value is 1000?
> > > > > > >
> > > > > > >
> > > > > > > I have checked and the default value of
> > > > > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000.
> > This
> > > > has
> > > > > > > been corrected in the FLIP.
> > > > > > >
> > > > > > > > how are`table.exec.hive.infer-source-parallelism` and
> > > > > > > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > > > > >
> > > > > > >
> > > > > > > This is indeed a critical point. The current plan is to
> deprecate
> > > > > > > `table.exec.hive.infer-source-parallelism` but still utilize it
> > as
> > > > the
> > > > > > main
> > > > > > > switch for enabling automatic parallelism inference. That
> means,
> > in
> > > > > terms
> > > > > > > of priority, `table.exec.hive.infer-source-parallelism` >
> > > > > > > `table.exec.hive.infer-source-parallelism.mode`. In future
> > > versions,
> > > > if
> > > > > > > `table.exec.hive.infer-source-parallelism` is removed, this
> logic
> > > > will
> > > > > > also
> > > > > > > need to be revised, leaving only
> > > > > > > `table.exec.hive.infer-source-parallelism.mode` as the basis
> for
> > > > > deciding
> > > > > > > whether to enable parallelism inference. I have also added this
> > > > > > description
> > > > > > > to the FLIP.
> > > > > > >
> > > > > > >
> > > > > > > > In FLIP-367 it is supported to be able to set the Source's
> > > > > parallelism
> > > > > > > > individually, if in the future HiveSource also supports this
> > > > feature,
> > > > > > > > however, the default value of
> > > > > > > > `table.exec.hive.infer-source-parallelism.mode` is
> > > > > `InferMode.DYNAMIC`,
> > > > > > > at
> > > > > > > > this point will the parallelism be dynamically derived or
> will
> > > the
> > > > > > > manually
> > > > > > > > set parallelism take effect, and who has the higher priority?
> > > > > > >
> > > > > > >
> > > > > > > From my understanding, 'manually set parallelism' has the
> higher
> > > > > > priority,
> > > > > > > just like one of the preconditions for the effectiveness of
> > dynamic
> > > > > > > parallelism inference in the AdaptiveBatchScheduler is that the
> > > > > vertex's
> > > > > > > parallelism isn't set. I believe whether it's static inference
> or
> > > > > dynamic
> > > > > > > inference, the manually set parallelism by the user should be
> > > > > respected.
> > > > > > >
> > > > > > > > The `InferMode.NONE` option.
> > > > > > >
> > > > > > > Currently, 'adding InferMode.NONE' seems to be the prevailing
> > > > opinion.
> > > > > I
> > > > > > > will add InferMode.NONE as one of the Enum options in InferMode
> > > > class.
> > > > > > >
> > > > > > > Best,
> > > > > > > Xia
> > > > > > >
> > > > > > > Lijie Wang <wangdachui9...@gmail.com> 于2024年4月18日周四 13:50写道:
> > > > > > >
> > > > > > > > Thanks for driving the discussion.
> > > > > > > >
> > > > > > > > +1 for the proposal and +1 for the `InferMode.NONE` option.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Lijie
> > > > > > > >
> > > > > > > > Ron liu <ron9....@gmail.com> 于2024年4月18日周四 11:36写道:
> > > > > > > >
> > > > > > > > > Hi, Xia
> > > > > > > > >
> > > > > > > > > Thanks for driving this FLIP.
> > > > > > > > >
> > > > > > > > > This proposal looks good to me overall. However, I have the
> > > > > following
> > > > > > > > minor
> > > > > > > > > questions:
> > > > > > > > >
> > > > > > > > > 1. FLIP introduced
> > > > `table.exec.hive.infer-source-parallelism.mode`
> > > > > > as a
> > > > > > > > new
> > > > > > > > > parameter, and the value is the enum class `InferMode`, I
> > think
> > > > the
> > > > > > > > > InferMode class should also be introduced in the Public
> > > > Interfaces
> > > > > > > > section!
> > > > > > > > > 2. You mentioned in FLIP that the default value of
> > > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I
> > > checked
> > > > > > > through
> > > > > > > > > the code that the default value is 1000?
> > > > > > > > > 3. I also agree with Muhammet's idea that there is no need
> to
> > > > > > introduce
> > > > > > > > the
> > > > > > > > > option `table.exec.hive.infer-source-parallelism.enabled`,
> > and
> > > > that
> > > > > > > > > expanding the InferMode values will fulfill the need. There
> > is
> > > > > > another
> > > > > > > > > issue to consider here though, how are
> > > > > > > > > `table.exec.hive.infer-source-parallelism` and
> > > > > > > > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > > > > > > > 4. In FLIP-367 it is supported to be able to set the
> Source's
> > > > > > > parallelism
> > > > > > > > > individually, if in the future HiveSource also supports
> this
> > > > > feature,
> > > > > > > > > however, the default value of
> > > > > > > > > `table.exec.hive.infer-source-parallelism.mode` is
> > `InferMode.
> > > > > > > DYNAMIC`,
> > > > > > > > at
> > > > > > > > > this point will the parallelism be dynamically derived or
> > will
> > > > the
> > > > > > > > manually
> > > > > > > > > set parallelism take effect, and who has the higher
> priority?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Ron
> > > > > > > > >
> > > > > > > > > Xia Sun <xingbe...@gmail.com> 于2024年4月17日周三 12:08写道:
> > > > > > > > >
> > > > > > > > > > Hi Jeyhun, Muhammet,
> > > > > > > > > > Thanks for all the feedback!
> > > > > > > > > >
> > > > > > > > > > > Could you please mention the default values for the new
> > > > > > > > configurations
> > > > > > > > > > >     (e.g.,
> table.exec.hive.infer-source-parallelism.mode,
> > > > > > > > > > >     table.exec.hive.infer-source-parallelism.enabled,
> > > > > > > > > > >     etc) ?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks for your suggestion. I have supplemented the
> > > explanation
> > > > > > > > regarding
> > > > > > > > > > the default values.
> > > > > > > > > >
> > > > > > > > > > > Since we are introducing the mode as a configuration
> > > option,
> > > > > > > > > > >     could it make sense to have `InferMode.NONE` option
> > > also?
> > > > > > > > > > >     The `NONE` option would disable the inference.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This is a good idea. Looking ahead, it could eliminate
> the
> > > need
> > > > > for
> > > > > > > > > > introducing
> > > > > > > > > > a new configuration option. I haven't identified any
> > > potential
> > > > > > > > > > compatibility issues
> > > > > > > > > > as yet. If there are no further ideas from others, I'll
> go
> > > > ahead
> > > > > > and
> > > > > > > > > update
> > > > > > > > > > the FLIP to
> > > > > > > > > > introducing InferMode.NONE.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Xia
> > > > > > > > > >
> > > > > > > > > > Muhammet Orazov <mor+fl...@morazow.com.invalid>
> > > 于2024年4月17日周三
> > > > > > > 10:31写道:
> > > > > > > > > >
> > > > > > > > > > > Hello Xia,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the FLIP!
> > > > > > > > > > >
> > > > > > > > > > > Since we are introducing the mode as a configuration
> > > option,
> > > > > > > > > > > could it make sense to have `InferMode.NONE` option
> also?
> > > > > > > > > > > The `NONE` option would disable the inference.
> > > > > > > > > > >
> > > > > > > > > > > This way we deprecate the
> > > > > > > `table.exec.hive.infer-source-parallelism`
> > > > > > > > > > > and no additional
> > > > > > > `table.exec.hive.infer-source-parallelism.enabled`
> > > > > > > > > > > option is required.
> > > > > > > > > > >
> > > > > > > > > > > What do you think?
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Muhammet
> > > > > > > > > > >
> > > > > > > > > > > On 2024-04-16 07:07, Xia Sun wrote:
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > I would like to start a discussion on FLIP-445:
> Support
> > > > > dynamic
> > > > > > > > > > > > parallelism
> > > > > > > > > > > > inference for HiveSource[1].
> > > > > > > > > > > >
> > > > > > > > > > > > FLIP-379[2] has introduced dynamic source parallelism
> > > > > inference
> > > > > > > for
> > > > > > > > > > > > batch
> > > > > > > > > > > > jobs, which can utilize runtime information to more
> > > > > accurately
> > > > > > > > decide
> > > > > > > > > > > > the
> > > > > > > > > > > > source parallelism. As a follow-up task, we plan to
> > > > implement
> > > > > > the
> > > > > > > > > > > > dynamic
> > > > > > > > > > > > parallelism inference interface for HiveSource, and
> > also
> > > > > switch
> > > > > > > the
> > > > > > > > > > > > default
> > > > > > > > > > > > static parallelism inference to dynamic parallelism
> > > > > inference.
> > > > > > > > > > > >
> > > > > > > > > > > > Looking forward to your feedback and suggestions,
> > thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-445*3A*Support*dynamic*parallelism*inference*for*HiveSource__;JSsrKysrKw!!IKRxdwAv5BmarQ!c0CAsY9V98vWfPtmYrdWLpqB6tt3Ytz5mrImBR6NVlu5b8312EwY77qg5mHIQ8ukltEsioqHtpyfTqptLCqBHw$
> > > > > > > > > > > > [2]
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-379*3A*Dynamic*source*parallelism*inference*for*batch*jobs__;JSsrKysrKys!!IKRxdwAv5BmarQ!c0CAsY9V98vWfPtmYrdWLpqB6tt3Ytz5mrImBR6NVlu5b8312EwY77qg5mHIQ8ukltEsioqHtpyfTqo1mQ_sdw$
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Xia
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to