Hi Vinoth, Bhavani,
+1 for renaming hudi-hive -> hudi-hive-sync
About "hudi-hadoop-mr -> hudi-hive": I suggest that we use the
"hudi-{another bigdata framework}" naming pattern more carefully. At a
superficial level, it is very easy for users to misread such a name as
meaning that the module does ecosystem integration, especially
those who have seen the source code of mainstream projects such as
Presto.[1]
When we check out hudi-hadoop-mr, it actually just contains some
InputFormat implementations.
If we do want to mention other frameworks without letting users
misunderstand that we are doing ecosystem integration, then we need to add
additional information, for example: "hudi-{another bigdata framework}-xxx"
or "hudi-xxx-{another bigdata framework}".
[1]: https://github.com/prestodb/presto
Best,
Vino
Bhavani Sudha <[email protected]> wrote on Fri, Jan 17, 2020 at 5:42 AM:
> Thanks @vinoth for giving an overall picture. I think I can relate better
> to the name changes you proposed.
>
> +1 for renaming hudi-hive -> hudi-hive-sync and hudi-hadoop-mr ->
> hudi-hive
>
> On Thu, Jan 16, 2020 at 1:33 PM Vinoth Chandar <[email protected]> wrote:
>
> > First let me share the context for the existing name.. We saw how Parquet
> > hands out the InputFormat and named it similar to parquet-mr.
> > InputFormat is indeed a MapReduce class.. I know we live in the age of
> > Flink and Spark.. But it's true :)
> >
> > I think this is the crux of the "understandability" issue..
> >
> > Here are my thoughts..
> >
> > - +0 (neutral) on the rename to hudi-query-common (whatever we decide,
> > we need to rename the bundle accordingly)
> > - On hudi-query-bundle being confusing with hive/spark/presto bundles, I
> > don't feel it's more confusing than it is today
> >
> > The real issue, IMO, is hudi-hive, which is really about syncing to Hive,
> > not querying Hive.
> > Then, maybe we can rename
> > - hudi-hadoop-mr to hudi-hive (more understandable, Hive does use
> > InputFormat as the abstraction)
> > - current hudi-hive to hudi-hive-sync
> > (bundles renamed accordingly)
> >
> > I know this hijacks the conversation. Apologies :). But I thought I'd
> > present a broader take
> >
> >
> >
> > On Thu, Jan 16, 2020 at 11:26 AM Bhavani Sudha Saktheeswaran
> > <[email protected]> wrote:
> >
> > > +1 to generally renaming the packages. Since this is about renaming for
> > > the purpose of making it user-friendly, I am concerned that if we make
> > > this hudi-query-bundle, users might confuse it with other modules like
> > > hudi-hive and hudi-spark. And inside the packaging module, we further
> > > have bundles specific to Spark, Hive and Presto.
> > >
> > > Any suggestions on how to rename broadly to avoid this confusion? Let
> > > me also think and get back.
> > >
> > > Thanks,
> > > Sudha
> > >
> > > On Wed, Jan 15, 2020 at 9:56 PM vino yang <[email protected]> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > I want to start a proposal about refactoring the naming of the
> > > > "hudi-hadoop-mr" module.
> > > >
> > > > > IMHO, this module name is not user-friendly. It may confuse users,
> > > > > because it looks like it is about integrating with MapReduce
> > > > > (although I know it references the parquet-mr[1] project).
> > > >
> > > > > Based on the purpose of this module (it contains InputFormat
> > > > > implementations for the ReadOptimized, Incremental, and Realtime
> > > > > views), I suggest that we rename it to "*hudi-query-common*". Then
> > > > > we can also rename "hudi-hadoop-mr-bundle" to "*hudi-query-bundle*".
> > > >
> > > > What do you think?
> > > >
> > > > Any thoughts and suggestions are welcome and appreciated.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_parquet-2Dmr&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=oyPDRKU5b-LuEWWyf8gacx4mFFydIGdyS50OKdxizX0&m=dmZJgDEuo5sZCNsoyMRQUpiJoBP7u4r2i8cdHDMmQic&s=4CnBhu54QxDqAWdCb3NXUdQg9beV2xEmgx-N0yhTr9Y&e=
> > > >
> > >
> >
>