Re: [DISCUSS] Separating out the metastore as its own TLP

Edward Capriolo Sun, 02 Jul 2017 15:51:04 -0700

On Fri, Jun 30, 2017 at 2:49 PM, Julian Hyde <[email protected]> wrote:


> +1
>
> As a Calcite PMC member, I am very pleased to see this change. Calcite
> reads metadata from a variety of sources (including JDBC databases, NoSQL
> databases such as Cassandra and Druid, and streaming systems), and if more
> of those sources choose to store their metadata in the metastore it will
> make our lives easier.
>
> Hive’s metastore has established a position as the place to go for
> metadata in the Hadoop ecosystem. Not all metadata is relational, or
> processed by Hive, so there are other parties using the metastore who
> justifiably would like to influence its direction. Opening up the metastore
> will help retain and extend this position.
>
> Julian
>
>
> On 2017-06-30 10:00 (-0700), "Dimitris [email protected]> wrote:
> >
> >
> > On 2017-06-30 07:56 (-0700), Alan Gates <[email protected]> wrote:
> > > A few of us have been talking and come to the conclusion that it would
> > > be a good thing to split out the Hive metastore into its own Apache
> > > project.  Below and in the linked wiki page we explain what we see as
> > > the advantages to this and how we would go about it.
> > >
> > > Hive’s metastore has long been used by other projects in the Hadoop
> > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> > > Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
> > > like Impala and Presto, can use it as their own metadata system with the
> > > rest of Hive not present.
> > >
> > > This sharing is excellent for the ecosystem.  Together with HDFS it
> > > allows users to use the tool of their choice while still accessing the
> > > same shared data.  But having this shared metadata inside the Hive
> > > project limits the ability of other projects to contribute to the
> > > metastore.  It also makes it harder for new systems that have similar
> > > but not identical metadata requirements (for example, stream processing
> > > systems on top of Apache Kafka) to use Hive’s metastore.  This
> > > difficulty for other systems comes out in two ways.  One, it is hard for
> > > non-Hive community members to participate in the project.  Second, it
> > > adds operational cost since users are forced to deploy all of the Hive
> > > jars just to get the metastore to work.
> > >
> > > Therefore we propose to split Hive’s metastore out into a separate
> > > Apache project.  This new project will continue to support the same
> > > Thrift API as the current metastore.  It will continue to focus on being
> > > a high performance, fault tolerant, large scale, operational metastore
> > > for SQL engines and other systems that want to store schema information
> > > about their data.
> > >
> > > By making it a separate project we will enable other projects to join us
> > > in innovating on the metastore.  It will simplify operations for
> > > non-Hive users that want to use the metastore as they will no longer
> > > need to install Hive just to get the metastore.  And it will attract new
> > > projects that might otherwise feel the need to solve their metadata
> > > problems on their own.
> > >
> > > Any Hive PMC member or committer will be welcome to join the new project
> > > at the same level.  We propose this project go straight to a top level
> > > project.  Given that the initial PMC will be formed from experienced
> > > Hive PMC members we do not believe incubation will be necessary.  (Note
> > > that the Apache board will need to approve this.)
> > >
> > > Obviously there are many details involved in a proposal like this.
> > > Rather than make this a ten page email we have filled out many of the
> > > details in a wiki page:
> > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> > >
> > > Yongzhi Chen
> > > Vihang Karajgaonkar
> > > Sergio Pena
> > > Sahil Takiar
> > > Aihua Xu
> > > Gunther Hagleitner
> > > Thejas Nair
> > > Alan Gates
> > >
> >
> > +1 (from Apache Impala's (incubating) perspective)
> >
> > Dimitris
> >



"Hive’s metastore has established a position as the place to go for
metadata in the Hadoop ecosystem. Not all metadata is relational, or
processed by Hive, so there are other parties using the metastore who
justifiably would like to influence its direction. Opening up the metastore
will help retain and extend this position."

The metastore is open and parties can influence its direction. Meritocracy
is earned.

For example: I have seen several parties state they wish Hive metastore was
packaged such that it was easier to embed/include. However, no one has
opened a ticket and completed/started/seriously scoped out that work. I do
not see how moving to a TLP and giving the code a new name will drive people
to take that next step.

I do not know how this works for TLP proposals, but I also do not think the
TLP process will "open" anything new up for you. I.e., I do not think the
proposal will grant anyone a free-ride seat on the committer/PMC list (I
surely would not support that).
