Re: [DISCUSS] Separating out the metastore as its own TLP

Jimmy Xiang Fri, 30 Jun 2017 15:43:18 -0700

Yeah, this is a good idea. +1


On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun <sunc...@apache.org> wrote:
> HMS has become the shared catalog service for multiple projects
> outside Hive, so +1 on this move (and maybe a different project name?).
>
> On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> I'm +1 on separating out the metastore. It recognizes the reality that a
>> lot of different projects use the Hive Metastore and opening up the
>> community is a great move.
>>
>> ..Owen
>>
>> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang <xu...@uber.com> wrote:
>>
>> > +1, sounds like a good idea!
>> >
>> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha <h...@harsha.io> wrote:
>> >
>> > > Thanks for the proposal, Alan. I am +1 on separating the Hive Metastore.
>> > > This is a great opportunity to build a Metastore that addresses schemas
>> > > not only for data at rest but also for data in motion. We have a
>> > > SchemaRegistry (http://github.com/hortonworks/registry) project that
>> > > allows users to register schemas for data in motion and integrates with
>> > > Kafka, Kinesis, Event Hubs and other messaging queues. This will give
>> > > us the opportunity to integrate our APIs with the Hive Metastore and
>> > > provide one project that is truly a single metastore that can hold
>> > > all schemas.
>> > >
>> > > Thanks,
>> > > Harsha
>> > >
>> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
>> > > > Great, thanks Alan for putting all this in the email.
>> > > > +1
>> > > >
>> > > > Allowing other components to continue to use the Metastore without
>> > > > the need to use Hive dependencies is a big plus for them. I agree
>> > > > with everything you mention in the email.
>> > > >
>> > > > - Sergio
>> > > >
>> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > As a Calcite PMC member, I am very pleased to see this change.
>> > > > > Calcite reads metadata from a variety of sources (including JDBC
>> > > > > databases, NoSQL databases such as Cassandra and Druid, and
>> > > > > streaming systems), and if more of those sources choose to store
>> > > > > their metadata in the metastore it will make our lives easier.
>> > > > >
>> > > > > Hive’s metastore has established a position as the place to go for
>> > > > > metadata in the Hadoop ecosystem. Not all metadata is relational,
>> > > > > or processed by Hive, so there are other parties using the
>> > > > > metastore who justifiably would like to influence its direction.
>> > > > > Opening up the metastore will help retain and extend this position.
>> > > > >
>> > > > > Julian
>> > > > >
>> > > > >
>> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
>> > > > > >
>> > > > > >
>> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates <al...@gmail.com> wrote:
>> > > > > > > A few of us have been talking and come to the conclusion that it
>> > > > > > > would be a good thing to split out the Hive metastore into its own
>> > > > > > > Apache project. Below and in the linked wiki page we explain what
>> > > > > > > we see as the advantages to this and how we would go about it.
>> > > > > > >
>> > > > > > > Hive’s metastore has long been used by other projects in the Hadoop
>> > > > > > > ecosystem to store and access metadata. Apache Impala, Apache
>> > > > > > > Spark, Apache Drill, Presto, and other systems all use Hive’s
>> > > > > > > metastore. Some, like Impala and Presto, can use it as their own
>> > > > > > > metadata system with the rest of Hive not present.
>> > > > > > >
>> > > > > > > This sharing is excellent for the ecosystem. Together with HDFS it
>> > > > > > > allows users to use the tool of their choice while still accessing
>> > > > > > > the same shared data. But having this shared metadata inside the
>> > > > > > > Hive project limits the ability of other projects to contribute to
>> > > > > > > the metastore. It also makes it harder for new systems that have
>> > > > > > > similar but not identical metadata requirements (for example,
>> > > > > > > stream processing systems on top of Apache Kafka) to use Hive’s
>> > > > > > > metastore. This difficulty for other systems comes out in two
>> > > > > > > ways. First, it is hard for non-Hive community members to
>> > > > > > > participate in the project. Second, it adds operational cost since
>> > > > > > > users are forced to deploy all of the Hive jars just to get the
>> > > > > > > metastore to work.
>> > > > > > >
>> > > > > > > Therefore we propose to split Hive’s metastore out into a separate
>> > > > > > > Apache project. This new project will continue to support the same
>> > > > > > > Thrift API as the current metastore. It will continue to focus on
>> > > > > > > being a high performance, fault tolerant, large scale, operational
>> > > > > > > metastore for SQL engines and other systems that want to store
>> > > > > > > schema information about their data.
>> > > > > > >
>> > > > > > > By making it a separate project we will enable other projects to
>> > > > > > > join us in innovating on the metastore. It will simplify
>> > > > > > > operations for non-Hive users that want to use the metastore as
>> > > > > > > they will no longer need to install Hive just to get the metastore.
>> > > > > > > And it will attract new projects that might otherwise feel the need
>> > > > > > > to solve their metadata problems on their own.
>> > > > > > >
>> > > > > > > Any Hive PMC member or committer will be welcome to join the new
>> > > > > > > project at the same level. We propose this project go straight to
>> > > > > > > a top level project. Given that the initial PMC will be formed
>> > > > > > > from experienced Hive PMC members we do not believe incubation will
>> > > > > > > be necessary. (Note that the Apache board will need to approve
>> > > > > > > this.)
>> > > > > > >
>> > > > > > > Obviously there are many details involved in a proposal like this.
>> > > > > > > Rather than make this a ten page email we have filled out many of
>> > > > > > > the details in a wiki page:
>> > > > > > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
>> > > > > > >
>> > > > > > > Yongzhi Chen
>> > > > > > > Vihang Karajgaonkar
>> > > > > > > Sergio Pena
>> > > > > > > Sahil Takiar
>> > > > > > > Aihua Xu
>> > > > > > > Gunther Hagleitner
>> > > > > > > Thejas Nair
>> > > > > > > Alan Gates
>> > > > > > >
>> > > > > >
>> > > > > > +1 (from Apache Impala's (incubating) perspective)
>> > > > > >
>> > > > > > Dimitris
>> > > > > >
>> > > > >
>> > >
>> > >
>> > > Thanks,
>> > > Harsha
>> > >
>> >
>>
