HMS has become the shared catalog service for multiple projects outside Hive, so +1 on this move (and maybe a different project name?).

On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley <owen.omal...@gmail.com> wrote:

> I'm +1 on separating out the metastore. It recognizes the reality that a
> lot of different projects use the Hive Metastore and opening up the
> community is a great move.
>
> ..Owen
>
> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang <xu...@uber.com> wrote:
>
> > +1, sounds like a good idea!
> >
> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha <h...@harsha.io> wrote:
> >
> > > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> > > This is a great opportunity for building a Metastore to not only address
> > > schemas for the data at rest but also for the data in motion. We have a
> > > SchemaRegistry (http://github.com/hortonworks/registry) project that
> > > allows users to register schemas for data in motion and integrates with
> > > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> > > us with an opportunity to integrate our APIs with Hive Metastore and
> > > provide one project that is truly a single metastore that can hold
> > > all schemas.
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > > > Great, thanks Alan for putting all this in the email.
> > > > +1
> > > >
> > > > Allowing other components to continue to use the Metastore without the
> > > > need to use Hive dependencies is a big plus for them. I agree with
> > > > everything you mention in the email.
> > > >
> > > > - Sergio
> > > >
> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde <jh...@apache.org> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > > > > reads metadata from a variety of sources (including JDBC databases, NoSQL
> > > > > databases such as Cassandra and Druid, and streaming systems), and if more
> > > > > of those sources choose to store their metadata in the metastore it will
> > > > > make our lives easier.
> > > > >
> > > > > Hive’s metastore has established a position as the place to go for
> > > > > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > > > > processed by Hive, so there are other parties using the metastore who
> > > > > justifiably would like to influence its direction. Opening up the
> > > > > metastore will help retain and extend this position.
> > > > >
> > > > > Julian
> > > > >
> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > > > >
> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates <al...@gmail.com> wrote:
> > > > > > > A few of us have been talking and come to the conclusion that it
> > > > > > > would be a good thing to split out the Hive metastore into its own
> > > > > > > Apache project. Below and in the linked wiki page we explain what we
> > > > > > > see as the advantages to this and how we would go about it.
> > > > > > >
> > > > > > > Hive’s metastore has long been used by other projects in the Hadoop
> > > > > > > ecosystem to store and access metadata. Apache Impala, Apache Spark,
> > > > > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > > > > > Some, like Impala and Presto, can use it as their own metadata system
> > > > > > > with the rest of Hive not present.
> > > > > > >
> > > > > > > This sharing is excellent for the ecosystem. Together with HDFS it
> > > > > > > allows users to use the tool of their choice while still accessing the
> > > > > > > same shared data. But having this shared metadata inside the Hive
> > > > > > > project limits the ability of other projects to contribute to the
> > > > > > > metastore. It also makes it harder for new systems that have similar
> > > > > > > but not identical metadata requirements (for example, stream
> > > > > > > processing systems on top of Apache Kafka) to use Hive’s metastore.
> > > > > > > This difficulty for other systems comes out in two ways. First, it is
> > > > > > > hard for non-Hive community members to participate in the project.
> > > > > > > Second, it adds operational cost since users are forced to deploy all
> > > > > > > of the Hive jars just to get the metastore to work.
> > > > > > >
> > > > > > > Therefore we propose to split Hive’s metastore out into a separate
> > > > > > > Apache project. This new project will continue to support the same
> > > > > > > Thrift API as the current metastore. It will continue to focus on
> > > > > > > being a high-performance, fault-tolerant, large-scale, operational
> > > > > > > metastore for SQL engines and other systems that want to store schema
> > > > > > > information about their data.
> > > > > > >
> > > > > > > By making it a separate project we will enable other projects to join
> > > > > > > us in innovating on the metastore. It will simplify operations for
> > > > > > > non-Hive users that want to use the metastore as they will no longer
> > > > > > > need to install Hive just to get the metastore. And it will attract
> > > > > > > new projects that might otherwise feel the need to solve their
> > > > > > > metadata problems on their own.
> > > > > > >
> > > > > > > Any Hive PMC member or committer will be welcome to join the new
> > > > > > > project at the same level. We propose this project go straight to a
> > > > > > > top-level project. Given that the initial PMC will be formed from
> > > > > > > experienced Hive PMC members we do not believe incubation will be
> > > > > > > necessary. (Note that the Apache board will need to approve this.)
> > > > > > >
> > > > > > > Obviously there are many details involved in a proposal like this.
> > > > > > > Rather than make this a ten-page email we have filled out many of the
> > > > > > > details in a wiki page:
> > > > > > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> > > > > > >
> > > > > > > Yongzhi Chen
> > > > > > > Vihang Karajgaonkar
> > > > > > > Sergio Pena
> > > > > > > Sahil Takiar
> > > > > > > Aihua Xu
> > > > > > > Gunther Hagleitner
> > > > > > > Thejas Nair
> > > > > > > Alan Gates
> > > > > >
> > > > > > +1 (from Apache Impala's (incubating) perspective)
> > > > > >
> > > > > > Dimitris
> > >
> > > Thanks,
> > > Harsha