Yeah, this is a good idea. +1

On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun <sunc...@apache.org> wrote:

> HMS has become the shared catalog service for multiple projects outside
> Hive, so +1 on this move (and maybe a different project name?).
>
> On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> I'm +1 on separating out the metastore. It recognizes the reality that a
>> lot of different projects use the Hive Metastore and opening up the
>> community is a great move.
>>
>> ..Owen
>>
>> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang <xu...@uber.com> wrote:
>>
>> > +1, sounds like a good idea!
>> >
>> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha <h...@harsha.io> wrote:
>> >
>> > > Thanks for the proposal Alan. I am +1 on separating the Hive
>> > > Metastore. This is a great opportunity for building a Metastore to
>> > > not only address schemas for the data at rest but also for the data
>> > > in motion. We have a SchemaRegistry
>> > > (http://github.com/hortonworks/registry) project that allows users
>> > > to register schemas for data in motion and integrates with Kafka,
>> > > Kinesis, Event Hubs and other messaging queues. This will give us
>> > > the opportunity to integrate our APIs with Hive Metastore and
>> > > provide one project that is truly a single metastore that can hold
>> > > all schemas.
>> > >
>> > > Thanks,
>> > > Harsha
>> > >
>> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
>> > > > Great, thanks Alan for putting all this in the email.
>> > > > +1
>> > > >
>> > > > Allowing other components to continue to use the Metastore
>> > > > without the need to use Hive dependencies is a big plus for them.
>> > > > I agree with everything you mention in the email.
>> > > >
>> > > > - Sergio
>> > > >
>> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde <jh...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > As a Calcite PMC member, I am very pleased to see this change.
>> > > > > Calcite reads metadata from a variety of sources (including
>> > > > > JDBC databases, NoSQL databases such as Cassandra and Druid,
>> > > > > and streaming systems), and if more of those sources choose to
>> > > > > store their metadata in the metastore it will make our lives
>> > > > > easier.
>> > > > >
>> > > > > Hive’s metastore has established a position as the place to go
>> > > > > for metadata in the Hadoop ecosystem. Not all metadata is
>> > > > > relational, or processed by Hive, so there are other parties
>> > > > > using the metastore who justifiably would like to influence its
>> > > > > direction. Opening up the metastore will help retain and extend
>> > > > > this position.
>> > > > >
>> > > > > Julian
>> > > > >
>> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
>> > > > > >
>> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates <al...@gmail.com>
>> > > > > > wrote:
>> > > > > > > A few of us have been talking and come to the conclusion
>> > > > > > > that it would be a good thing to split out the Hive
>> > > > > > > metastore into its own Apache project. Below and in the
>> > > > > > > linked wiki page we explain what we see as the advantages
>> > > > > > > to this and how we would go about it.
>> > > > > > >
>> > > > > > > Hive’s metastore has long been used by other projects in
>> > > > > > > the Hadoop ecosystem to store and access metadata. Apache
>> > > > > > > Impala, Apache Spark, Apache Drill, Presto, and other
>> > > > > > > systems all use Hive’s metastore. Some, like Impala and
>> > > > > > > Presto, can use it as their own metadata system with the
>> > > > > > > rest of Hive not present.
>> > > > > > >
>> > > > > > > This sharing is excellent for the ecosystem. Together with
>> > > > > > > HDFS it allows users to use the tool of their choice while
>> > > > > > > still accessing the same shared data. But having this
>> > > > > > > shared metadata inside the Hive project limits the ability
>> > > > > > > of other projects to contribute to the metastore. It also
>> > > > > > > makes it harder for new systems that have similar but not
>> > > > > > > identical metadata requirements (for example, stream
>> > > > > > > processing systems on top of Apache Kafka) to use Hive’s
>> > > > > > > metastore. This difficulty for other systems comes out in
>> > > > > > > two ways. First, it is hard for non-Hive community members
>> > > > > > > to participate in the project. Second, it adds operational
>> > > > > > > cost since users are forced to deploy all of the Hive jars
>> > > > > > > just to get the metastore to work.
>> > > > > > >
>> > > > > > > Therefore we propose to split Hive’s metastore out into a
>> > > > > > > separate Apache project. This new project will continue to
>> > > > > > > support the same Thrift API as the current metastore. It
>> > > > > > > will continue to focus on being a high performance, fault
>> > > > > > > tolerant, large scale, operational metastore for SQL
>> > > > > > > engines and other systems that want to store schema
>> > > > > > > information about their data.
>> > > > > > >
>> > > > > > > By making it a separate project we will enable other
>> > > > > > > projects to join us in innovating on the metastore. It will
>> > > > > > > simplify operations for non-Hive users that want to use the
>> > > > > > > metastore as they will no longer need to install Hive just
>> > > > > > > to get the metastore. And it will attract new projects that
>> > > > > > > might otherwise feel the need to solve their metadata
>> > > > > > > problems on their own.
>> > > > > > >
>> > > > > > > Any Hive PMC member or committer will be welcome to join
>> > > > > > > the new project at the same level. We propose this project
>> > > > > > > go straight to a top level project. Given that the initial
>> > > > > > > PMC will be formed from experienced Hive PMC members we do
>> > > > > > > not believe incubation will be necessary. (Note that the
>> > > > > > > Apache board will need to approve this.)
>> > > > > > >
>> > > > > > > Obviously there are many details involved in a proposal
>> > > > > > > like this. Rather than make this a ten page email we have
>> > > > > > > filled out many of the details in a wiki page:
>> > > > > > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
>> > > > > > >
>> > > > > > > Yongzhi Chen
>> > > > > > > Vihang Karajgaonkar
>> > > > > > > Sergio Pena
>> > > > > > > Sahil Takiar
>> > > > > > > Aihua Xu
>> > > > > > > Gunther Hagleitner
>> > > > > > > Thejas Nair
>> > > > > > > Alan Gates
>> > > > > >
>> > > > > > +1 (from Apache Impala's (incubating) perspective)
>> > > > > >
>> > > > > > Dimitris