+1

As a Calcite PMC member, I am very pleased to see this change. Calcite reads metadata from a variety of sources (including JDBC databases, NoSQL databases such as Cassandra and Druid, and streaming systems), and if more of those sources choose to store their metadata in the metastore it will make our lives easier.

Hive’s metastore has established a position as the place to go for metadata in the Hadoop ecosystem. Not all metadata is relational, or processed by Hive, so there are other parties using the metastore who justifiably would like to influence its direction. Opening up the metastore will help retain and extend this position.

Julian

On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> On 2017-06-30 07:56 (-0700), Alan Gates <al...@gmail.com> wrote:
> > A few of us have been talking and have come to the conclusion that
> > it would be a good thing to split out the Hive metastore into its
> > own Apache project. Below and in the linked wiki page we explain
> > what we see as the advantages to this and how we would go about it.
> >
> > Hive’s metastore has long been used by other projects in the Hadoop
> > ecosystem to store and access metadata. Apache Impala, Apache Spark,
> > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > Some, like Impala and Presto, can use it as their own metadata
> > system with the rest of Hive not present.
> >
> > This sharing is excellent for the ecosystem. Together with HDFS it
> > allows users to use the tool of their choice while still accessing
> > the same shared data. But having this shared metadata inside the
> > Hive project limits the ability of other projects to contribute to
> > the metastore. It also makes it harder for new systems that have
> > similar but not identical metadata requirements (for example, stream
> > processing systems on top of Apache Kafka) to use Hive’s metastore.
> > This difficulty for other systems comes out in two ways. First, it
> > is hard for non-Hive community members to participate in the
> > project. Second, it adds operational cost since users are forced to
> > deploy all of the Hive jars just to get the metastore to work.
> >
> > Therefore we propose to split Hive’s metastore out into a separate
> > Apache project. This new project will continue to support the same
> > Thrift API as the current metastore. It will continue to focus on
> > being a high performance, fault tolerant, large scale, operational
> > metastore for SQL engines and other systems that want to store
> > schema information about their data.
> >
> > By making it a separate project we will enable other projects to
> > join us in innovating on the metastore. It will simplify operations
> > for non-Hive users that want to use the metastore as they will no
> > longer need to install Hive just to get the metastore. And it will
> > attract new projects that might otherwise feel the need to solve
> > their metadata problems on their own.
> >
> > Any Hive PMC member or committer will be welcome to join the new
> > project at the same level. We propose this project go straight to a
> > top level project. Given that the initial PMC will be formed from
> > experienced Hive PMC members we do not believe incubation will be
> > necessary. (Note that the Apache board will need to approve this.)
> >
> > Obviously there are many details involved in a proposal like this.
> > Rather than make this a ten page email we have filled out many of
> > the details in a wiki page:
> > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> >
> > Yongzhi Chen
> > Vihang Karajgaonkar
> > Sergio Pena
> > Sahil Takiar
> > Aihua Xu
> > Gunther Hagleitner
> > Thejas Nair
> > Alan Gates
>
> +1 (from Apache Impala's (incubating) perspective)
>
> Dimitris