Re: [DISCUSSION]: Synchronizing HCatalog and Hive trees

Alan Gates Fri, 18 Nov 2011 09:29:46 -0800

There are differing opinions on whether the best route forward is for the 
metastore to move to HCatalog, with Hive depending on it, or for Hive to 
continue to own the metastore code, with HCatalog wrapping it (as it does 
today).  Even if we did agree to move the metastore to HCatalog, the Hive 
community would want to see more maturity in the HCatalog project before doing 
so, which is totally understandable.


So, I am fine for the moment with leaving things as they are, and revisiting 
the issue later as HCatalog matures.

Alan.

On Nov 14, 2011, at 2:32 PM, <[email protected]> 
<[email protected]> wrote:

> Based on the typical drawing of a Hadoop stack, where Hcatalog sits just
> above HDFS and Hbase, and below Pig, Hive, and MapReduce, my understanding
> was that SerDes and Storage Handlers *should* belong to Hcatalog, whereas
> Hive's CLI should make use of Hcatalog API.
> 
> Is that understanding correct ?
> 
> If yes, are there any discussions happening on this refactoring ?
> 
> - Milind
> 
> 
> On 11/14/11 1:39 PM, "Carl Steinbach" <[email protected]> wrote:
> 
>> HCatalog also depends on Hive's CLI, its parser/query compiler, and
>> its collection of SerDes and StorageHandlers, so HCatalog will still
>> have Hive dependencies even if the metastore is moved over to HCat.
>> 
>> On Mon, Nov 14, 2011 at 4:20 PM, <[email protected]> wrote:
>> 
>>> Any roadmaps/timelines/discussions on moving the Hive meta store to
>>> Hcatalog, so that the dependencies are reversed, as they should be ?
>>> 
>>> - milind
>>> 
>>> ---
>>> Milind Bhandarkar
>>> Greenplum Labs, EMC
>>> (Disclaimer: Opinions expressed in this email are those of the author, and
>>> do not necessarily represent the views of any organization, past or
>>> present, the author might be affiliated with.)
>>> 
>>> 
>>> 
>>> On 11/10/11 5:24 PM, "Olga Natkovich" <[email protected]> wrote:
>>> 
>>>> Hi Alan,
>>>> 
>>>> Thanks for your feedback.
>>>> 
>>>> Yes, I agree that we should prefer released code, but for that Hive needs
>>>> to have a pretty frequent release schedule, and we have not seen that so
>>>> far.
>>>> 
>>>> Hopefully, it would be latest Hive code by default but if that is
>>>> problematic then we could use whatever code meets the requirements.
>>>> 
>>>> In step 3, we don't need to wait till we branch - we could do it as the
>>>> project goes. I am just saying we need to make sure that when we branch
>>>> we make a call of which version/revision of the code to use with the
>>>> release.
>>>> 
>>>> Olga
>>>> 
>>>> -----Original Message-----
>>>> From: Alan Gates [mailto:[email protected]]
>>>> Sent: Thursday, November 10, 2011 8:51 AM
>>>> To: [email protected]
>>>> Subject: Re: [DISCUSSION]: Synchronizing HCatalog and Hive trees
>>>> 
>>>> Mostly agree, a few comments inline.
>>>> 
>>>> Alan.
>>>> 
>>>> On Nov 9, 2011, at 2:54 PM, Olga Natkovich wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Since HCatalog has dependencies on Hive source tree we need to figure
>>>>> out how to stay in synch with Hive source while not having to deal with
>>>>> random build/test failures on a regular basis. Here is the proposal:
>>>>> 
>>>>> 
>>>>> (1)    During normal development cycle, HCatalog trunk would use a
>>>>> particular revision of Hive to build against
>>>>> 
>>>>> (2)    Any time a change from Hive is needed by Hcatalog, the revision
>>>>> number will move forward. The developer who is bringing this change into
>>>>> Hcatalog is responsible for making sure that the build is stable before
>>>>> moving the extern tag
>>>>> 
>>>>> (3)    As part of the HCatalog release process, prior to branching for
>>>>> the release, HCatalog will be integrated with the latest Hive code.
>>>>> 
>>>>> a.       This could be the latest Hive release if it contains all the
>>>>> changes required for Hcatalog or the latest Hive trunk otherwise
>>>> 
>>>> s/could/should  We should strongly prefer using released versions where
>>>> possible.
>>>>> 
>>>>> b.      Developer responsible for branching for the release is
>>>>> responsible for stabilizing the build with the latest Hive code prior to
>>>>> branching. Once the stabilization is done, a tag is created in Hive and
>>>>> the release branch uses that tag for all builds
>>>> 
>>>> s/latest Hive code/chosen Hive code  I assume that's what you meant
>>>>> 
>>>>> c.       If later on a problem is found with this tag, Hive code would
>>>>> be branched on the tag and necessary bug fixes applied.
>>>>> 
>>>>> Comments?
>>>> 
>>>> It's not clear to me why step 3 needs to wait for a release cycle. That
>>>> seems like a bound, but not something that we have to wait for.
>>>> 
>>>>> 
>>>>> Olga
>>>> 
>>>> 
>>> 
>>> 
>
