To make the discussion a bit more concrete I put together a patch to
illustrate things:

    https://issues.apache.org/jira/browse/HCATALOG-460

Thoughts? The patch only handles the Pig read case, which is enough to
illustrate the idea. If this looks like the right direction I can
continue down this path.
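
A rough sketch of the shape being proposed, for illustration only (this
is not the contents of the patch): a static holder that gets its conf set
early, which static conversion code can then read. To keep it standalone,
java.util.Properties stands in for the Hadoop Configuration, and the
BooleanConversion class and the property name are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch of a process-wide config holder.
// Properties is a stand-in for Hadoop's Configuration (assumption).
final class HCatContext {
    private static final HCatContext INSTANCE = new HCatContext();
    private Properties conf; // guarded by the synchronized methods below

    private HCatContext() {}

    static HCatContext get() {
        return INSTANCE;
    }

    // Called early, e.g. from a loader, before any conversion code runs.
    synchronized HCatContext setConf(Properties newConf) {
        this.conf = newConf;
        return this;
    }

    synchronized Properties getConf() {
        if (conf == null) {
            throw new IllegalStateException("HCatContext conf not set yet");
        }
        return conf;
    }
}

// Hypothetical static conversion helper that reads the shared conf.
final class BooleanConversion {
    // Illustrative property name, not taken from the thread.
    static final String KEY = "hcat.data.convert.boolean.to.integer";

    private BooleanConversion() {}

    static Object toPigValue(Boolean value) {
        if (value == null) {
            return null;
        }
        Properties conf = HCatContext.get().getConf();
        boolean convert = Boolean.parseBoolean(conf.getProperty(KEY, "false"));
        if (!convert) {
            return value; // leave booleans untouched unless enabled
        }
        return Integer.valueOf(value ? 1 : 0);
    }
}
```

In this sketch a loader would call HCatContext.get().setConf(...) from a
method like setLocation, so the conf is in place before any static
conversion method asks for it.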

--travis



On Thu, Aug 2, 2012 at 10:07 AM, Travis Crawford
<[email protected]> wrote:
> For Pig, my initial thought is using the actual job conf provided to
> methods like:
>
>     LoadFunc.setLocation(String location, Job job)
>     LoadMetadata.getSchema(String location, Job job)
>     StoreFunc.setStoreLocation(String location, Job job)
>
> since they are passed a configuration and called before any HCat code.
> For MR I'm less sure specifically how this would work but could look
> into it.
>
> Can y'all think of any cases in Pig where HCatContext would be needed
> but not yet initialized, if it were initialized in the above methods?
>
> If this sounds like a worthwhile path to explore I can put together a
> proof-of-concept patch.
>
> --travis
>
>
>
> On Thu, Aug 2, 2012 at 7:55 AM, Alan Gates <[email protected]> wrote:
>>
>> +1 to having a global config class.  Would HCatContext.get().getConf() 
>> return the actual JobConf?  Or would it return an object that HCat would 
>> promise would end up in the JobConf?  If the former then it's hard to use on 
>> the front-end in Pig since early on the JobConf doesn't exist yet.  If the 
>> latter you have to do a lot of game playing to make sure your info ends up 
>> in the actual JobConf properly.
>>
>> I like the idea, but I need to understand the next level of design and how 
>> this would interact with Pig's use of the JobConf that is already in place.
>>
>> Alan.
>>
>> On Aug 1, 2012, at 6:54 PM, Travis Crawford wrote:
>>
>> > Hey hcat gurus -
>> >
>> > Before Pig got full boolean support, a common workaround was treating
>> > booleans as integers*. I'd like to provide boolean-to-int conversion in
>> > HCatalog, enabled with a property, so the following two cases work:
>> >
>> > (a) Pre-boolean support pig versions can read tables with boolean columns
>> > (b) Pig scripts written in the pre-boolean days can continue working, even
>> > after updating pig.
>> >
>> > Most schema conversion stuff happens with static methods, which makes
>> > sense, but complicates configuration. Any objection to creating a global
>> > static class for stuff like passing configs around? This would be similar
>> > to what Pig and Hive already have:
>> >
>> >    UDFContext.getUDFContext().getJobConf();
>> >    Hive.get().getConf();
>> >    HCatContext.get().getConf();  <-- proposed new class
>> >
>> > We would set the conf very early on (HCatLoader, HCatInputFormat) and it
>> > could be used to simplify configuration inside HCat. With such a class
>> > adding this conversion would be super easy + maintainable, whereas now it
>> > would be a very invasive change.
>> >
>> > Thoughts?
>> >
>> > --travis
>> >
>> >
>> > *
>> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/util/ThriftToPig.java#L99
>>
