+1 to having a global config class. Would HCatContext.get().getConf() return the actual JobConf? Or would it return an object that HCat promises will end up in the JobConf? If the former, it's hard to use on the front-end in Pig, since early on the JobConf doesn't exist yet. If the latter, you have to play a lot of games to make sure your info ends up in the actual JobConf properly.
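
To make the second option concrete, here is a rough sketch of what a "promise" style context could look like. It is only an illustration under stated assumptions, not a proposed implementation: the method names (setProperty, getProperty, mergeInto) and the property key are hypothetical, and a plain Map stands in for Hadoop's JobConf so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual HCatalog class. A plain Map stands in
// for Hadoop's Configuration/JobConf so the example has no dependencies.
final class HCatContext {
    private static final HCatContext INSTANCE = new HCatContext();

    // Settings recorded on the front-end, before any JobConf exists.
    private final Map<String, String> pending = new HashMap<>();

    private HCatContext() {}

    static HCatContext get() {
        return INSTANCE;
    }

    synchronized void setProperty(String key, String value) {
        pending.put(key, value);
    }

    synchronized String getProperty(String key, String defaultValue) {
        return pending.getOrDefault(key, defaultValue);
    }

    // Called once the real job configuration exists. Copies the pending
    // settings in without overriding keys the job has already set.
    synchronized void mergeInto(Map<String, String> jobConf) {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            jobConf.putIfAbsent(e.getKey(), e.getValue());
        }
    }
}
```

The "game playing" is the mergeInto step: something has to call it at the right moment (and again on the back-end, if the context is not serialized with the job), and it has to decide who wins when the job already carries a value for the same key.
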
I like the idea, but I need to understand the next level of design and how this would interact with Pig's use of the JobConf that is already in place.

Alan.

On Aug 1, 2012, at 6:54 PM, Travis Crawford wrote:

> Hey hcat gurus -
>
> Before Pig got full boolean support a common thing was treating them as
> integers*. I'd like to provide boolean-to-int conversion in HCatalog,
> enabled with a property, so the following two cases work:
>
> (a) Pre-boolean support pig versions can read tables with boolean columns
> (b) Pig scripts written in the pre-boolean days can continue working, even
> after updating pig.
>
> Most schema conversion stuff happens with static methods, which makes
> sense, but complicates configuration. Any objection to creating a global
> static class for stuff like passing configs around? This would be similar
> to what Pig and Hive already have:
>
> UDFContext.getUDFContext().getJobConf();
> Hive.get().getConf();
> HCatContext.get().getConf(); <-- proposed new class
>
> We would set the conf very early on (HCatLoader, HCatInputFormat) and it
> could be used to simplify configuration inside HCat. With such a class
> adding this conversion would be super easy + maintainable, whereas now it
> would be a very invasive change.
>
> Thoughts?
>
> --travis
>
>
> *
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/util/ThriftToPig.java#L99
