Correction: I mean hcatalog in place of hive.

On Thu, Nov 10, 2011 at 12:22 PM, Aniket Mokashi <[email protected]> wrote:
> Hi Ashutosh,
>
> Actually, data is currently in hive (Text with LazySimpleSerde) and I have
> developed a version of PigStorage (similar to PigPerformanceLoader) to load
> the text data in hcatalog (pig). To make this work with hcatalog, I am
> copying over some of the serde parameters into hcat params. Also, because
> the parsing logic is currently copied to pig, it doesn't directly support
> escapeChar, lastColumnTakeRest etc.
> Also, with this, I will have to write a separate loadfunc for sequencefile,
> and neither will work with lazybinaryserde etc.
> I am trying to understand if there is an elegant design to solve this
> problem. I would appreciate it if you could send me pointers for such an
> approach.
>
> Thanks,
> Aniket
>
> On Thu, Nov 10, 2011 at 11:26 AM, Ashutosh Chauhan <[email protected]> wrote:
>
>> Hey Aniket,
>>
>> I am assuming you already have a Pig loadfunc which you want to use with
>> HCatalog. Daniel's work on
>> https://issues.apache.org/jira/browse/HCATALOG-121 has made this
>> super-easy. You can follow the testcase in that patch to see how to make
>> your custom loadfunc work with HCatalog. Essentially, you can specify the
>> top-level field delimiter as a table property. Parsing of individual
>> fields will be done through the LoadCaster interface of the loadfunc,
>> wherein you can plug in your parsing logic, which can make use of
>> delimiters within fields for complex types.
>>
>> Hope it helps,
>> Ashutosh
>>
>> On Wed, Nov 9, 2011 at 16:26, Aniket Mokashi <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I have been playing with supporting loading of sequencefile and text
>> > based tables from hive using pig for the last few days. I am wondering
>> > what would be the best way to proceed with this. Please share any
>> > pointers to design ideas and where to look, for developing this.
>> >
>> > Hive stores text data with multiple delimiters for fields, collections
>> > and maps. I tried using LoadFuncBasedInputDriver to support multiple
>> > delimiter text loading. For this, I am passing down these delimiters as
>> > arguments to the loadfunc. Also, the parsing code is inside my loadfunc
>> > method. I am also tied to one serde for doing this. This is not an
>> > elegant way. I am thinking of delegating this task to the serde and
>> > constructing the LazyStruct out of it (I am not sure if that will still
>> > keep it generic).
>> >
>> > Any ideas how I should proceed with this?
>> >
>> > Thanks,
>> > Aniket
>> >
>
> --
> "...:::Aniket:::... Quetzalco@tl"

--
"...:::Aniket:::... Quetzalco@tl"
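
For anyone following the thread, the multi-delimiter text layout discussed above can be sketched roughly as below. This is only an illustration in Python, not the loadfunc from the thread and not any Pig, Hive or HCatalog API. The function `parse_row` and the `column_types` list are hypothetical names made up for this sketch; the default delimiters are the Ctrl-A / Ctrl-B / Ctrl-C defaults that LazySimpleSerDe uses for fields, collection items and map keys. Escape characters and the "last column takes rest" behaviour are deliberately left out.

```python
# Default Hive text delimiters: Ctrl-A between fields, Ctrl-B between
# collection items, Ctrl-C between a map key and its value.
FIELD_DELIM = "\x01"
COLLECTION_DELIM = "\x02"
MAP_KEY_DELIM = "\x03"


def parse_row(line, column_types, field_delim=FIELD_DELIM,
              collection_delim=COLLECTION_DELIM,
              map_key_delim=MAP_KEY_DELIM):
    """Split one text row into Python values.

    column_types is a hypothetical stand-in for the table schema: a list
    whose entries are "string", "array" or "map". Escaping is not handled.
    """
    row = []
    for raw, kind in zip(line.split(field_delim), column_types):
        if kind == "array":
            row.append(raw.split(collection_delim) if raw else [])
        elif kind == "map":
            entries = raw.split(collection_delim) if raw else []
            parsed = {}
            for entry in entries:
                # partition() tolerates an entry with no key delimiter
                key, _, value = entry.partition(map_key_delim)
                parsed[key] = value
            row.append(parsed)
        else:
            row.append(raw)
    return row


if __name__ == "__main__":
    line = "alice\x01a\x02b\x01k1\x03v1\x02k2\x03v2"
    print(parse_row(line, ["string", "array", "map"]))
    # prints ['alice', ['a', 'b'], {'k1': 'v1', 'k2': 'v2'}]
```

Passing the three delimiters as arguments mirrors the "delimiters as loadfunc arguments" approach described in the thread, which is exactly the coupling the serde-based design would try to avoid.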
