Hi Ashutosh,

Actually, the data is currently in Hive (text format with LazySimpleSerDe), and I have developed a version of PigStorage (similar to PigPerformanceLoader) to load the text data stored in Hive. To make this work with Hive, I am copying some of the SerDe parameters over into HCat params. Also, because the parsing logic is copied into Pig, it doesn't directly support escapeChar, lastColumnTakeRest, etc. On top of that, I would have to write a separate loadfunc for SequenceFile, and none of these would work with LazyBinarySerDe and the like. I am trying to understand whether there is an elegant design that solves this problem, and I would appreciate it if you could send me pointers for such an approach.
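For reference, here is a minimal Java sketch of a top-level row splitter that honors both an escape character and a "last column takes rest" option (in Hive these come from the escape.delim and serialization.last.column.takes.rest SerDe properties). It is a simplified model of the behavior, not Hive's actual LazySimpleSerDe code. The class DelimitedRowSplitter and its parameters are hypothetical, and dropping extra columns while padding missing ones with null is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig, Hive, or HCatalog) that splits one line
 * of delimited text into top-level columns, modeling an escape character and
 * a "last column takes rest" option in a simplified way.
 */
final class DelimitedRowSplitter {
    private final char fieldDelim;
    private final Character escapeChar;        // null disables escaping
    private final boolean lastColumnTakesRest;

    DelimitedRowSplitter(char fieldDelim, Character escapeChar, boolean lastColumnTakesRest) {
        this.fieldDelim = fieldDelim;
        this.escapeChar = escapeChar;
        this.lastColumnTakesRest = lastColumnTakesRest;
    }

    /** Returns exactly numColumns values; missing columns are null. */
    List<String> split(String line, int numColumns) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escapeChar != null && c == escapeChar && i + 1 < line.length()) {
                // Escaped character: keep it literally and drop the escape char.
                current.append(line.charAt(++i));
            } else if (c == fieldDelim
                    && !(lastColumnTakesRest && fields.size() == numColumns - 1)) {
                // Ordinary delimiter: close the current column.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                // Plain data, or a delimiter that belongs to the last column.
                current.append(c);
            }
        }
        fields.add(current.toString());
        // Assumed lenient handling: ignore extra columns, pad missing ones with null.
        List<String> result =
                new ArrayList<>(fields.subList(0, Math.min(fields.size(), numColumns)));
        while (result.size() < numColumns) {
            result.add(null);
        }
        return result;
    }
}
```

A loader could build one splitter per table from the copied SerDe parameters and call split() for each line in getNext(). This still duplicates SerDe logic in Pig, which is exactly the maintenance cost described above.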
Thanks,
Aniket

On Thu, Nov 10, 2011 at 11:26 AM, Ashutosh Chauhan wrote:
> Hey Aniket,
>
> I am assuming you already have a Pig loadfunc which you want to use with
> HCatalog. Daniel's work on
> https://issues.apache.org/jira/browse/HCATALOG-121 has made this
> super-easy. You can follow the testcase in that patch to see how to make
> your custom loadfunc work with HCatalog. Essentially, you can specify the
> top-level field delimiter as a table property. Parsing of individual
> fields will be done through the LoadCaster interface of the loadfunc,
> wherein you can plug in your parsing logic, which can make use of
> delimiters within fields for complex types.
>
> Hope it helps,
> Ashutosh
>
> On Wed, Nov 9, 2011 at 16:26, Aniket Mokashi wrote:
>
> > Hi,
> >
> > I have been playing with supporting the loading of SequenceFile- and
> > text-based tables from Hive using Pig for the last few days. I am
> > wondering what the best way to proceed with this would be. Please share
> > any pointers to design ideas, and where to look, for developing this.
> >
> > Hive stores text data with multiple delimiters for fields, collections
> > and maps. I tried using LoadFuncBasedInputDriver to support
> > multiple-delimiter text loading. For this, I am passing these
> > delimiters down as arguments to the loadfunc, and the parsing code is
> > inside my loadfunc method. I am also tied to one SerDe this way. This
> > is not an elegant approach. I am thinking of delegating this task to
> > the SerDe and constructing the LazyStruct out of it (I am not sure if
> > that will still keep it generic).
> >
> > Any ideas on how I should proceed with this?
> >
> > Thanks,
> > Aniket

-- 
"...:::Aniket:::... Quetzalco@tl"
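The split Ashutosh describes (top-level delimiter from a table property, per-field parsing in a LoadCaster) can be illustrated with a minimal Java sketch of the per-field half. It assumes Hive's default inner delimiters (Ctrl-B between collection items and map entries, Ctrl-C between a map key and its value) and handles only flat array<string> and map<string,string> columns; nested types, which Hive separates with further control characters, are left out. HiveTextFieldCaster, toList and toMap are hypothetical names: a real implementation would implement Pig's LoadCaster (for example its bytesToMap and bytesToBag methods) and return Pig types, and the exact signatures vary between Pig releases.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical per-field caster (not Pig's actual LoadCaster). The loader has
 * already split the row on the top-level field delimiter; this class parses
 * the bytes of a single complex-typed column using the inner delimiters.
 */
final class HiveTextFieldCaster {
    private final char collectionDelim; // Hive's default: Ctrl-B (octal 002)
    private final char mapKeyDelim;     // Hive's default: Ctrl-C (octal 003)

    HiveTextFieldCaster(char collectionDelim, char mapKeyDelim) {
        this.collectionDelim = collectionDelim;
        this.mapKeyDelim = mapKeyDelim;
    }

    /** Parses an {@code array<string>} column: items split by the collection delimiter. */
    List<String> toList(byte[] raw) {
        if (raw == null || raw.length == 0) {
            return new ArrayList<>(); // assumption: an empty field is an empty array
        }
        return splitOn(new String(raw, StandardCharsets.UTF_8), collectionDelim);
    }

    /** Parses a {@code map<string,string>} column: entries by collection delimiter, key/value by map-key delimiter. */
    Map<String, String> toMap(byte[] raw) {
        Map<String, String> map = new LinkedHashMap<>();
        if (raw == null || raw.length == 0) {
            return map;
        }
        for (String entry : splitOn(new String(raw, StandardCharsets.UTF_8), collectionDelim)) {
            int sep = entry.indexOf(mapKeyDelim);
            if (sep < 0) {
                map.put(entry, null); // key present without a value
            } else {
                map.put(entry.substring(0, sep), entry.substring(sep + 1));
            }
        }
        return map;
    }

    /** Splits on one character, keeping empty items (String.split would drop trailing ones). */
    private static List<String> splitOn(String s, char delim) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delim) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }
}
```

Keeping this logic in the caster means the loader itself only needs to know the top-level delimiter, which, as described in the thread, is what HCatalog can supply from the table's properties.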
