Hey Aniket,

I am assuming you already have a Pig loadfunc that you want to use with HCatalog. Daniel's work on https://issues.apache.org/jira/browse/HCATALOG-121 has made this super easy. You can follow the test case in that patch to see how to make your custom loadfunc work with HCatalog. Essentially, you specify the top-level field delimiter as a table property. Parsing of individual fields is done through the loadfunc's LoadCaster interface, where you can plug in your own parsing logic; that logic can make use of the delimiters within fields for complex types.
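
To make the "delimiters within fields" part concrete, here is a minimal sketch of the kind of parsing logic you might call from your LoadCaster methods (for example bytesToMap or bytesToTuple; exact signatures depend on the Pig version you build against). It is plain Java with no Pig or HCatalog dependency. The class name HiveTextFieldParser and its methods are hypothetical, and it assumes Hive's default delimiters (\002 between collection items, \003 between map key and value) and no escaping. If the table overrides them, you would pass the actual delimiters in instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Pig or HCatalog.
class HiveTextFieldParser {
    // Assumed: Hive's default delimiters for nested values in text tables.
    static final char COLLECTION_DELIM = '\u0002'; // separates array items / map entries
    static final char MAP_KEY_DELIM = '\u0003';    // separates a map key from its value

    // Splits the raw bytes of one field into its collection items.
    static List<String> parseArray(byte[] field) {
        String text = new String(field, StandardCharsets.UTF_8);
        List<String> items = new ArrayList<>();
        if (text.isEmpty()) {
            return items; // empty field: no items
        }
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            if (i == text.length() || text.charAt(i) == COLLECTION_DELIM) {
                items.add(text.substring(start, i));
                start = i + 1;
            }
        }
        return items;
    }

    // Parses the raw bytes of one field into an ordered key/value map.
    static Map<String, String> parseMap(byte[] field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : parseArray(field)) {
            int sep = entry.indexOf(MAP_KEY_DELIM);
            if (sep < 0) {
                result.put(entry, null); // key with no value
            } else {
                result.put(entry.substring(0, sep), entry.substring(sep + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] raw = "a\u00031\u0002b\u00032".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseMap(raw)); // prints {a=1, b=2}
    }
}
```

The top-level field split is not handled here on purpose, since that delimiter comes from the table property; this only covers what happens inside a single field.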

Hope it helps,
Ashutosh

On Wed, Nov 9, 2011 at 16:26, Aniket Mokashi <[email protected]> wrote:
> Hi,
>
> I have been playing with supporting loading of sequencefile and text based
> tables from hive using pig for last few days. I am wondering what would be
> the best way to proceed with this. Please share any pointers to design
> ideas and where to look, for developing this.
>
> Hive stores text data with multiple delimiters for field, collection and
> maps. I tried using LoadFuncBasedInputDriver to support multiple delimiter
> text loading. For this, I am passing down these delimiters as arguments to
> the loadfunc. Also, the parsing code is inside my loadfunc method. I am
> also tied to one serde for doing this. This is not an elegant way. I am
> thinking of delegating this task to serde and constructing the LazyStruct
> out of it (I am not sure if that will still keep it generic).
>
> Any ideas how I should proceed with this?
>
> Thanks,
> Aniket
>
