Hey Aniket,

I am assuming you already have a Pig LoadFunc that you want to use with
HCatalog. Daniel's work on
https://issues.apache.org/jira/browse/HCATALOG-121 has made this
super easy: you can follow the test case in that patch to see how to make
your custom LoadFunc work with HCatalog. Essentially, you specify the
top-level field delimiter as a table property. Parsing of individual
fields is done through the LoadFunc's LoadCaster interface, where you can
plug in your own parsing logic, which can make use of delimiters within
fields for complex types.
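
To make the last point concrete, here is a rough sketch of the kind of
per-field parsing you could call from your LoadCaster (for example from
bytesToBag or bytesToMap, after decoding the bytes to a String). It is
plain Java with no Pig or HCatalog dependency; the class and method names
are made up for illustration, and I am assuming Hive's default ^B
(collection item) and ^C (map key) delimiters. It returns plain Java
collections rather than Pig DataBags, so treat it as a starting point only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig or HCatalog: splits one already
// extracted top-level field into its collection items or map entries.
public class DelimitedFieldParser {
    private final char collectionDelim; // separates items / map entries
    private final char mapKeyDelim;     // separates a key from its value

    public DelimitedFieldParser(char collectionDelim, char mapKeyDelim) {
        this.collectionDelim = collectionDelim;
        this.mapKeyDelim = mapKeyDelim;
    }

    // Split a field into items on the collection delimiter.
    // Empty items between two delimiters are kept as empty strings.
    public List<String> parseCollection(String field) {
        List<String> items = new ArrayList<>();
        if (field == null || field.isEmpty()) {
            return items;
        }
        int start = 0;
        for (int i = 0; i < field.length(); i++) {
            if (field.charAt(i) == collectionDelim) {
                items.add(field.substring(start, i));
                start = i + 1;
            }
        }
        items.add(field.substring(start));
        return items;
    }

    // Split a field into entries, then each entry into key and value.
    // An entry without a map-key delimiter is stored with a null value.
    public Map<String, String> parseMap(String field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : parseCollection(field)) {
            int sep = entry.indexOf(mapKeyDelim);
            if (sep < 0) {
                result.put(entry, null);
            } else {
                result.put(entry.substring(0, sep), entry.substring(sep + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed Hive defaults: ^B for collection items, ^C for map keys.
        DelimitedFieldParser parser = new DelimitedFieldParser('\u0002', '\u0003');
        System.out.println(parser.parseCollection("a\u0002b\u0002c"));
        System.out.println(parser.parseMap("k1\u0003v1\u0002k2\u0003v2"));
    }
}
```

The delimiters are constructor arguments so that the same parsing code can
be driven by whatever is stored with the table instead of being hard-coded
in the loadfunc.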

Hope it helps,
Ashutosh

On Wed, Nov 9, 2011 at 16:26, Aniket Mokashi <[email protected]> wrote:

> Hi,
>
> For the last few days I have been experimenting with loading sequencefile
> and text-based tables from Hive using Pig. I am wondering what would be
> the best way to proceed with this. Please share any pointers to design
> ideas and where to look when developing this.
>
> Hive stores text data with multiple delimiters for fields, collections and
> maps. I tried using LoadFuncBasedInputDriver to support multiple-delimiter
> text loading. For this, I am passing down these delimiters as arguments to
> the loadfunc. Also, the parsing code is inside my loadfunc method, and I am
> tied to one serde for doing this. This is not an elegant way. I am
> thinking of delegating this task to the serde and constructing the
> LazyStruct out of it (I am not sure if that will still keep it generic).
>
> Any ideas how I should proceed with this?
>
> Thanks,
> Aniket
>
