Hi,

For the last few days I have been experimenting with loading SequenceFile
and text-based Hive tables from Pig. I am wondering what the best way to
proceed would be. Please share any pointers to design ideas, and to where
I should look while developing this.

Hive stores text data with separate delimiters for fields, collection
items and map keys. I tried using LoadFuncBasedInputDriver to support
loading text with multiple delimiters. To do this, I pass the delimiters
down as arguments to the loadfunc, and the parsing code lives inside my
loadfunc as well. This also ties me to a single serde, so it is not an
elegant approach. I am thinking of delegating the parsing to the serde and
constructing the LazyStruct from it (I am not sure whether that will still
keep it generic).
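
To make the delimiter handling concrete, here is a minimal sketch of the
kind of parsing I currently do by hand. It is plain Java and does not use
the Pig LoadFunc or Hive serde APIs; the class name DelimitedRowSketch and
its methods are made up for illustration. It assumes Hive's default text
delimiters (Ctrl-A for fields, Ctrl-B for collection items, Ctrl-C for map
keys) and ignores escaping and nested complex types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Pig or Hive.
final class DelimitedRowSketch {
    private final String fieldDelim;
    private final String itemDelim;
    private final String keyDelim;

    DelimitedRowSketch(char fieldDelim, char itemDelim, char keyDelim) {
        // Quote each delimiter so String.split treats it literally.
        this.fieldDelim = Pattern.quote(String.valueOf(fieldDelim));
        this.itemDelim = Pattern.quote(String.valueOf(itemDelim));
        this.keyDelim = Pattern.quote(String.valueOf(keyDelim));
    }

    // Assumption: the table uses Hive's default text delimiters.
    static DelimitedRowSketch hiveDefaults() {
        return new DelimitedRowSketch('\u0001', '\u0002', '\u0003');
    }

    // Split one text row into its top-level fields, keeping empty fields.
    List<String> fields(String line) {
        return Arrays.asList(line.split(fieldDelim, -1));
    }

    // Split one field into collection items; an empty field has no items.
    List<String> items(String field) {
        if (field.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(field.split(itemDelim, -1));
    }

    // Split one field into map entries; a missing value is stored as null.
    Map<String, String> map(String field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : items(field)) {
            String[] kv = entry.split(keyDelim, 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return result;
    }
}
```

With this approach every delimiter has to be passed in by the caller and
the type handling is hard-coded, which is why I would rather let the serde
do the work.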

Any ideas how I should proceed with this?

Thanks,
Aniket
