Hi, for the last few days I have been experimenting with loading SequenceFile and text-based Hive tables from Pig. I am wondering what the best way to proceed would be. Please share any pointers to design ideas, and to where I should look, for developing this.
Hive stores text data using separate delimiters for fields, collection items and map keys. I tried using LoadFuncBasedInputDriver to support loading text with multiple delimiters. To do this, I pass the delimiters down as arguments to the loadfunc, and the parsing code lives inside my loadfunc method. This also ties me to a single SerDe, so it is not an elegant approach. I am thinking of delegating the parsing to the SerDe and constructing the LazyStruct from it (though I am not sure whether that would still keep it generic). Any ideas on how I should proceed with this?

Thanks,
Aniket
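
For readers unfamiliar with the layout, the nested delimiters described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the loadfunc from this post. It assumes Hive's default text delimiters (Ctrl-A for fields, Ctrl-B for collection items, Ctrl-C for map keys) and a hypothetical three-column row (a string, an array of strings, a map of string to string). The function names are made up for illustration, and null markers and deeper nesting are ignored.

```python
# Hive's default delimiters for text tables (assumed here; a table can override them).
FIELD_DELIM = "\x01"       # Ctrl-A separates top-level columns
COLLECTION_DELIM = "\x02"  # Ctrl-B separates array items and map entries
MAP_KEY_DELIM = "\x03"     # Ctrl-C separates a map key from its value


def parse_array(text, item_delim=COLLECTION_DELIM):
    """Split one array column into its items (empty column -> empty list)."""
    return text.split(item_delim) if text else []


def parse_map(text, item_delim=COLLECTION_DELIM, key_delim=MAP_KEY_DELIM):
    """Split one map column into a dict of key -> value strings."""
    result = {}
    for entry in parse_array(text, item_delim):
        key, _, value = entry.partition(key_delim)
        result[key] = value
    return result


def parse_row(line, field_delim=FIELD_DELIM):
    """Parse a hypothetical row: (name string, tags array, props map)."""
    name, tags, props = line.rstrip("\n").split(field_delim)
    return name, parse_array(tags), parse_map(props)


# Hypothetical sample row, not taken from a real table.
row = "alice\x01a\x02b\x01k1\x03v1\x02k2\x03v2"
print(parse_row(row))
```

Hard-coding the column layout in parse_row is exactly the kind of coupling the post is trying to avoid; a SerDe-driven version would take the column types from the table schema instead.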
