Correction: I meant HCatalog in place of Hive.
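
For reference, the multi-delimiter parsing discussed in the quoted thread can be sketched roughly as follows. This is only an illustration, not the actual loadfunc: it assumes Hive's default text delimiters (Ctrl-A for fields, Ctrl-B for collection items, Ctrl-C for map keys), the names parse_field and parse_row are hypothetical, the last_column_takes_rest flag only loosely imitates the serde option of a similar name, and escapeChar is not handled at all.

```python
# Hive's default delimiters for text tables.
FIELD_DELIM = "\x01"       # Ctrl-A, separates top-level fields
COLLECTION_DELIM = "\x02"  # Ctrl-B, separates array items and map entries
MAP_KEY_DELIM = "\x03"     # Ctrl-C, separates a map key from its value


def parse_field(raw, kind):
    """Parse one top-level field; kind is 'string', 'array' or 'map' (hypothetical labels)."""
    if kind == "array":
        return raw.split(COLLECTION_DELIM) if raw else []
    if kind == "map":
        result = {}
        for entry in raw.split(COLLECTION_DELIM):
            if not entry:
                continue
            # partition() tolerates an entry with no key delimiter (value becomes "")
            key, _, value = entry.partition(MAP_KEY_DELIM)
            result[key] = value
        return result
    return raw  # plain string, returned unchanged


def parse_row(line, kinds, last_column_takes_rest=False):
    """Split a text row into fields and parse each one according to kinds."""
    if last_column_takes_rest:
        # Limit the number of splits so the last column keeps any extra delimiters.
        parts = line.split(FIELD_DELIM, len(kinds) - 1)
    else:
        # Extra trailing fields are silently dropped.
        parts = line.split(FIELD_DELIM)[:len(kinds)]
    return [parse_field(part, kind) for part, kind in zip(parts, kinds)]


if __name__ == "__main__":
    row = "alice\x01a\x02b\x01k1\x03v1\x02k2\x03v2"
    print(parse_row(row, ["string", "array", "map"]))
```

Hard-coding the delimiters like this is exactly what ties the loader to one serde; reading them from table properties or delegating to the serde would avoid that.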

On Thu, Nov 10, 2011 at 12:22 PM, Aniket Mokashi <[email protected]> wrote:

> Hi Ashutosh,
>
> Actually, the data is currently in Hive (text with LazySimpleSerde), and I
> have developed a version of PigStorage (similar to PigPerformanceLoader) to
> load the text data in HCatalog (Pig). To make this work with HCatalog, I am
> copying some of the serde parameters into the hcat params. Also, because
> the parsing logic is currently copied into Pig, it doesn't directly support
> escapeChar, lastColumnTakeRest, etc.
> Also, with this approach, I will have to write a separate loadfunc for
> sequencefile, and these loadfuncs won't work with lazybinaryserde etc.
> I am trying to understand if there is an elegant design to solve this
> problem. I would appreciate it if you could send me pointers for such an
> approach.
>
> Thanks,
> Aniket
>
> On Thu, Nov 10, 2011 at 11:26 AM, Ashutosh Chauhan
> <[email protected]> wrote:
>
>> Hey Aniket,
>>
>> I am assuming you already have a Pig loadfunc that you want to use with
>> HCatalog. Daniel's work on
>> https://issues.apache.org/jira/browse/HCATALOG-121 has made this
>> super easy. You can follow the test case in that patch to see how to make
>> your custom loadfunc work with HCatalog. Essentially, you can specify the
>> top-level field delimiter as a table property. Parsing of individual
>> fields is done through the LoadCaster interface of the loadfunc, where you
>> can plug in your parsing logic, which can make use of delimiters within
>> fields for complex types.
>>
>> Hope it helps,
>> Ashutosh
>>
>> On Wed, Nov 9, 2011 at 16:26, Aniket Mokashi <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I have been playing with supporting loading of sequencefile and
>> > text-based tables from Hive using Pig for the last few days. I am
>> > wondering what would be the best way to proceed with this. Please share
>> > any pointers to design ideas and where to look for developing this.
>> >
>> > Hive stores text data with multiple delimiters for fields, collections,
>> > and maps. I tried using LoadFuncBasedInputDriver to support loading
>> > multi-delimiter text. For this, I am passing these delimiters down as
>> > arguments to the loadfunc. Also, the parsing code is inside my loadfunc
>> > method, and I am tied to one serde for doing this. This is not an
>> > elegant way. I am thinking of delegating this task to the serde and
>> > constructing the LazyStruct out of it (I am not sure if that will still
>> > keep it generic).
>> >
>> > Any ideas how I should proceed with this?
>> >
>> > Thanks,
>> > Aniket
>> >
>>
>
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>



-- 
"...:::Aniket:::... Quetzalco@tl"
