Re: Custom serde for parsing

Zheng Shao Thu, 10 Sep 2009 21:10:05 -0700

You can write your own specialized SerDe to make parsing more efficient.

Basically, start from a copy of RegexSerDe and:
1. use your own string scan instead of regex matching;
2. return org.apache.hadoop.io.Text instead of java.lang.String (and reuse
the same Text object for the same field across different rows).
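A minimal sketch of both suggestions, written in plain Java so it compiles without Hadoop on the classpath. FieldBuffer is a hypothetical stand-in for org.apache.hadoop.io.Text (in a real SerDe you would call Text.set(byte[], int, int) on a reused Text instead), and ScanSplitter with its methods are illustrative names, not part of Hive. It assumes rows arrive as UTF-8 bytes with a single-byte ASCII delimiter such as a tab.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: split rows by scanning bytes (no java.util.regex) into reused per-column buffers. */
public class ScanSplitter {

    /** Hypothetical mutable byte holder standing in for org.apache.hadoop.io.Text. */
    static final class FieldBuffer {
        private byte[] bytes = new byte[16];
        private int length;

        void set(byte[] src, int start, int len) {
            // Grow only when a field is longer than any seen before; otherwise reuse the array.
            if (bytes.length < len) {
                bytes = new byte[Math.max(len, bytes.length * 2)];
            }
            System.arraycopy(src, start, bytes, 0, len);
            length = len;
        }

        void clear() { length = 0; }

        int getLength() { return length; }

        @Override
        public String toString() {
            // For display/testing only; the parsing path never builds a String.
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    private final byte delimiter;
    // One buffer per column, allocated once and reused for every row.
    private final FieldBuffer[] fields;

    ScanSplitter(int numColumns, char delimiter) {
        this.delimiter = (byte) delimiter;
        this.fields = new FieldBuffer[numColumns];
        for (int i = 0; i < numColumns; i++) {
            fields[i] = new FieldBuffer();
        }
    }

    /**
     * Fills the reused buffers from row[0..length). Returns how many fields were found.
     * Extra fields beyond numColumns are ignored; missing trailing columns are cleared.
     */
    int parse(byte[] row, int length) {
        int col = 0;
        int start = 0;
        for (int i = 0; i <= length && col < fields.length; i++) {
            // i == length marks the end of the last field (short-circuit avoids row[length]).
            if (i == length || row[i] == delimiter) {
                fields[col++].set(row, start, i - start);
                start = i + 1;
            }
        }
        for (int c = col; c < fields.length; c++) {
            fields[c].clear(); // don't leak values from the previous row
        }
        return col;
    }

    FieldBuffer field(int i) { return fields[i]; }

    public static void main(String[] args) {
        ScanSplitter s = new ScanSplitter(3, '\t');
        byte[] row1 = "42\tfoo\t7".getBytes(StandardCharsets.UTF_8);
        int n = s.parse(row1, row1.length);
        System.out.println(n + " " + s.field(0) + " " + s.field(1) + " " + s.field(2));

        FieldBuffer before = s.field(1);
        byte[] row2 = "1\tbar\t9".getBytes(StandardCharsets.UTF_8);
        s.parse(row2, row2.length);
        System.out.println(s.field(1) + " same object: " + (before == s.field(1)));
    }
}
```

Because the delimiter is ASCII, scanning raw UTF-8 bytes cannot cut a multi-byte character in half. Per row, no Matcher and no per-field String is allocated; the only allocation happens when a buffer has to grow.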


Zheng

On Thu, Sep 10, 2009 at 9:05 PM, Mayuran Yogarajah <[email protected]> wrote:

> Zheng Shao wrote:
>
>> 1. Yes, the performance will be affected, especially since we are doing one
>> regex match per row and creating a lot of String objects. If we define
>> them as int and use the default row format, we won't create those String
>> objects.
>>
> Is there anything I can do to alleviate this without reformatting the
> data?
>
> thanks
>



-- 
Yours,
Zheng
