You can write your own specialized SerDe to make it more efficient. Basically, copy and paste RegexSerDe, and:
1. use your own string scan instead of a regex match,
2. return org.apache.hadoop.io.Text instead of java.lang.String (and reuse the same Text object for the same field across rows, so no new objects are created per row).
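
To make the two points concrete, here is a rough sketch of the idea, not the actual RegexSerDe code. Assumptions on my side: the class name ManualFieldScanner is made up, the rows are split on a single delimiter character, and StringBuilder stands in for org.apache.hadoop.io.Text so that the snippet compiles without Hadoop on the classpath. In a real SerDe you would keep one Text per column and overwrite it with Text.set(...) in the same place where this sketch resets the StringBuilder.

```java
// Sketch only: StringBuilder is a stand-in for org.apache.hadoop.io.Text.
// ManualFieldScanner is a hypothetical name, not a Hive class.
final class ManualFieldScanner {
    private final char delimiter;
    // One reusable holder per column, allocated once and shared by all rows.
    private final StringBuilder[] fields;

    ManualFieldScanner(int numColumns, char delimiter) {
        this.delimiter = delimiter;
        this.fields = new StringBuilder[numColumns];
        for (int i = 0; i < numColumns; i++) {
            fields[i] = new StringBuilder();
        }
    }

    /**
     * Splits one row with a single left-to-right scan (no regex).
     * The returned holders are overwritten by the next call, so callers
     * must consume or copy the values before scanning another row.
     * The last column receives the rest of the line; missing columns
     * come back empty.
     */
    StringBuilder[] scan(CharSequence row) {
        final int len = row.length();
        int start = 0;
        for (int col = 0; col < fields.length; col++) {
            StringBuilder field = fields[col];
            field.setLength(0); // reset instead of allocating a new object

            int end;
            if (col == fields.length - 1) {
                end = len; // last column takes whatever is left
            } else {
                end = start;
                while (end < len && row.charAt(end) != delimiter) {
                    end++;
                }
            }
            field.append(row, start, end);

            // Skip the delimiter if there is one, otherwise stay at the end.
            start = (end < len) ? end + 1 : len;
        }
        return fields;
    }
}
```

Calling new ManualFieldScanner(3, ' ').scan("a bb ccc") fills the three holders with "a", "bb" and "ccc", and a second scan() on the next row returns the same three objects with new contents. That reuse is what avoids the per-row String garbage.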

Zheng

On Thu, Sep 10, 2009 at 9:05 PM, Mayuran Yogarajah <[email protected]> wrote:

> Zheng Shao wrote:
>
>> 1. Yes the performance will be affected, especially we are doing one regex
>> match per row, as well as creating a lot of String objects. If we define
>> them as int and uses the default row format, we won't create those String
>> objects.
>
> Is there anything I can do to alleviate this without reformatting the
> data ?
>
> thanks

--
Yours,
Zheng
