Hi Viraj, I recommend using Hive's ColumnarSerDe/LazySerDe code to serialize and deserialize the data. That way you can avoid writing your own serialization/deserialization logic.
Basically, for primitives it is easy to serialize and deserialize. But for complex types, you need to use separators.

Thanks,
Yongqiang

On 6/8/10 10:50 AM, "Viraj Bhat" <[email protected]> wrote:
> Hi all,
> I am working on an M/R program to convert Zebra data to Hive RC
> format.
>
> The TableInputFormat (Zebra) returns keys and values in the form of
> BytesWritable and (Pig) Tuple.
>
> In order to convert it to the RCFileOutputFormat, whose key is
> "BytesWritable" and value is "BytesRefArrayWritable", I need to take in a
> Pig Tuple, iterate over each of its contents, and convert each one to a
> "BytesRefWritable".
>
> The easy part is Strings, which can be converted to BytesRefWritable
> as:
>
> myvalue = new BytesRefArrayWritable(10);
> // value is a Pig Tuple and get returns a String
> String s = (String)value.get(0);
> myvalue.set(0, new BytesRefWritable(s.getBytes("UTF-8")));
>
> How do I do it for java "Long", "HashMap" and "Arrays"?
>
> // value is a Pig tuple
> Long l = new Long((Long)value.get(1));
> myvalue.set(iter, new BytesRefWritable(l.toString().getBytes("UTF-8")));
> myvalue.set(1, new BytesRefWritable(l.getBytes("UTF-8")));
>
> HashMap<String, Object> hm = new
> HashMap<String,Object>((HashMap)value.get(2));
> myvalue.set(iter, new
> BytesRefWritable(hm.toString().getBytes("UTF-8")));
>
> Would the toString() method work? If I need to re-read the RC format back
> through "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe", would
> it interpret the data correctly?
>
> Is there any documentation for the same?
>
> Any suggestions would be beneficial.
>
> Viraj
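To make the separator point concrete: below is a minimal, hedged sketch of what "use separators" could look like, assuming Hive's default text-style separators (\002 between collection elements, \003 between a map key and its value). The class name `HiveTextEncode` and the helper methods are hypothetical illustrations, not part of Hive's API; each method just produces the UTF-8 bytes you would then wrap in a BytesRefWritable, as in the String example above. (Note that `toString()` on a HashMap will NOT produce this layout, which is why ColumnarSerDe would not read it back correctly.)

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Sketch only: encode a Long, a List, and a Map into the byte layout that
// Hive's default lazy text serialization expects for a single column.
// Assumed separators: \002 (^B) between collection items, \003 (^C)
// between a map key and its value.
public class HiveTextEncode {
    static final byte COLLECTION_SEP = 2; // ^B
    static final byte MAP_KV_SEP = 3;     // ^C

    // Primitives: the text form of the value is enough.
    static byte[] encodeLong(Long l) {
        return l.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Lists/arrays: elements joined by the collection separator.
    static byte[] encodeList(List<?> list) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean first = true;
        for (Object item : list) {
            if (!first) out.write(COLLECTION_SEP);
            out.write(item.toString().getBytes(StandardCharsets.UTF_8));
            first = false;
        }
        return out.toByteArray();
    }

    // Maps: key \003 value pairs, joined by the collection separator.
    static byte[] encodeMap(Map<?, ?> map) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean first = true;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            if (!first) out.write(COLLECTION_SEP);
            out.write(e.getKey().toString().getBytes(StandardCharsets.UTF_8));
            out.write(MAP_KV_SEP);
            out.write(e.getValue().toString().getBytes(StandardCharsets.UTF_8));
            first = false;
        }
        return out.toByteArray();
    }
}
```

Usage would mirror the String case, e.g. `myvalue.set(1, new BytesRefWritable(HiveTextEncode.encodeLong(l)))`. The safer route, as suggested above, is to let Hive's own serde code (ColumnarSerDe with the lazy serialization classes) produce these bytes rather than hand-rolling them, since it also handles nesting and escaping.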
