Hi all,
  I am working on an M/R program to convert Zebra data to Hive RCFile
format.

The TableInputFormat (Zebra) returns keys and values in the form of
BytesWritable and (Pig) Tuple.

In order to convert it to RCFileOutputFormat, whose key is
"BytesWritable" and value is "BytesRefArrayWritable", I need to take a
Pig Tuple, iterate over each of its fields, and convert each one to
"BytesRefWritable".

The easy part is Strings, which can be converted to BytesRefWritable
as:

BytesRefArrayWritable myvalue = new BytesRefArrayWritable(10);
// value is a Pig Tuple and get(0) returns a String
String s = (String) value.get(0);
myvalue.set(0, new BytesRefWritable(s.getBytes("UTF-8")));



How do I do it for a Java "Long", "HashMap", and arrays?

// value is a Pig Tuple and get(1) returns a Long
Long l = (Long) value.get(1);
// Long has no getBytes() method, so the best I can think of is:
myvalue.set(1, new BytesRefWritable(l.toString().getBytes("UTF-8")));
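For reference, here is a minimal, self-contained sketch of just the byte conversion for the Long case (no Hive classes; it assumes ColumnarSerDe is used with the default LazySimpleSerDe-style text encoding, where a bigint column is simply its decimal string in UTF-8 -- please correct me if that assumption is wrong):

```java
import java.nio.charset.StandardCharsets;

public class LongColumn {
    // Assumption: with the default text encoding, a bigint column's
    // bytes are just the decimal string in UTF-8, so toString() is
    // the natural route.
    static byte[] longColumnBytes(Long l) {
        return l.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = longColumnBytes(12345L);
        // These bytes would then be wrapped, e.g.:
        // myvalue.set(1, new BytesRefWritable(b));
        System.out.println(new String(b, StandardCharsets.UTF_8));
    }
}
```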


HashMap<String, Object> hm =
    new HashMap<String, Object>((HashMap) value.get(2));
myvalue.set(2, new BytesRefWritable(hm.toString().getBytes("UTF-8")));
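My worry is that HashMap.toString() produces "{a=1, b=2}", which I doubt the SerDe will parse back as a map. My current (untested) understanding is that, with LazySimpleSerDe's default delimiters, a map column is encoded with '\002' between entries and '\003' between each key and its value -- roughly:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapColumn {
    // Assumed default LazySimpleSerDe delimiters: '\002' between map
    // entries, '\003' between each key and its value.
    static byte[] mapColumnBytes(Map<String, ?> m) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, ?> e : m.entrySet()) {
            if (!first) sb.append('\002');
            sb.append(e.getKey()).append('\003').append(e.getValue());
            first = false;
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, Object> hm = new LinkedHashMap<String, Object>();
        hm.put("a", 1);
        hm.put("b", 2);
        String s = new String(mapColumnBytes(hm), StandardCharsets.UTF_8);
        // Substitute the control characters so the demo is readable:
        System.out.println(s.replace('\002', ',').replace('\003', ':'));
    }
}
```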


Would the toString() method work? If I later re-read the RC format
through "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe", would
it interpret the values correctly?
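Similarly, for the array case my guess (again untested, assuming the default '\002' collection delimiter) is a delimited join of the elements rather than toString():

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ArrayColumn {
    // Assumed default delimiter: '\002' between list/array elements.
    static byte[] arrayColumnBytes(List<?> items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append('\002');
            sb.append(items.get(i));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> v = Arrays.asList("x", "y", "z");
        String s = new String(arrayColumnBytes(v), StandardCharsets.UTF_8);
        // Substitute the control character so the demo is readable:
        System.out.println(s.replace('\002', ','));
    }
}
```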

Is there any documentation for the same?

Any suggestions would be appreciated.

Viraj
