Hey folks,

Not sure if this has been discussed already, or if it's due to some limitation in Pig, Hadoop, or Java, but is there a particular reason the PiggyBank SequenceFileLoader doesn't support the BytesWritable type for sequence file keys/values?

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/BytesWritable.html

Looking at the code, I see it maps the Pig-specific DataByteArray class to the Pig type "bytearray", and I don't understand this choice. Why use a Pig-specific class here? It's not very friendly for a mixed Pig/non-Pig Hadoop ecosystem.
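
To make that mixed-ecosystem point concrete, here's roughly what a plain non-Pig producer looks like. This is a made-up sketch (class name and output path are hypothetical, and I'm using the old SequenceFile.createWriter(fs, conf, path, keyClass, valClass) form), but the point is that BytesWritable values are about as vanilla as Hadoop data gets:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical non-Pig producer: writes a sequence file with Text keys
// and BytesWritable values -- exactly the kind of file the current
// SequenceFileLoader can't read.
public class WriteBytesSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/example.seq");   // made-up path

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, Text.class, BytesWritable.class);
    try {
      byte[] payload = {0x01, 0x02, 0x03};
      writer.append(new Text("key1"), new BytesWritable(payload));
    } finally {
      writer.close();
    }
  }
}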

In fact, if you look at the SequenceFileLoader code, you'll see something strange:

protected Object translateWritableToPigDataType(Writable w, byte dataType) {
    switch(dataType) {
      case DataType.CHARARRAY: return ((Text) w).toString();
      case DataType.BYTEARRAY: return ((DataByteArray) w).get();
      case DataType.INTEGER: return ((IntWritable) w).get();
      case DataType.LONG: return ((LongWritable) w).get();
      case DataType.FLOAT: return ((FloatWritable) w).get();
      case DataType.DOUBLE: return ((DoubleWritable) w).get();
      case DataType.BYTE: return ((ByteWritable) w).get();
    }

    return null;
  }

This code smells. The method takes a Writable, which makes sense, but for the BYTEARRAY case it casts it to DataByteArray, which doesn't implement Writable! WTF, mate?

I'm going to try my hand at switching this to use BytesWritable instead and see what explodes.
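
Concretely, I'm thinking of something like the following for the BYTEARRAY case. This is an untested sketch, assuming I'm remembering BytesWritable.getBytes()/getLength() and the DataByteArray(byte[]) constructor correctly:

protected Object translateWritableToPigDataType(Writable w, byte dataType) {
    switch(dataType) {
      case DataType.CHARARRAY: return ((Text) w).toString();
      case DataType.BYTEARRAY: {
        // Read a BytesWritable off the sequence file and hand Pig a
        // DataByteArray, instead of expecting the file to somehow
        // contain a DataByteArray.
        BytesWritable bw = (BytesWritable) w;
        // getBytes() returns the backing buffer, which can be longer than
        // the actual payload, so trim to getLength() before wrapping it.
        return new DataByteArray(Arrays.copyOfRange(bw.getBytes(), 0, bw.getLength()));
      }
      case DataType.INTEGER: return ((IntWritable) w).get();
      case DataType.LONG: return ((LongWritable) w).get();
      case DataType.FLOAT: return ((FloatWritable) w).get();
      case DataType.DOUBLE: return ((DoubleWritable) w).get();
      case DataType.BYTE: return ((ByteWritable) w).get();
    }

    return null;
  }

(java.util.Arrays and org.apache.hadoop.io.BytesWritable would need importing, of course.)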

Cheers,
-Zach
