Zach,

Perhaps I should've documented that better. That class is *not intended for real use*. As far as I know, it's never been used by anyone for anything in production. It's a demo of how one would go about writing a real SequenceFileLoader for whatever internal formats you are using. Feel free to replace anything that makes sense for your implementation.
-D

On Mon, Sep 27, 2010 at 1:23 PM, Zach Bailey <znbai...@gmail.com> wrote:
> Hey folks,
>
> Not sure if this has been discussed already or if this is due to some
> limitation in Pig, Hadoop, or Java - but is there a particular reason the
> PiggyBank SequenceFileLoader doesn't support the BytesWritable type for
> sequence file keys/values?
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/BytesWritable.html
>
> Looking at the code, it maps the Pig-specific DataByteArray class to the
> Pig type "bytearray" - I don't understand this choice. Why use a
> Pig-specific class here (which is not very friendly for a mixed
> Pig/non-Pig Hadoop ecosystem)?
>
> In fact, if you look at the SequenceFileLoader code you will see something
> that looks very strange:
>
>   protected Object translateWritableToPigDataType(*Writable w*, byte dataType) {
>     switch(dataType) {
>       case DataType.CHARARRAY: return ((Text) w).toString();
>       *case DataType.BYTEARRAY: return ((DataByteArray) w).get();*
>       case DataType.INTEGER: return ((IntWritable) w).get();
>       case DataType.LONG: return ((LongWritable) w).get();
>       case DataType.FLOAT: return ((FloatWritable) w).get();
>       case DataType.DOUBLE: return ((DoubleWritable) w).get();
>       case DataType.BYTE: return ((ByteWritable) w).get();
>     }
>
>     return null;
>   }
>
> This code smells - the method takes a Writable, which makes sense, but
> then for the BYTEARRAY type it casts it to DataByteArray, which doesn't
> implement Writable! WTF, mate?
>
> I'm going to try my hand at switching this to use BytesWritable instead
> and see what explodes.
>
> Cheers,
> -Zach
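For anyone following along: since "bytearray" is Pig's own type, the loader does have to hand Pig a DataByteArray in the end - the bug is only that the cast expects one to come out of the sequence file, where the on-disk value is really a Hadoop BytesWritable. A minimal sketch of the corrected method under that assumption (class names match the code quoted above; the defensive copy is needed because BytesWritable.getBytes() returns a padded backing array):

    import org.apache.hadoop.io.ByteWritable;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.pig.data.DataByteArray;
    import org.apache.pig.data.DataType;

    protected Object translateWritableToPigDataType(Writable w, byte dataType) {
      switch (dataType) {
        case DataType.CHARARRAY: return ((Text) w).toString();
        case DataType.BYTEARRAY:
          // Read raw bytes from Hadoop's BytesWritable, then wrap them in
          // Pig's DataByteArray. getBytes() exposes the (possibly padded)
          // backing array, so copy only the first getLength() bytes.
          BytesWritable bw = (BytesWritable) w;
          byte[] bytes = new byte[bw.getLength()];
          System.arraycopy(bw.getBytes(), 0, bytes, 0, bw.getLength());
          return new DataByteArray(bytes);
        case DataType.INTEGER: return ((IntWritable) w).get();
        case DataType.LONG: return ((LongWritable) w).get();
        case DataType.FLOAT: return ((FloatWritable) w).get();
        case DataType.DOUBLE: return ((DoubleWritable) w).get();
        case DataType.BYTE: return ((ByteWritable) w).get();
        default: return null;
      }
    }

Note this only covers the value translation; wherever the loader infers Pig types from the Writable classes in the file, the mapping would need the matching change (BytesWritable, not DataByteArray, maps to bytearray).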