Zach,
Perhaps I should've documented that better.
That class is *not intended for real use*. As far as I know, it's never been
used by anyone for anything in production.
It's a demo of how one would go about writing a real SequenceFileLoader for
whatever internal stuff you are using. Feel free to replace anything that
makes sense for you in your implementation.
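
For what it's worth, here is a rough sketch of what a BytesWritable case could look like if you go that route. It is not the PiggyBank code. To keep it runnable without Hadoop or Pig on the classpath, Writable, BytesWritable and DataByteArray below are small stand-ins I made up for the real classes, and toPigByteArray is a hypothetical helper name. The one thing worth copying is that getBytes() on a BytesWritable can return a backing buffer longer than the valid data, so you want to copy only getLength() bytes.

```java
import java.util.Arrays;

public class BytesWritableSketch {

    // Stand-in for org.apache.hadoop.io.Writable (serialization methods omitted).
    interface Writable {}

    // Stand-in for org.apache.hadoop.io.BytesWritable. As with the real class,
    // the backing array may be longer than the number of valid bytes.
    static final class BytesWritable implements Writable {
        private final byte[] buffer;
        private final int length;

        BytesWritable(byte[] buffer, int length) {
            this.buffer = buffer;
            this.length = length;
        }

        byte[] getBytes() { return buffer; }
        int getLength() { return length; }
    }

    // Stand-in for org.apache.pig.data.DataByteArray, which backs Pig's "bytearray".
    static final class DataByteArray {
        private final byte[] data;

        DataByteArray(byte[] data) { this.data = data; }

        byte[] get() { return data; }
    }

    // Hypothetical helper: cast to the Writable type first, then wrap a copy
    // of only the valid bytes in the Pig type.
    static DataByteArray toPigByteArray(Writable w) {
        BytesWritable bw = (BytesWritable) w;
        return new DataByteArray(Arrays.copyOf(bw.getBytes(), bw.getLength()));
    }

    public static void main(String[] args) {
        byte[] padded = {1, 2, 3, 0, 0};  // 3 valid bytes, 2 bytes of padding
        DataByteArray result = toPigByteArray(new BytesWritable(padded, 3));
        System.out.println(Arrays.toString(result.get()));  // prints [1, 2, 3]
    }
}
```

In a real loader the BYTEARRAY case of the switch would do the same thing against the actual Hadoop and Pig classes.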

-D

On Mon, Sep 27, 2010 at 1:23 PM, Zach Bailey <znbai...@gmail.com> wrote:

> Hey folks,
>
> Not sure if this has been discussed already or if this is due to some
> limitation in pig, hadoop, or java - but is there a particular reason the
> PiggyBank SequenceFileLoader doesn't support the BytesWritable type for
> sequence file keys/values?
>
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/BytesWritable.html
>
> Looking at the code, it maps the pig-specific DataByteArray class to the
> pig type "bytearray" - I don't understand this choice. Why use a
> pig-specific class here (which is not very friendly for a mixed pig/non-pig
> hadoop ecosystem)?
>
> In fact, if you look at the SequenceFileLoader code you will see something
> that looks very strange:
>
> protected Object translateWritableToPigDataType(*Writable w*, byte
> dataType) {
>    switch(dataType) {
>      case DataType.CHARARRAY: return ((Text) w).toString();
> *      case DataType.BYTEARRAY: return((DataByteArray) w).get();*
>      case DataType.INTEGER: return ((IntWritable) w).get();
>      case DataType.LONG: return ((LongWritable) w).get();
>      case DataType.FLOAT: return ((FloatWritable) w).get();
>      case DataType.DOUBLE: return ((DoubleWritable) w).get();
>      case DataType.BYTE: return ((ByteWritable) w).get();
>    }
>
>    return null;
>  }
>
> This code smells - the method takes a Writable - which makes sense, but
> then for the BYTEARRAY type it's casting it to a DataByteArray, which
> doesn't implement Writable! WTF, mate?
>
> I'm going to try my hand at switching this to use BytesWritable instead and
> see what explodes.
>
> Cheers,
> -Zach
>
