Hi.

Do you know of anyone using a custom serializer/deserializer in Pig streaming?

I was looking at http://wiki.apache.org/pig/PigStreamingFunctionalSpec and was 
impressed by the various features it supports.
Then, looking at the code, I was sad to see how much additional data copying is 
done to support those features, when the simple case should need only one copy 
out to stdin and another copy in from stdout.

So far, this is my understanding: two extra copies on the sender side and 
three extra copies on the receiver side.

Assuming Default(Input/Output)Handler + PigStreaming, then

PigInputHandler.putNext(Tuple t)
--> serializer.serialize(t)
-->--> COPY to out(ByteArrayOutputStream)
-->--> COPY by out.toByteArray()
--> write to stdin (copy but necessary)
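
To make the sender-side copies concrete, here is a minimal sketch in plain Java. 
It does not use the Pig classes: SenderCopySketch, serializeInto, 
sendWithExtraCopy and sendWithWriteTo are hypothetical names, and the 
tab-joined format is only an assumed stand-in for what the serializer 
produces. The first method mirrors the buffer + toByteArray() pattern above; 
the second uses ByteArrayOutputStream.writeTo() to skip the toByteArray() copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SenderCopySketch {
    // Hypothetical stand-in for serializer.serialize(t): tab-joined fields plus newline.
    static void serializeInto(String[] fields, OutputStream out) throws IOException {
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.write('\t');
            }
            out.write(fields[i].getBytes(StandardCharsets.UTF_8));
        }
        out.write('\n');
    }

    // Pattern from the trace: fill a buffer, copy it out with toByteArray(), then write.
    static void sendWithExtraCopy(String[] fields, OutputStream stdin) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        serializeInto(fields, buf);        // copy into the internal buffer
        byte[] exact = buf.toByteArray();  // allocates a new array and copies again
        stdin.write(exact);                // the necessary copy to the process
    }

    // Variant: writeTo() passes the internal buffer straight to the target stream.
    static void sendWithWriteTo(String[] fields, OutputStream stdin) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        serializeInto(fields, buf);        // copy into the internal buffer
        buf.writeTo(stdin);                // no intermediate exact-size array
    }

    public static void main(String[] args) throws IOException {
        String[] tuple = {"a", "bc", "def"};
        ByteArrayOutputStream p1 = new ByteArrayOutputStream();
        ByteArrayOutputStream p2 = new ByteArrayOutputStream();
        sendWithExtraCopy(tuple, p1);
        sendWithWriteTo(tuple, p2);
        // Both paths should put identical bytes on the stream.
        System.out.println(Arrays.equals(p1.toByteArray(), p2.toByteArray()));
    }
}
```

Serializing directly into a buffered stdin stream would presumably remove the 
intermediate buffer as well, but that would mean changing the serializer 
interface to take an OutputStream.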

Streaming

--> OutputHandler.getNext()
-->--> Text value = readLine(stdout)   (copy but necessary)
-->--> System.arraycopy(value.getBytes(), 0, newBytes, 0, value.getLength());
COPY just because deserialize() requires an exact-size byte array?
-->-->deserializer.deserialize(byte [])
-->-->-->  Text val = new Text(bytes); COPY since Text does not reuse the 
byte array it is given
-->-->-->  StorageUtil.textToTuple(val, fieldDel)
-->-->-->--> Create ArrayList of DataByteArrays    COPY.
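
As a rough illustration of what a leaner receiver path could look like, the 
sketch below splits fields straight out of a reused line buffer using an 
(offset, length) pair, so the only copy is the one per field. It is plain 
Java and does not touch Text, DataByteArray or StorageUtil; ReceiverCopySketch 
and splitFields are hypothetical names, and a deserializer accepting an offset 
and length is an assumption, not the current interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReceiverCopySketch {
    // Splits buf[off, off + len) on delim. Each field is copied once;
    // the line itself is never trimmed into an exact-size array first.
    static List<byte[]> splitFields(byte[] buf, int off, int len, byte delim) {
        List<byte[]> fields = new ArrayList<>();
        int start = off;
        int end = off + len;
        for (int i = off; i < end; i++) {
            if (buf[i] == delim) {
                fields.add(Arrays.copyOfRange(buf, start, i));
                start = i + 1;
            }
        }
        fields.add(Arrays.copyOfRange(buf, start, end)); // last (or only) field
        return fields;
    }

    public static void main(String[] args) {
        // Simulate a reused line buffer whose capacity exceeds the valid length,
        // which is the situation that forces the System.arraycopy above.
        byte[] line = "x\tyy\tzzz".getBytes(StandardCharsets.UTF_8);
        byte[] reused = Arrays.copyOf(line, 32);
        for (byte[] f : splitFields(reused, 0, line.length, (byte) '\t')) {
            System.out.println(new String(f, StandardCharsets.UTF_8));
        }
    }
}
```

Whether this fits depends on how custom deserializers are expected to see 
their input; anything relying on deserialize(byte[]) getting an exact-size 
array would still need the trimmed copy.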

I am now wondering whether we can simplify this somehow.

Thanks,
Koji
