Hi. Does anyone know of someone using a custom serializer/deserializer in Pig streaming?
I was looking at http://wiki.apache.org/pig/PigStreamingFunctionalSpec and was impressed by the various features it supports. Then, looking at the code, I was sad to see how much additional data copying is done to support those features, when the simple case should need only one copy out to stdin and another copy in from stdout.

So far, this is my understanding: 2 extra copies on the sender side and 3 extra copies on the receiver side.

Assuming Default(Input/Output)Handler + PigStreaming, then

PigInputHandler.putNext(Tuple t)
--> serializer.serialize(t)
-->--> COPY to out (ByteArrayOutputStream)
-->--> COPY by out.toByteArray()
--> write to stdin (copy but necessary)

Streaming --> OutputHandler.getNext()
-->--> Text value = readLine(stdin) (copy but necessary)
-->--> System.arraycopy(value.getBytes(), 0, newBytes, 0, value.getLength());
       COPY just because deserialize requires an exact-size byte array?
-->--> deserializer.deserialize(byte[])
-->-->--> Text val = new Text(bytes);
          COPY since Text somehow does not want to reuse the byte array
-->-->--> StorageUtil.textToTuple(val, fieldDel)
-->-->-->--> Create ArrayList of DataByteArrays. COPY

Now I am wondering if we can simplify this somehow.

Thanks,
Koji
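
P.S. To make the idea concrete, here is a rough sketch using only JDK classes. It is not Pig's code: CopyCountSketch, sendWithExtraCopy, sendDirect and splitFields are names made up for illustration, and a second ByteArrayOutputStream stands in for the process's stdin. It contrasts out.toByteArray() with ByteArrayOutputStream.writeTo(), which writes the internal buffer directly, and shows a receiver that takes the line buffer plus its valid length, so no exact-size array is needed before splitting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Pig code: every name below is made up for illustration.
final class CopyCountSketch {

    // Sender, current shape: toByteArray() allocates a fresh array and copies
    // the buffered bytes into it before they are written to the stream.
    static void sendWithExtraCopy(ByteArrayOutputStream buf, OutputStream stdin) throws IOException {
        byte[] exact = buf.toByteArray(); // extra copy
        stdin.write(exact);
    }

    // Sender, alternative: writeTo() passes the internal buffer to the stream,
    // so the only copy left is the write itself.
    static void sendDirect(ByteArrayOutputStream buf, OutputStream stdin) throws IOException {
        buf.writeTo(stdin);
    }

    // Receiver, alternative: split on the delimiter using (buffer, valid length),
    // so the buffer may be longer than the line. Each field is copied once.
    static List<byte[]> splitFields(byte[] buf, int length, byte delim) {
        List<byte[]> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= length; i++) {
            if (i == length || buf[i] == delim) {
                fields.add(Arrays.copyOfRange(buf, start, i));
                start = i + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = "a\tbc\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream serialized = new ByteArrayOutputStream();
        serialized.write(record, 0, record.length);

        ByteArrayOutputStream fakeStdin = new ByteArrayOutputStream();
        sendDirect(serialized, fakeStdin);

        // Simulate a reusable line buffer that is larger than the 4 valid bytes.
        byte[] line = Arrays.copyOf("a\tbc".getBytes(StandardCharsets.UTF_8), 16);
        for (byte[] field : splitFields(line, 4, (byte) '\t')) {
            System.out.println(new String(field, StandardCharsets.UTF_8));
        }
    }
}
```

Whether the real handlers can be changed this way depends on the serializer/deserializer interfaces, which currently pass plain byte[] arrays around, so treat this only as a picture of where the copies could go away.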
