[ 
https://issues.apache.org/jira/browse/HADOOP-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597935#action_12597935
 ] 

Tom White commented on HADOOP-3413:
-----------------------------------

There is clearly still work to do to make the serialization framework fully 
supported in Hadoop, but I don't think the lack of support in 
SequenceFile.Reader is a blocker for 0.17.0.

The work done in HADOOP-1986 enabled serialization support in the MapReduce 
kernel, so you can use arbitrary types for keys and values. However, it is not 
yet possible to use arbitrary types for map inputs or reduce outputs out of the 
box, since the support from SequenceFile{Input|Output}Format and 
SequenceFileRecord{Reader|Writer} is still Writable-based as you point out. 
That said, you can write your own InputFormat, OutputFormat, RecordReader and 
RecordWriter implementations to do this. For example, you can use 
SequenceFile.Writer#append(Object, Object) to write arbitrary objects to a 
sequence file (via a Serializer), and the SequenceFile.Reader#nextRaw methods 
to read the raw bytes back, which you then deserialize yourself with a 
Deserializer.
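A rough, self-contained sketch of the "write objects, read raw bytes, deserialize by hand" pattern follows. It deliberately does not use Hadoop classes: Java Serialization stands in for a pluggable Serializer/Deserializer, and the hypothetical `append` and `nextRaw` helpers only mimic the roles of SequenceFile.Writer#append(Object, Object) and SequenceFile.Reader#nextRaw. The length-prefixed record layout is an assumption for illustration, not SequenceFile's actual on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class RawRecordSketch {

    // Stand-in for a Serializer: turn any Serializable object into bytes.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Stand-in for a Deserializer: rebuild a (new) object from raw bytes.
    static Object deserialize(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Hypothetical analogue of append(Object, Object): write the serialized key
    // and value as length-prefixed blobs (simplified layout, not SequenceFile's).
    static void append(DataOutputStream out, Object key, Object value) throws IOException {
        byte[] k = serialize(key);
        byte[] v = serialize(value);
        out.writeInt(k.length);
        out.write(k);
        out.writeInt(v.length);
        out.write(v);
    }

    // Hypothetical analogue of a raw read: return undecoded key and value bytes,
    // or null when the stream is exhausted.
    static byte[][] nextRaw(DataInputStream in) throws IOException {
        int keyLength;
        try {
            keyLength = in.readInt();
        } catch (EOFException eof) {
            return null;
        }
        byte[] k = new byte[keyLength];
        in.readFully(k);
        byte[] v = new byte[in.readInt()];
        in.readFully(v);
        return new byte[][] {k, v};
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(file)) {
            append(out, "alpha", 1);
            append(out, "beta", 2);
        }

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file.toByteArray()));
        byte[][] record;
        while ((record = nextRaw(in)) != null) {
            // The caller, not the reader, decides how to decode the raw bytes.
            Object key = deserialize(record[0]);
            Object value = deserialize(record[1]);
            System.out.println(key + " -> " + value);
        }
    }
}
```

Running `main` prints `alpha -> 1` and then `beta -> 2`. Keeping the read side at the level of raw bytes is what lets the caller pick the deserializer, which is the point of the workaround.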

On a related note, the RecordReader interface is unfortunately incompatible 
with serialization frameworks that don't reuse objects, such as Java 
Serialization. The problem is that 

{code}
boolean next(K key, V value) throws IOException
{code}

has no way of passing keys and values that are deserialized from the stream 
back to the client of the RecordReader: Java passes object references by value, 
so a RecordReader that creates new objects can only rebind its own parameters. 
This is not a problem for Writables and Thrift, since the client passes in 
objects that are updated in place. Fixing this will require some surgery on the 
API.
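To make the problem concrete, here is a small, self-contained Java sketch (no Hadoop classes). `MutableText` is a hypothetical stand-in for a Writable, and a `String` read with `readUTF` stands in for a new object produced by a non-reusing framework such as Java Serialization. `nextReturning` is only one hypothetical shape an API change could take, not a settled design.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NextSignatureSketch {

    // Stand-in for a Writable: readFields overwrites this object's state in place.
    static class MutableText {
        String value = "";

        void readFields(DataInput in) throws IOException {
            value = in.readUTF();
        }
    }

    // next(K, V)-style contract with a reusable object: the caller's instance is
    // mutated, so the caller sees the new data. (available() is only a reliable
    // end-of-data check here because the stream is in memory.)
    static boolean nextInPlace(DataInputStream in, MutableText key) throws IOException {
        if (in.available() == 0) {
            return false;
        }
        key.readFields(in);
        return true;
    }

    // Same contract with a decoder that yields a new, immutable object:
    // assigning it to the parameter only rebinds the local reference, so the
    // caller's variable is unchanged and the decoded key is lost.
    static boolean nextFresh(DataInputStream in, String key) throws IOException {
        if (in.available() == 0) {
            return false;
        }
        key = in.readUTF();
        return true;
    }

    // Hypothetical alternative shape: return the decoded object, or null at end.
    static String nextReturning(DataInputStream in) throws IOException {
        return in.available() == 0 ? null : in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF("k1");
        byte[] data = buf.toByteArray();

        MutableText reused = new MutableText();
        nextInPlace(new DataInputStream(new ByteArrayInputStream(data)), reused);
        System.out.println("in place: " + reused.value);

        String holder = "";
        nextFresh(new DataInputStream(new ByteArrayInputStream(data)), holder);
        System.out.println("fresh:    '" + holder + "'");

        String returned = nextReturning(new DataInputStream(new ByteArrayInputStream(data)));
        System.out.println("returned: " + returned);
    }
}
```

Running `main` shows `k1` for the in-place read, an empty string for `nextFresh` (the decoded key never reaches the caller), and `k1` again for `nextReturning`.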


> SequenceFile.Reader doesn't use the Serialization framework
> -----------------------------------------------------------
>
>                 Key: HADOOP-3413
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3413
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.18.0
>
>
> Currently SequenceFile.Reader only works with Writables, since it doesn't use 
> the new Serialization framework. This is a glaring omission, considering that 
> SequenceFile.Writer uses the Serializer and handles arbitrary types via the 
> SerializationFactory.

-- 
This message is automatically generated by JIRA.