[ 
https://issues.apache.org/jira/browse/HADOOP-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627937#action_12627937
 ] 

Tom White commented on HADOOP-3787:
-----------------------------------

This, and HADOOP-1986 in general, does not mandate the use of SequenceFile. 
However, SequenceFiles are a convenient binary format, so that's what I've 
used here for the example.

It would be possible to run MapReduce against Thrift records in flat files with 
a suitable InputFormat (which would need to be written), but such files would 
not be splittable (unless there is some general way to find Thrift record 
boundaries from an arbitrary position in the file). Unsplittable files do not 
in general play well with MapReduce and HDFS. Perhaps one way to fix this is to 
insert a special Thrift record every n records whose unique byte sequence can 
be scanned for to realign with the record boundaries. Could this work?
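
As a rough illustration of the marker idea (not something taken from the 
attached patch), here is a minimal sketch in plain Java. Everything in it is an 
assumption made for the example: the class name SyncMarkerSketch, the fixed 
16-byte SYNC value, the length-prefixed framing, and the use of UTF-8 strings 
as stand-ins for serialized Thrift records. A length of -1 is written before 
each marker so that a sequential reader can tell a marker from a record, which 
is similar in spirit to what SequenceFile does with its own sync markers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SyncMarkerSketch {
    // Hypothetical marker. A real format would more likely pick 16 random
    // bytes per file so that a collision with record data is very unlikely.
    static final byte[] SYNC =
        "\u0001SYNC-MARKER-16\u0002".getBytes(StandardCharsets.ISO_8859_1);

    private static void putInt(ByteArrayOutputStream out, int value) {
        out.write(ByteBuffer.allocate(4).putInt(value).array(), 0, 4);
    }

    // Writes length-prefixed records, with an escaped marker before record 0
    // and then before every `interval`-th record after it.
    static byte[] write(List<String> records, int interval) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records.size(); i++) {
            if (i % interval == 0) {
                putInt(out, -1);               // escape: never a valid length
                out.write(SYNC, 0, SYNC.length);
            }
            byte[] payload = records.get(i).getBytes(StandardCharsets.UTF_8);
            putInt(out, payload.length);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // Returns the offset of the first marker starting at or after `from`,
    // or -1 if there is none.
    static int indexOfSync(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= data.length; i++) {
            int j = 0;
            while (j < SYNC.length && data[i + j] == SYNC[j]) {
                j++;
            }
            if (j == SYNC.length) {
                return i;
            }
        }
        return -1;
    }

    // Reads the records owned by the split [start, end): those that follow a
    // marker whose first byte lies inside the split.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> result = new ArrayList<>();
        int sync = indexOfSync(data, start);   // realign from arbitrary offset
        if (sync < 0 || sync >= end) {
            return result;
        }
        ByteBuffer in = ByteBuffer.wrap(data);
        in.position(sync + SYNC.length);
        while (in.remaining() >= 4) {
            int entryStart = in.position();
            int length = in.getInt();
            if (length == -1) {
                if (entryStart + 4 >= end) {
                    break;                     // marker belongs to next split
                }
                in.position(in.position() + SYNC.length);
                continue;
            }
            byte[] payload = new byte[length];
            in.get(payload);
            result.add(new String(payload, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

With this layout, calling readSplit(data, 0, cut) and then 
readSplit(data, cut, data.length) should between them return every record 
exactly once for any cut point, which is the property a splittable InputFormat 
would need. The sketch does not guard against the marker bytes occurring 
inside a record.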

> Add serialization for Thrift
> ----------------------------
>
>                 Key: HADOOP-3787
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3787
>             Project: Hadoop Core
>          Issue Type: Wish
>          Components: examples, mapred
>            Reporter: Tom White
>         Attachments: hadoop-3787.patch, libthrift.jar
>
>
> Thrift (http://incubator.apache.org/thrift/) is a cross-language serialization 
> and RPC framework. This issue is to write a ThriftSerialization to support 
> using Thrift types in MapReduce programs, including an example program. This 
> should probably go into contrib.
> (There is a prototype implementation in 
> https://issues.apache.org/jira/secure/attachment/12370464/hadoop-serializer-v2.tar.gz)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
