[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629916#action_12629916 ]
Doug Cutting commented on HADOOP-3315:
--------------------------------------
Looking at the specification document, I see the major stated goals are (1)
language neutrality, (2) extensibility, and (3) compatibility. I assume these
are relative to SequenceFile.

Language neutrality without an implementation in another language seems a risky
claim. SequenceFile's only language dependence is in the naming of key and
value classes, but implementations of these classes are not required to process
a SequenceFile. SequenceFile, like TFile, lacks implementations in other
languages, so I don't yet see a clear advantage there.

(2) and (3) are closely related. SequenceFile has proven extensible and
back-compatible. Many features have been added without breaking
back-compatibility. I don't see a qualitative advantage for the TFile format
here.

Perhaps you should include a section specifically addressing the advantages of
TFile over SequenceFile, how they are achieved and how they can be measured.
I suspect there may be other unstated goals in TFile. The case for TFile
should be clearly made, as it adds a lot of code to Hadoop that must now be
supported. If it has demonstrable advantages over SequenceFile and the case can
be made that we will be able to retire SequenceFile after it is added, then
TFile should go forward. Or if it is significantly simpler than SequenceFile
while providing the same features, that might make the case that it will be
easier to reimplement in other languages. But if it is equivalently complex
and supports more or less the same features, then it only adds baggage to the
project.
> New binary file format
> ----------------------
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Owen O'Malley
> Assignee: Amir Youssefi
> Attachments: HADOOP-3315_TFILE_PREVIEW.patch,
> HADOOP-3315_TFILE_PREVIEW_WITH_LZO_TESTS.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs