[
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596169#action_12596169
]
stack commented on HADOOP-3315:
-------------------------------
This proposal looks great. Here are a couple of comments:
'Goals' says 'support all kinds of columns'. Do you mean all column data
types? It also says 'support seek to key and seek to row'. What is the
difference between a key and a row?
The description of the blockidx says 'Sparse index of keys into the
datablocks'. What does this mean? That the key at the start of each block
will be in the block index, and only that key? Or will the index also have
entries for keys from the middle of blocks?
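To make the first reading concrete, here is a rough sketch of a block index
that holds only the first key of each block. It is in Python for brevity, and
every name in it (build_block_index, seek, the toy blocks) is made up for
illustration; none of it comes from the proposal.

```python
from bisect import bisect_right

def build_block_index(blocks):
    """Return the first key of each block.

    Assumption: each block is a non-empty list of (key, value) pairs and
    keys are sorted both within and across blocks.
    """
    return [block[0][0] for block in blocks]

def seek(blocks, index, key):
    """Look up a key: binary-search the index, then scan one block."""
    # The candidate block is the last one whose first key is <= key.
    i = bisect_right(index, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    for k, v in blocks[i]:
        if k == key:
            return v
    return None  # key falls inside the block's range but is absent

# Toy data standing in for data blocks.
blocks = [
    [("apple", 1), ("banana", 2)],
    [("cherry", 3), ("date", 4)],
    [("fig", 5), ("grape", 6)],
]
index = build_block_index(blocks)
```

Under this reading the index has one entry per block, so a seek costs one
binary search over the index plus a scan of a single block. If the index also
carried keys from the middle of blocks, the scan would be shorter at the cost
of a larger index.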
Does the metadata value have to be a String? It looks like it doesn't have to
be -- that I can specify my own keyClass and valClass. For example, I would
like to be able to write a bloom filter into the metadata.
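As a sketch of what I have in mind, assuming metadata values can be arbitrary
bytes rather than Strings: a small bloom filter serialized into a metadata
entry and rebuilt on read. This is illustrative Python; the BloomFilter class,
the "bloom" entry name and the dict standing in for the metadata block are all
hypothetical, not part of the proposal.

```python
import hashlib

class BloomFilter:
    """Tiny bloom filter; num_bits is assumed to be a multiple of 8."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Either start empty or restore from previously serialized bytes.
        self.bits = bytearray(bits) if bits is not None else bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def to_bytes(self):
        return bytes(self.bits)

# Writer side: build the filter and store it as an opaque metadata value.
bf = BloomFilter()
for row_key in (b"row-1", b"row-2", b"row-3"):
    bf.add(row_key)
metadata = {"bloom": bf.to_bytes()}  # hypothetical metadata entry name

# Reader side: rebuild the filter from the stored bytes.
restored = BloomFilter(bits=metadata["bloom"])
```

The point is only that the value is an opaque byte blob that the file format
does not need to interpret.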
It's not plain that users can add their own metadata to imeta. You might
state this explicitly.
Section 3.2, where you describe two different kinds of index, is a little
confusing (I'm not clear on RO vs. Key as per above).
In the Writer API, you state that a null key class is for a keyless column.
What does a null value class imply?
Is the Writer API missing metadata writing? Same for reading.
Reading talks about rowids but writer does not. Is this intentional?
For the reader API, could you expose methods for getting the key only,
without reading the value?
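A rough sketch of what key-only reading could look like, assuming (my
assumption, not something the proposal states) that each record is stored as a
length-prefixed key followed by a length-prefixed value, so a reader can skip
the value bytes without materializing them. Python again, with made-up names.

```python
import io
import struct

def write_records(records):
    """Serialize (key, value) byte pairs as: klen, key, vlen, value."""
    buf = io.BytesIO()
    for key, value in records:
        buf.write(struct.pack(">I", len(key)))
        buf.write(key)
        buf.write(struct.pack(">I", len(value)))
        buf.write(value)
    return buf.getvalue()

def iter_keys(data):
    """Yield keys only, seeking past each value instead of reading it."""
    stream = io.BytesIO(data)
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of data
        (klen,) = struct.unpack(">I", header)
        key = stream.read(klen)
        (vlen,) = struct.unpack(">I", stream.read(4))
        stream.seek(vlen, io.SEEK_CUR)  # skip the value bytes
        yield key

data = write_records([(b"k1", b"v" * 1000), (b"k2", b"")])
keys = list(iter_keys(data))
```

With large values this avoids copying value bytes at all. Inside a compressed
block the saving would be smaller, since the block still has to be
decompressed before the value can be skipped.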
> New binary file format
> ----------------------
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Owen O'Malley
> Assignee: Srikanth Kakani
> Attachments: Tfile-1.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.