[
https://issues.apache.org/jira/browse/HADOOP-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138152#comment-16138152
]
Nico Meyer commented on HADOOP-6494:
------------------------------------
I rediscovered this problem a while ago the hard way and implemented exactly the
same fix proposed here. At the very least, the documentation should state that
multi-valued keys will give the wrong result.
> MapFile.Reader does not seek to first entry for multi-valued key
> ----------------------------------------------------------------
>
> Key: HADOOP-6494
> URL: https://issues.apache.org/jira/browse/HADOOP-6494
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Reporter: Peter Spiro
> Priority: Minor
>
> When a MapFile contains a key with multiple entries and one of these entries
> other than the first happens to be stored in the index, then the Reader's
> seek() and get*() methods will generally not return the first entry, making
> it impossible to retrieve all of the key's entries using next().
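To make the failure concrete, here is a small in-memory model of the situation
described above. It is not Hadoop code: the class `MultiValuedSeekDemo` and its
methods `buildIndexOriginal` and `seek` are hypothetical, the data file is reduced
to an array of int keys, and the reader's index lookup is approximated by "last
indexed key <= target, then scan forward". Under those assumptions, indexing every
third entry of the keys 1, 2, 3, 3, 3, 4 puts the second entry for key 3 into the
index, so a seek to 3 starts there and never sees the first one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model, not Hadoop's MapFile: keys[] stands in for the
// data file (position = array index) and the index is a list of indexed positions.
public class MultiValuedSeekDemo {

    // Original scheme: every indexInterval-th entry is indexed, whatever its key.
    static List<Integer> buildIndexOriginal(int[] keys, int indexInterval) {
        List<Integer> index = new ArrayList<>();
        for (int size = 0; size < keys.length; size++) {
            if (size % indexInterval == 0) {
                index.add(size);
            }
        }
        return index;
    }

    // Simplified seek (an assumption, approximating the reader's index lookup):
    // start at the last indexed entry whose key is <= target, then scan forward
    // to the first entry whose key is >= target. Returns -1 if there is none.
    static int seek(int[] keys, List<Integer> index, int target) {
        int start = 0;
        for (int pos : index) {
            if (keys[pos] <= target) {
                start = pos;
            } else {
                break;
            }
        }
        for (int pos = start; pos < keys.length; pos++) {
            if (keys[pos] >= target) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 3, 3, 4};                   // key 3 occupies positions 2..4
        List<Integer> index = buildIndexOriginal(keys, 3); // indexes positions 0 and 3
        System.out.println("index positions: " + index);
        System.out.println("seek(3) lands on position " + seek(keys, index, 3)
                + "; the first entry for key 3 is at position 2");
    }
}
```

Running main prints index positions [0, 3] and reports that seek(3) lands on
position 3, so iterating with next() from there yields only two of the three
values stored under key 3.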
> One easy solution would be to modify the Writer's append() method to only
> index an entry if it's the first entry belonging to its key, e.g.:
> public synchronized void append(WritableComparable key, Writable val)
>     throws IOException {
>   boolean equalsLastKey = (size != 0 && comparator.compare(lastKey, key) == 0);
>   checkKey(key);
>   boolean largeEnoughInterval = size % indexInterval == 0;
>   if (largeEnoughInterval && !equalsLastKey) {   // add an index entry
>     position.set(data.getLength());              // point to current eof
>     index.append(key, position);
>   }
>   data.append(key, val);                         // append key/value to data
>   if (!largeEnoughInterval || !equalsLastKey)
>     size++;
> }
> (The size variable should then be renamed to something more accurate.)
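The proposed append() logic can be checked with a small in-memory stand-in for
MapFile.Writer. Everything here is hypothetical except the handling of
`equalsLastKey`, `largeEnoughInterval` and `size`, which follows the snippet
above: the class `FirstEntryIndexDemo` is illustrative, keys are plain ints, the
data file and index are lists, and the ordering check inlines what `checkKey()`
is assumed to do (reject out-of-order keys and remember the last key). The seek
is the same simplified "last indexed key <= target, then scan forward" lookup,
not the real reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the proposed MapFile.Writer.append() fix.
public class FirstEntryIndexDemo {
    private final int indexInterval;
    private final List<Integer> data = new ArrayList<>();  // keys in append order (position = list index)
    private final List<Integer> index = new ArrayList<>(); // positions in data that are indexed
    private int size = 0;   // entries counted for the interval; boundary duplicates are not counted
    private int lastKey;    // only meaningful once size != 0

    FirstEntryIndexDemo(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    void append(int key) {
        // Computed before the ordering check, which overwrites lastKey.
        boolean equalsLastKey = size != 0 && lastKey == key;
        // Assumed checkKey() behaviour: keys must be non-decreasing.
        if (size != 0 && key < lastKey) {
            throw new IllegalArgumentException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
        boolean largeEnoughInterval = size % indexInterval == 0;
        if (largeEnoughInterval && !equalsLastKey) {
            index.add(data.size()); // point to current end of data
        }
        data.add(key);
        if (!largeEnoughInterval || !equalsLastKey) {
            size++; // held on a boundary while the key repeats, so the next new key is indexed
        }
    }

    List<Integer> indexPositions() {
        return index;
    }

    // Simplified seek: last indexed key <= target, then scan to the first key >= target.
    int seek(int target) {
        int start = 0;
        for (int pos : index) {
            if (data.get(pos) <= target) {
                start = pos;
            } else {
                break;
            }
        }
        for (int pos = start; pos < data.size(); pos++) {
            if (data.get(pos) >= target) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FirstEntryIndexDemo writer = new FirstEntryIndexDemo(3);
        for (int key : new int[] {1, 2, 3, 3, 3, 4}) {
            writer.append(key);
        }
        System.out.println("index positions: " + writer.indexPositions());
        System.out.println("seek(3) -> position " + writer.seek(3));
    }
}
```

With an interval of 3 and keys 1, 2, 3, 3, 3, 4, the index holds positions 0 and
5: the counter stays at 3 while key 3 repeats, so the next index entry goes to
key 4, and seek(3) lands on position 2, the first entry for key 3. The trade-off
is that a key with many entries widens the gap between index entries around it.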
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)