[ 
https://issues.apache.org/jira/browse/HADOOP-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138152#comment-16138152
 ] 

Nico Meyer commented on HADOOP-6494:
------------------------------------

I rediscovered this problem the hard way a while ago and implemented exactly 
the fix proposed here. At the very least, the documentation should state that 
multi-valued keys will give the wrong result.

> MapFile.Reader does not seek to first entry for multi-valued key
> ----------------------------------------------------------------
>
>                 Key: HADOOP-6494
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6494
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>            Reporter: Peter Spiro
>            Priority: Minor
>
> When a MapFile contains a key with multiple entries and one of these entries 
> other than the first happens to be stored in the index, then the Reader's 
> seek() and get*() methods will generally not return the first entry, making 
> it impossible to retrieve all of the key's entries using next().
> One easy solution would be to modify the Writer's append() method to only 
> index an entry if it's the first entry belonging to its key, e.g.:
>     public synchronized void append(WritableComparable key, Writable val)
>       throws IOException {
>       boolean equalsLastKey =
>         (size != 0 && comparator.compare(lastKey, key) == 0);
>       checkKey(key);
>       boolean largeEnoughInterval = size % indexInterval == 0;
>       if (largeEnoughInterval && !equalsLastKey) { // add an index entry
>         position.set(data.getLength());           // point to current eof
>         index.append(key, position);
>       }
>       data.append(key, val);                      // append key/value to data
>       if (!largeEnoughInterval || !equalsLastKey)
>         size++;
>     }
> (The size variable should then be renamed to something more accurate.)
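
For illustration, the behaviour described above can be reproduced with a toy
model that does not use Hadoop at all. Everything in this sketch is an
assumption made for the example: the class SparseIndexSketch, its methods
buildIndex() and seek(), and the use of list positions in place of file
offsets are hypothetical, and seek() only approximates the Reader's lookup
(start at the last indexed key <= the target, then scan forward).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a sorted data file with a sparse index; NOT Hadoop's MapFile. */
public class SparseIndexSketch {

    /**
     * Returns the data positions that receive an index entry.
     * skipRepeatedKeys=false models indexing every interval-th entry;
     * skipRepeatedKeys=true models the append() change proposed in the issue.
     */
    public static List<Integer> buildIndex(List<String> keys, int interval,
                                           boolean skipRepeatedKeys) {
        List<Integer> index = new ArrayList<>();
        long size = 0;          // counter that decides when to index
        String lastKey = null;  // key of the previously appended entry
        for (int pos = 0; pos < keys.size(); pos++) {
            String key = keys.get(pos);
            boolean equalsLastKey = size != 0 && key.equals(lastKey);
            boolean largeEnoughInterval = size % interval == 0;
            if (!skipRepeatedKeys) {
                if (largeEnoughInterval) index.add(pos);
                size++;
            } else {
                // Index only the first entry of a key; hold the counter
                // until a new key arrives at an index boundary.
                if (largeEnoughInterval && !equalsLastKey) index.add(pos);
                if (!largeEnoughInterval || !equalsLastKey) size++;
            }
            lastKey = key;
        }
        return index;
    }

    /**
     * Simplified seek: start at the last indexed position whose key is
     * <= target, then scan forward to the first key >= target.
     * Returns that position, or -1 if there is none.
     */
    public static int seek(List<String> keys, List<Integer> index, String target) {
        int start = 0;
        for (int pos : index) {
            if (keys.get(pos).compareTo(target) <= 0) start = pos;
            else break;
        }
        for (int pos = start; pos < keys.size(); pos++) {
            if (keys.get(pos).compareTo(target) >= 0) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "b", "b", "c");
        List<Integer> plain = buildIndex(keys, 2, false);
        List<Integer> fixed = buildIndex(keys, 2, true);
        System.out.println("plain index " + plain + " -> seek(b) = "
            + seek(keys, plain, "b"));
        System.out.println("fixed index " + fixed + " -> seek(b) = "
            + seek(keys, fixed, "b"));
    }
}
```

With an interval of 2, the plain index contains the second "b" (position 2),
so the seek lands there and the first "b" at position 1 is never reached by
scanning forward. With repeated keys skipped, only "a" and "c" are indexed
and the seek lands on position 1.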



