MapFile.Reader does not seek to first entry for multi-valued key
----------------------------------------------------------------

                 Key: HADOOP-6494
                 URL: https://issues.apache.org/jira/browse/HADOOP-6494
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
            Reporter: Peter Spiro
            Priority: Minor


When a MapFile contains a key with multiple entries and an entry other than the
first happens to be the one stored in the index, the Reader's seek() and get*()
methods will generally position the reader at that indexed entry rather than at
the key's first entry. The entries that precede it are skipped, making it
impossible to retrieve all of the key's entries using next().
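To make the failure concrete, here is a hypothetical, dependency-free Java model (not Hadoop code) of the unpatched behavior. The writer indexes every indexInterval-th entry, and seek() looks up the last indexed key not greater than the target, jumps to that data position, and scans forward. The class and method names are assumptions for illustration, and the real Reader's binary search and seek shortcuts are simplified.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Hadoop code) of an unpatched MapFile: a sorted data
 * list plus a sparse index holding every indexInterval-th entry's key and position.
 */
public class MapFileSeekModel {
    final int indexInterval;
    final List<String> dataKeys = new ArrayList<>();
    final List<String> dataValues = new ArrayList<>();
    final List<String> indexKeys = new ArrayList<>();
    final List<Integer> indexPositions = new ArrayList<>();

    MapFileSeekModel(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    /** Like the original append(): index every indexInterval-th entry, duplicates included. */
    void append(String key, String value) {
        if (dataKeys.size() % indexInterval == 0) {
            indexKeys.add(key);
            indexPositions.add(dataKeys.size());
        }
        dataKeys.add(key);
        dataValues.add(value);
    }

    /**
     * Simplified seek(): find the last index entry whose key is <= target, jump to
     * its data position, then scan forward to the first key >= target.
     * Returns that data position, or -1 if every key is smaller than target.
     */
    int seek(String target) {
        int lo = 0, hi = indexKeys.size() - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid).compareTo(target) <= 0) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        int pos = (slot == -1) ? 0 : indexPositions.get(slot);
        for (; pos < dataKeys.size(); pos++) {
            if (dataKeys.get(pos).compareTo(target) >= 0) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        MapFileSeekModel file = new MapFileSeekModel(2);
        file.append("a", "a0");
        file.append("b", "b0");
        file.append("b", "b1"); // third entry: lands on an index boundary
        file.append("b", "b2");
        file.append("c", "c0");
        int pos = file.seek("b");
        System.out.println(file.dataValues.get(pos)); // b1, so b0 is never returned
    }
}
```

With an index interval of 2, the index records the second "b" entry, so seek("b") lands on "b1" and a subsequent next() loop never sees "b0".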

One easy solution would be to modify the Writer's append() method to index an
entry only if it is the first entry belonging to its key; when the index
interval falls on a repeated key, indexing is deferred to the next new key, e.g.:


    public synchronized void append(WritableComparable key, Writable val)
      throws IOException {

      // Compare against the previous key before checkKey() overwrites lastKey.
      boolean equalsLastKey = (size != 0 && comparator.compare(lastKey, key) == 0);
      checkKey(key);

      boolean largeEnoughInterval = size % indexInterval == 0;
      if (largeEnoughInterval && !equalsLastKey) {            // add an index entry
        position.set(data.getLength());           // point to current eof
        index.append(key, position);
      }

      data.append(key, val);                      // append key/value to data
      // If the index boundary fell on a repeated key, hold the counter so that
      // the next new key is indexed instead.
      if (!largeEnoughInterval || !equalsLastKey)
        size++;
    }


(Because size would then no longer count every entry, the variable should be
renamed to something more accurate.)
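The counter logic of the proposed append() can be checked with a small hypothetical Java model (not Hadoop code) that records which data positions would be indexed. Here counter stands in for the proposal's size, only keys are stored, and the key-ordering check performed by checkKey() is omitted; all names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Hadoop code) of the proposed MapFile.Writer.append()
 * indexing rule. Only keys are stored; values and key-order checks are omitted.
 */
public class DeferredIndexModel {
    final int indexInterval;
    final List<String> dataKeys = new ArrayList<>();
    final List<Integer> indexedPositions = new ArrayList<>();
    // Stands in for the proposal's "size". It stalls while a repeated key sits
    // on an index boundary, so it no longer equals the number of entries.
    long counter = 0;
    String lastKey = null;

    DeferredIndexModel(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    void append(String key) {
        // Compare with the previous key before lastKey is overwritten.
        boolean equalsLastKey = counter != 0 && lastKey.compareTo(key) == 0;
        lastKey = key;

        boolean largeEnoughInterval = counter % indexInterval == 0;
        if (largeEnoughInterval && !equalsLastKey) {
            indexedPositions.add(dataKeys.size()); // index this entry's data position
        }

        dataKeys.add(key);
        // Hold the counter when the boundary fell on a duplicate, so the next
        // new key is the one that gets indexed.
        if (!largeEnoughInterval || !equalsLastKey) {
            counter++;
        }
    }

    public static void main(String[] args) {
        DeferredIndexModel m = new DeferredIndexModel(2);
        for (String k : new String[] {"a", "b", "b", "b", "c"}) {
            m.append(k);
        }
        System.out.println(m.indexedPositions); // [0, 4]
    }
}
```

For the keys a, b, b, b, c with an interval of 2, the boundary falls on the second "b", so indexing is deferred until "c" and no entry other than the first of its key is ever indexed. The trade-off is that a long run of duplicates stretches the gap between index entries beyond indexInterval.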




-- 
This message is automatically generated by JIRA.