MapFile.Reader does not seek to first entry for multi-valued key
----------------------------------------------------------------

                 Key: HADOOP-6494
                 URL: https://issues.apache.org/jira/browse/HADOOP-6494
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
            Reporter: Peter Spiro
            Priority: Minor


When a MapFile contains a key with multiple entries and one of these entries other than the first happens to be stored in the index, the Reader's seek() and get*() methods will generally not return the first entry, making it impossible to retrieve all of the key's entries using next().

One easy solution would be to modify the Writer's append() method to index an entry only if it is the first entry belonging to its key, e.g.:

    public synchronized void append(WritableComparable key, Writable val)
        throws IOException {

      boolean equalsLastKey =
          (size != 0 && comparator.compare(lastKey, key) == 0);

      checkKey(key);

      boolean largeEnoughInterval = size % indexInterval == 0;

      if (largeEnoughInterval && !equalsLastKey) {
        // add an index entry
        position.set(data.getLength());   // point to current eof
        index.append(key, position);
      }

      data.append(key, val);              // append key/value to data

      if (!largeEnoughInterval || !equalsLastKey)
        size++;
    }

(The size variable should then be renamed to something more accurate.)
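
To make the failure easier to see outside Hadoop, here is a minimal in-memory sketch of a sorted data list with a sparse index. It is not MapFile code: the class SparseIndexSketch, its fields, the skipDuplicateKeys flag, and the index interval of 2 are all hypothetical, and seek() is a simplified linear stand-in for the Reader's index lookup followed by a forward scan. The append() method follows the rule proposed above when skipDuplicateKeys is true, and indexes every interval-th entry otherwise.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a MapFile; not Hadoop code. */
final class SparseIndexSketch {
    static final int INDEX_INTERVAL = 2; // assumed small interval, for demonstration only

    final List<String> dataKeys = new ArrayList<>();        // keys in append order
    final List<String> indexKeys = new ArrayList<>();       // keys stored in the sparse index
    final List<Integer> indexPositions = new ArrayList<>(); // data position of each indexed key

    private final boolean skipDuplicateKeys; // true = apply the rule proposed in this issue
    private int size = 0;                    // counter that decides when to index
    private String lastKey = null;

    SparseIndexSketch(boolean skipDuplicateKeys) {
        this.skipDuplicateKeys = skipDuplicateKeys;
    }

    void append(String key) {
        boolean equalsLastKey = size != 0 && key.equals(lastKey);
        lastKey = key; // stand-in for the bookkeeping done by checkKey()
        boolean largeEnoughInterval = size % INDEX_INTERVAL == 0;

        boolean indexThisEntry = skipDuplicateKeys
                ? largeEnoughInterval && !equalsLastKey
                : largeEnoughInterval;
        if (indexThisEntry) {
            indexKeys.add(key);
            indexPositions.add(dataKeys.size()); // position the entry is about to occupy
        }
        dataKeys.add(key);

        // With the proposed rule, the counter stalls on a skipped duplicate so that
        // the next distinct key is indexed instead.
        if (!skipDuplicateKeys || !largeEnoughInterval || !equalsLastKey) {
            size++;
        }
    }

    /** Returns the data position of the first entry found with key >= target, or -1. */
    int seek(String target) {
        int start = 0;
        // Jump to the last index entry whose key is <= target (index keys are sorted).
        for (int i = 0; i < indexKeys.size(); i++) {
            if (indexKeys.get(i).compareTo(target) <= 0) {
                start = indexPositions.get(i);
            }
        }
        // Scan forward in the data from that position.
        for (int pos = start; pos < dataKeys.size(); pos++) {
            if (dataKeys.get(pos).compareTo(target) >= 0) {
                return pos;
            }
        }
        return -1;
    }

    static SparseIndexSketch build(boolean skipDuplicateKeys, String... keys) {
        SparseIndexSketch file = new SparseIndexSketch(skipDuplicateKeys);
        for (String key : keys) {
            file.append(key);
        }
        return file;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "b", "b", "c"}; // first "b" is at position 1
        System.out.println(build(false, keys).seek("b")); // prints 2: lands on the second "b"
        System.out.println(build(true, keys).seek("b"));  // prints 1: lands on the first "b"
    }
}
```

In the first case the second "b" is indexed at position 2, so the seek starts there and the entry at position 1 can no longer be reached by scanning forward. With the proposed rule only "a" and "c" are indexed, and the forward scan from "a" reaches the first "b".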