Albert Chern wrote:
Every time the number of records in the map file hits a multiple of
the index interval, an index entry is written.  Therefore, it is
possible that an index entry is added not for the first occurrence of
a key, but for one of the later ones.  The reader will then seek to
one of those instead of the first.

This does seem to be inconsistent with the fact that you are
allowed to insert equal key records.

Yes, I agree that this is confusing and arguably a bug.

I suspect perhaps the developers
meant for MapFile records to be uniquely keyed, but in
MapFile.Writer.checkKey() they used a > where they intended a >= or
something.

I think what actually happened was that I originally coded it to prohibit equal keys, then at some point found an application (somewhere in Nutch) where equal keys were useful and changed MapFile to support them, not realizing the consequences. Sigh. I don't know whether Nutch still relies on this or not.

MapFile could probably be fixed by changing the way the index is created, to write the location of the first instance of any run of equal keys. We could also avoid recording two instances of equal keys in the index: for a long run of equal keys, we could wait until the key changes before emitting a new index entry.
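
Roughly, the idea might look like the sketch below. This is not the MapFile code, just a simplified model: the class, method, and field names are made up for illustration, keys are plain strings, and positions are record numbers rather than byte offsets in the data file. With runAware set to false it mimics the behavior described above (an entry for whichever record lands on the interval); with runAware set to true it records the start of the run and holds back further entries until the key changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hadoop's MapFile: positions are record
// ordinals here, not byte offsets into the data file.
class RunAwareIndexSketch {

    /** One index entry: a key and the position the reader would seek to. */
    static final class Entry {
        final String key;
        final int position;
        Entry(String key, int position) { this.key = key; this.position = position; }
        @Override public String toString() { return key + "@" + position; }
    }

    /** Builds an index over sorted, non-null keys (equal keys allowed). */
    static List<Entry> buildIndex(List<String> sortedKeys, int indexInterval, boolean runAware) {
        List<Entry> index = new ArrayList<>();
        String prevKey = null;         // key of the previous record
        String lastIndexedKey = null;  // key of the most recent index entry
        int runStart = 0;              // position of the first record of the current run
        boolean entryDue = false;      // an interval boundary passed without an entry

        for (int i = 0; i < sortedKeys.size(); i++) {
            String key = sortedKeys.get(i);
            if (!key.equals(prevKey)) {
                runStart = i;          // a new run of equal keys starts here
            }
            prevKey = key;
            if (i % indexInterval == 0) {
                entryDue = true;
            }
            if (!entryDue) {
                continue;
            }
            if (!runAware) {
                // Behavior as described in the thread: index the record on the boundary.
                index.add(new Entry(key, i));
                entryDue = false;
            } else if (!key.equals(lastIndexedKey)) {
                // Proposed fix: point at the first record of the run.
                index.add(new Entry(key, runStart));
                lastIndexedKey = key;
                entryDue = false;
            }
            // Otherwise this run is already indexed: leave entryDue set
            // and emit the next entry once the key changes.
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "b", "b", "b", "b", "c");
        System.out.println(buildIndex(keys, 2, false)); // [a@0, b@2, b@4, c@6]
        System.out.println(buildIndex(keys, 2, true));  // [a@0, b@1, c@6]
    }
}
```

In the first output a reader seeking "b" would land on record 2 and miss record 1; in the second it lands on record 1, and "b" appears in the index only once.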

Doug
