Albert Chern wrote:
Every time the size of the map file hits a multiple of the index
interval, an index entry is written. Therefore, it is possible that
an index entry is not added for the first occurrence of a key, but one
of the later ones. The reader will then seek to one of those instead
of the first.
This does seem to be inconsistent with the fact that you are
allowed to insert records with equal keys.
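Here is a small Java model of the behavior described above. It is a sketch under stated assumptions, not Hadoop's MapFile code. The class, fields and methods (MapFileIndexDemo, INDEX_INTERVAL, append, seek) are hypothetical. Data and index are held in memory, and the "size" of the file is taken to be the number of records written so far. The reader starts from the last index entry whose key is less than or equal to the target and scans forward.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a MapFile-style writer/reader, not Hadoop's code.
// Assumption: an index entry is written whenever the number of records already
// written is a multiple of INDEX_INTERVAL, and equal keys are allowed.
public class MapFileIndexDemo {
    static final int INDEX_INTERVAL = 4;

    record Entry(String key, String value) {}
    record IndexEntry(String key, int position) {}

    final List<Entry> data = new ArrayList<>();
    final List<IndexEntry> index = new ArrayList<>();

    void append(String key, String value) {
        // Reject only keys that are strictly smaller, so equal keys are accepted.
        if (!data.isEmpty() && key.compareTo(data.get(data.size() - 1).key()) < 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        // The index entry points at whichever record lands on the interval,
        // which may be a later member of a run of equal keys.
        if (data.size() % INDEX_INTERVAL == 0) {
            index.add(new IndexEntry(key, data.size()));
        }
        data.add(new Entry(key, value));
    }

    // Returns the position the reader lands on for key, or -1 if absent.
    int seek(String key) {
        // Find the last index entry whose key is <= the target.
        int lo = 0, hi = index.size() - 1, start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).key().compareTo(key) <= 0) {
                start = index.get(mid).position();
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Scan forward from that position to the first matching record.
        for (int i = start; i < data.size(); i++) {
            int c = data.get(i).key().compareTo(key);
            if (c == 0) return i;
            if (c > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        MapFileIndexDemo m = new MapFileIndexDemo();
        String[] keys = {"a", "a", "b", "b", "b", "b", "c"};
        for (int i = 0; i < keys.length; i++) {
            m.append(keys[i], "v" + i);
        }
        System.out.println("index: " + m.index);
        System.out.println("seek(b) -> " + m.seek("b")); // the first "b" is at position 2
    }
}
```

With an interval of 4, the second index entry lands on the third "b" (position 4), so seek("b") returns 4 and the "b" records at positions 2 and 3 are skipped.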
Yes, I agree that this is confusing and arguably a bug.
I suspect the developers meant for MapFile records to be uniquely
keyed, but in MapFile.Writer.checkKey() they used a > where they
intended a >= or something.
I think what actually happened was that I originally coded it to
prohibit equal keys, then at some point found an application (somewhere
in Nutch) where equal keys were useful, and changed MapFile to support
them, not realizing the consequences. Sigh. I don't know whether Nutch
still relies on this or not.
MapFile could probably be fixed by changing the way the index is
created, to write the location of the first instance of any run of equal
keys. We could also avoid recording two instances of equal keys in the
index: for a long run of equal keys, we could wait until the key changes
before emitting a new index entry.
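A minimal Java sketch of the indexing change proposed above, written as a hypothetical in-memory model. It is not a patch to Hadoop, and the names FirstOfRunIndexDemo, INDEX_INTERVAL, runStart and indexDue are illustrative. The sketch assumes the index interval counts records. When an interval boundary falls inside a run of equal keys, the index entry points at the first record of that run. A key that is already in the index is not indexed again, so the pending entry is emitted only once the key changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed indexing rule, not Hadoop's code.
// Assumptions: the interval counts records; each index entry points at the
// first record of a run of equal keys; no key is indexed twice.
public class FirstOfRunIndexDemo {
    static final int INDEX_INTERVAL = 4;

    record Entry(String key, String value) {}
    record IndexEntry(String key, int position) {}

    final List<Entry> data = new ArrayList<>();
    final List<IndexEntry> index = new ArrayList<>();
    private int runStart = 0;          // position of the first record with the current key
    private boolean indexDue = false;  // an interval boundary was reached but not yet indexed

    void append(String key, String value) {
        String last = data.isEmpty() ? null : data.get(data.size() - 1).key();
        if (last != null && key.compareTo(last) < 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (!key.equals(last)) {
            runStart = data.size();  // a new run of equal keys begins here
        }
        if (data.size() % INDEX_INTERVAL == 0) {
            indexDue = true;
        }
        // Emit at most one entry per distinct key, pointing at the start of its run;
        // inside a run whose key is already indexed, wait until the key changes.
        boolean keyAlreadyIndexed =
                !index.isEmpty() && index.get(index.size() - 1).key().equals(key);
        if (indexDue && !keyAlreadyIndexed) {
            index.add(new IndexEntry(key, runStart));
            indexDue = false;
        }
        data.add(new Entry(key, value));
    }

    // Returns the position the reader lands on for key, or -1 if absent.
    int seek(String key) {
        // Find the last index entry whose key is <= the target.
        int lo = 0, hi = index.size() - 1, start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).key().compareTo(key) <= 0) {
                start = index.get(mid).position();
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Scan forward from that position to the first matching record.
        for (int i = start; i < data.size(); i++) {
            int c = data.get(i).key().compareTo(key);
            if (c == 0) return i;
            if (c > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        FirstOfRunIndexDemo m = new FirstOfRunIndexDemo();
        String[] keys = {"a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "c"};
        for (int i = 0; i < keys.length; i++) {
            m.append(keys[i], "v" + i);
        }
        System.out.println("index: " + m.index);
        System.out.println("seek(b) -> " + m.seek("b"));
    }
}
```

For the keys a, a, then eight b's, then c, the index comes out as (a, 0), (b, 2), (c, 10), so seek("b") lands on position 2, the first "b". Under a plain per-interval rule, the same input would index "b" twice, at positions 4 and 8, and a seek would start at position 8.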
Doug