Hi,
Reads are a bit more complicated than writes. A read
operation first checks the cache and may satisfy the request
directly from it. If not, the operation checks the newest
MapFile for the data, then the next newest, and so on down
to the oldest, stopping as soon as the requested data has
been retrieved. Because a random read (or even a sequential
read that is not a scan) can end up checking multiple files
for the data, it is considerably slower than either writes or
sequential scans (think of a scan as working with a cursor
in a traditional database).
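In rough Java, the lookup order is something like the sketch
below; Cache and MapFileReader are made-up stand-ins here, not
the actual HBase classes:

    import java.util.List;

    // Hypothetical stand-ins, not the actual HBase classes.
    interface Cache { byte[] get(byte[] key); }
    interface MapFileReader { byte[] get(byte[] key); }

    class ReadPathSketch {
      // Cache first, then MapFiles from newest to oldest.
      static byte[] read(byte[] key, Cache cache,
                         List<MapFileReader> newestFirst) {
        byte[] value = cache.get(key);           // 1. try the cache
        if (value != null) return value;
        for (MapFileReader file : newestFirst) { // 2. newest file first
          value = file.get(key);
          if (value != null) return value;       // stop at the first hit
        }
        return null;                             // key not present
      }
    }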
Sorry, just to double-check that I understand this correctly:
the number of files that need to be checked for a read is
related to the compaction threshold, since all files are
merged into one big sorted file by the compaction thread after
a given time?
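Something like the following is what I picture, reusing the
MapFileReader stand-in from the sketch above (mergeSortedFiles
is a made-up helper standing in for the real k-way merge):

    import java.util.List;

    class CompactionSketch {
      // Once the number of MapFiles passes the threshold, merge
      // them into one sorted file so later reads only have to
      // check a single file.
      static void maybeCompact(List<MapFileReader> files,
                               int compactionThreshold) {
        if (files.size() >= compactionThreshold) {
          MapFileReader merged = mergeSortedFiles(files);
          files.clear();
          files.add(merged);
        }
      }

      // Hypothetical helper: a k-way merge of the sorted inputs
      // would go here.
      static MapFileReader mergeSortedFiles(List<MapFileReader> files) {
        throw new UnsupportedOperationException("sketch only");
      }
    }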
Any idea how many files usually need to be checked on average?
Would it make sense here to work with key-spaces, similar to
the map/reduce partitioner, to keep the number of files that
need to be read small?
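To make the idea concrete, I am imagining something along the
lines of the hash split below (getPartition and numPartitions
are my own made-up names); a range-based split would probably
fit a sorted store better, since it keeps scans within one
partition, but the shape is the same:

    import java.util.Arrays;

    class PartitionSketch {
      // Hash-based split of the key space, analogous to a
      // map/reduce partitioner: each key maps to exactly one
      // partition, so a read would only have to consult the
      // files belonging to that partition.
      static int getPartition(byte[] key, int numPartitions) {
        return (Arrays.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
      }
    }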
Thanks,
Stefan