Hi,
Reads are a bit more complicated than writes. A read
operation first checks the cache and may satisfy the request
directly from it. If not, the operation checks the newest
MapFile for the data, then the next newest, and so on down
to the oldest, stopping as soon as the requested data has
been retrieved. Because a random read (or even a sequential
read that is not a scan) can end up checking multiple files
for the data, it is considerably slower than either writes or
sequential scans (think of a scan as working with a cursor
in a traditional database).
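In rough Java, the lookup order is something like the sketch
below; Cache and MapFileReader are made-up stand-ins here, not
the actual HBase classes:

    import java.util.List;

    // Hypothetical stand-ins, not the actual HBase classes.
    interface Cache { byte[] get(byte[] key); }
    interface MapFileReader { byte[] get(byte[] key); }

    class ReadPathSketch {
      // Cache first, then MapFiles from newest to oldest.
      static byte[] read(byte[] key, Cache cache,
                         List<MapFileReader> newestFirst) {
        byte[] value = cache.get(key);           // 1. try the cache
        if (value != null) return value;
        for (MapFileReader file : newestFirst) { // 2. newest file first
          value = file.get(key);
          if (value != null) return value;       // stop at the first hit
        }
        return null;                             // key not present
      }
    }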
Sorry, just to double-check that I understand this correctly:
the number of files that need to be checked for a read is
related to the compaction threshold, since all files are
merged into one big sorted file by the compaction thread after
a given time?
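Something like the following is what I picture, reusing the
MapFileReader stand-in from the sketch above (mergeSortedFiles
is a made-up helper standing in for the real k-way merge):

    import java.util.List;

    class CompactionSketch {
      // Once the number of MapFiles passes the threshold, merge
      // them into one sorted file so later reads only have to
      // check a single file.
      static void maybeCompact(List<MapFileReader> files,
                               int compactionThreshold) {
        if (files.size() >= compactionThreshold) {
          MapFileReader merged = mergeSortedFiles(files);
          files.clear();
          files.add(merged);
        }
      }

      // Hypothetical helper: a k-way merge of the sorted inputs
      // would go here.
      static MapFileReader mergeSortedFiles(List<MapFileReader> files) {
        throw new UnsupportedOperationException("sketch only");
      }
    }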
Any idea how many files usually need to be checked on average?
Would it make sense here to work with key-spaces, similar to
the map/reduce partitioner, to keep the number of files that
need to be read small?
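To make the idea concrete, I am imagining something along the
lines of the hash split below (getPartition and numPartitions
are my own made-up names); a range-based split would probably
fit a sorted store better, since it keeps scans within one
partition, but the shape is the same:

    import java.util.Arrays;

    class PartitionSketch {
      // Hash-based split of the key space, analogous to a
      // map/reduce partitioner: each key maps to exactly one
      // partition, so a read would only have to consult the
      // files belonging to that partition.
      static int getPartition(byte[] key, int numPartitions) {
        return (Arrays.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
      }
    }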
Thanks,
Stefan