Re: Multi get/put

Ning Li Tue, 26 Aug 2008 15:55:38 -0700

Some follow-up on the performance issues:

> > PERFORMANCE ISSUES
> > Our preliminary performance experiments show that the performance
> > of building an index is quite reasonable. However, the performance of
> > random reads in HDFS is so poor that the search performance is
> > dramatically worse than that on local file systems.
> >
> What do you mean by 'dramatic' in the above?  This is a sweet feature.  That
> it's slow on first implementation is OK.  Are you thinking it's so slow,
> it's not functional?


On a local FS, real disk I/O is expensive, so Lucene relies on the
file-system cache to deliver high search performance. For that reason,
the comparisons below are based on warm-cache test results.
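To make the warm-cache methodology concrete, here is a minimal JDK-only sketch. It is not the harness behind the numbers in this message. It runs an untimed pass of random positional reads over a local file so the OS page cache holds the data, then times a second pass. The class name WarmRandomReadBench, the 8 MiB file size, the 1 KiB read size and the 10,000-read count are all hypothetical choices.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

/** Hypothetical micro-benchmark: warm-cache random positional reads on a local file. */
public class WarmRandomReadBench {

    /** Performs `count` reads of `readSize` bytes at random offsets; returns total bytes read. */
    static long randomReads(FileChannel ch, long fileSize, int readSize, int count, long seed)
            throws IOException {
        Random rnd = new Random(seed);
        ByteBuffer buf = ByteBuffer.allocate(readSize);
        long total = 0;
        for (int i = 0; i < count; i++) {
            // Random start offset in [0, fileSize - readSize] so every read fits in the file.
            long pos = (long) (rnd.nextDouble() * (fileSize - readSize + 1));
            buf.clear();
            // Positional reads do not move the channel's own position; loop until the buffer is full.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos + buf.position());
                if (n < 0) break; // hit end of file
            }
            total += buf.position();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bench", ".dat");
        byte[] data = new byte[8 * 1024 * 1024]; // 8 MiB test file (arbitrary size)
        new Random(42).nextBytes(data);
        Files.write(file, data);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            randomReads(ch, size, 1024, 10_000, 1L); // warm-up pass, not timed
            long start = System.nanoTime();
            long bytes = randomReads(ch, size, 1024, 10_000, 2L); // timed pass on warm cache
            long elapsed = System.nanoTime() - start;
            System.out.printf("%d bytes in %.2f ms (%.1f us/read)%n",
                    bytes, elapsed / 1e6, elapsed / 1e3 / 10_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Against HDFS, the same loop would go through Hadoop's positional-read interface (PositionedReadable, which FSDataInputStream implements) instead of FileChannel; that variant is not shown here.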

The comparison is between a local FS and a one-node HDFS. With warm
data, HDFS provides high sequential-read performance but poor
random-read performance, mainly because of socket overhead.

On HDFS 0.17.1, search is more than an order of magnitude slower than
on a local FS. Even when socket connections are reused, search is
still about an order of magnitude slower.

Since this is caused by socket overhead in HDFS, random reads on a map
file show similar results. Using HBase's MapFilePerformanceEvaluation,
I measured random reads to be a little under 7 times slower than on a
local FS. This is somewhat better than the search results, probably
because a random read on a map file amounts to a few nearly sequential
reads of the data file in HDFS.
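Why a map-file random read turns into a short sequential scan can be sketched with an in-memory model. As I understand Hadoop's MapFile, it keeps a sorted data file plus a sparse index of every Nth key, which is loaded into memory; a lookup binary-searches that index, seeks once, then reads forward. The class SparseIndexLookup, the interval of 128 and the key format are hypothetical illustrations, not Hadoop's MapFile API.

```java
import java.util.Arrays;

/** Hypothetical model of a map-file style lookup: sorted data plus a sparse in-memory index. */
public class SparseIndexLookup {
    final String[] dataKeys;  // sorted keys, standing in for the data file
    final String[] indexKeys; // every interval-th key
    final int[] indexPos;     // position of each index key in dataKeys
    int lastScanLength;       // entries read sequentially during the last lookup

    SparseIndexLookup(String[] sortedKeys, int interval) {
        dataKeys = sortedKeys;
        int n = (sortedKeys.length + interval - 1) / interval;
        indexKeys = new String[n];
        indexPos = new int[n];
        for (int i = 0; i < n; i++) {
            indexPos[i] = i * interval;
            indexKeys[i] = sortedKeys[i * interval];
        }
    }

    /** Returns the position of key, or -1: one "seek", then a short forward scan. */
    int get(String key) {
        lastScanLength = 0;
        // Binary search the in-memory index for the last index key <= key.
        int i = Arrays.binarySearch(indexKeys, key);
        if (i < 0) i = -i - 2; // insertion point minus one
        if (i < 0) return -1;  // key sorts before the first entry
        // "Seek" to that index position and read entries sequentially.
        for (int p = indexPos[i]; p < dataKeys.length; p++) {
            lastScanLength++;
            int c = dataKeys[p].compareTo(key);
            if (c == 0) return p;
            if (c > 0) break; // passed where the key would be
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] keys = new String[1000];
        for (int k = 0; k < keys.length; k++) keys[k] = String.format("row%05d", k);
        SparseIndexLookup map = new SparseIndexLookup(keys, 128);
        int pos = map.get("row00300");
        // prints "300 found after scanning 45 entries"
        System.out.println(pos + " found after scanning " + map.lastScanLength + " entries");
    }
}
```

In this model each lookup costs one positioned read plus up to one index interval of sequential reads, which is the access pattern HDFS handles comparatively well.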

Given the above, would the search performance be acceptable?

PS: I saw on http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
that the random read performance on a map file improved quite a bit
from 0.17.1 to 0.18.0. Any insight?
