Hello Andy,

>> No, definitely not full object reads, we use HDFS positioned reads, which 
>> allow us to request, within a gigabyte-plus store file, much smaller byte 
>> ranges (e.g. 64 KB), and receive back only the requested data. We can "seek" 
>> around the file.

Ahh. This is good to know. HTTP range requests should work for this mode of 
operation. I will take a look at Hadoop's S3 FileSystemStore implementation and 
see if it uses HTTP range requests.
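For what it's worth, a positioned read maps naturally onto an HTTP range 
request: the byte offset and length become an inclusive byte range in the 
Range header (per RFC 2616). A minimal sketch of that mapping (the helper 
name here is mine, not anything in Hadoop):

```python
def range_header(offset: int, length: int) -> dict:
    # HTTP byte ranges are inclusive on both ends, so a read of
    # `length` bytes starting at `offset` ends at offset + length - 1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

# A 64 KB positioned read at the start of a store file:
print(range_header(0, 64 * 1024))  # {'Range': 'bytes=0-65535'}
```

So if the S3 FileSystemStore sends headers like these instead of fetching 
whole objects, the positioned-read pattern should translate over directly.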

>> Aside from several IMHO showstopper performance problems, the shortest 
>> answer is HBase often wants to promptly read back store files it has
>> written, and S3 is too eventual often enough (transient 404s or 500s) to 
>> preclude reliable operation.

Hmm. OK. The potential performance problems are worrisome.

Improvements in Hadoop's S3 client, and in the implementation of S3 itself, 
could help fix the throughput problems and mask transient errors. There are 
rumors of a version of the Hadoop S3 client implementation that uses parallel 
reads to greatly improve throughput.

Andy - are you (or other HBase experts) aware of whether HBase would have 
problems with an HFile store that exhibits variable latency? Specifically, 
what about scenarios where most HFile reads come back in milliseconds, but 
occasionally one takes a few hundred milliseconds (or more)?

Thanks,
Jagane

