Andrew Purtell created HBASE-20430:

             Summary: Improve store file management for non-HDFS filesystems
                 Key: HBASE-20430
             Project: HBase
          Issue Type: Sub-task
            Reporter: Andrew Purtell

HBase keeps a file open for every active store file so no additional round 
trips to the NameNode are needed after the initial open. HDFS internally 
multiplexes open files, but the Hadoop S3 filesystem implementations do not, 
or, at least, not as well. As the amount of data under management increases, the 
required number of concurrently open connections rises, and we expect it will 
eventually exhaust a limit somewhere (the client, the OS file descriptor table 
or open file limits, or the S3 service).

Initially we can simply introduce an option to close every store file after the 
reader has finished, and measure the performance impact. Use cases backed by 
non-HDFS filesystems already have to cope with a different read performance 
profile. In experiments with the S3-backed Hadoop filesystems, notably S3A, 
even with aggressively tuned options simple reads can be very slow when there 
are blockcache misses; we have observed 15-20 seconds for a Get of a single 
small row, for example. We expect such deployments to rely heavily on the 
BucketCache to mitigate this already, backed either by offheap storage or, more 
likely, by a large number of cache files managed by the file engine on local 
SSD storage. If misses are already going to be very expensive, then the 
motivation to do more than simply open store files on demand is largely absent.
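The simple option described above could look something like the following 
sketch, where each read opens the file, reads, and closes, retaining no 
descriptor between calls. All class and method names here are illustrative, 
not actual HBase APIs; this is a minimal sketch of the idea, not a proposed 
patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Hypothetical interface standing in for a store file reader's block access.
interface StoreFileSource {
    byte[] readBlock(long offset, int length) throws IOException;
}

// Open-on-demand variant: every read pays an open/close round trip, which may
// be acceptable when a blockcache miss is already expected to be expensive
// (e.g. on S3A).
class OnDemandStoreFileSource implements StoreFileSource {
    private final Path path;

    OnDemandStoreFileSource(Path path) {
        this.path = path;
    }

    @Override
    public byte[] readBlock(long offset, int length) throws IOException {
        // Open, positional read, close -- no descriptor is retained.
        try (FileChannel channel = FileChannel.open(path)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            channel.read(buf, offset);
            return buf.array();
        }
    }
}
```

A long-lived variant would instead hold the FileChannel open for the life of 
the store file, which is what HBase effectively does today against HDFS.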

Still, we could employ a predictive cache: where frequent access to a given 
store file (or, at least, its store) is predicted, keep a reference to the 
store file open; otherwise, close the file after reading and reopen it on 
demand. We can keep statistics about read frequency, write them out to HFiles 
during compaction, and consult these stats when opening the region, perhaps by 
reading all meta blocks of the region's HFiles at open time. We need to be 
careful not to use ARC or an equivalent as the cache replacement strategy, as 
it is patent-encumbered. The size of the cache can be determined at startup 
after detecting the underlying filesystem, e.g. 
setCacheSize(VERY_LARGE_CONSTANT) if (fs instanceof DistributedFileSystem), so 
we lose little when still on HDFS.
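A minimal sketch of this policy, assuming an unencumbered LRU (via 
LinkedHashMap's access-order mode) rather than ARC, and with the startup 
sizing decision keyed on the filesystem. Class and method names are 
hypothetical, not actual HBase classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache of open store file readers. Eviction order is
// plain LRU, deliberately avoiding ARC, which is patent-encumbered.
class OpenReaderCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxOpen;

    OpenReaderCache(int maxOpen) {
        super(16, 0.75f, true); // accessOrder=true gives LRU eviction order
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // In a real implementation, evicting here is where the underlying
        // store file reader would be closed.
        return size() > maxOpen;
    }

    // Sketch of the startup sizing decision from the description: on HDFS an
    // effectively unbounded cache preserves today's behavior; on S3-like
    // filesystems, cap it well below the descriptor/connection limit. The
    // constant 1024 is an illustrative placeholder, not a recommendation.
    static int cacheSizeFor(String filesystemScheme) {
        return "hdfs".equals(filesystemScheme) ? Integer.MAX_VALUE : 1024;
    }
}
```

The same effect could be had with any LRU implementation; the key point is 
that cache capacity is decided once, at startup, from the detected filesystem.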
