catalin-luca commented on pull request #2002:
URL: https://github.com/apache/hbase/pull/2002#issuecomment-655025128
> Can you point out where you see this in LoadIncrementalHFiles and how the
current proposal avoids that, since it basically calls
LoadIncrementalHFiles.doBulkloadFromQueue at the end of each map task?
This is the part that opens each HFile to obtain its start and end keys:
```
HFile.Reader hfr = HFile.createReader(fs, hfilePath,
    new CacheConfig(getConf()), getConf());
final byte[] first, last;
try {
  hfr.loadFileInfo();
  first = hfr.getFirstRowKey();
  last = hfr.getLastRowKey();
} finally {
  hfr.close();
}
```
In particular, `HFile.createReader` opens the HFile and loads the file
trailer.
When running HBase on top of S3, these calls have an order of magnitude higher
latency.
The calls themselves are not the problem (they are needed to determine
the region server that will receive the file).
The problem is their high latency, which makes the overall bulk load process
take very long.
My first instinct was to hide this latency by increasing the parallelism of
LoadIncrementalHFiles. However, going beyond ~500-600 threads did not yield any
improvement. After inspecting thread dumps, I saw lots of time spent in
re-creating HTTP connections. It seemed that the connections were not being
re-used because the `HFile.Reader` was not reading all the bytes after seeking
to read the trailer. This in turn causes the connection to be aborted, so it
cannot be returned to the pool.
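
For readers unfamiliar with the approach, here is a minimal sketch of hiding per-file latency behind a thread pool. It is not the LoadIncrementalHFiles implementation: `ParallelKeyRangeSketch`, `KeyRange`, and the `lookup` function are hypothetical names, and `lookup` stands in for the `HFile.createReader` / `getFirstRowKey` / `getLastRowKey` sequence shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelKeyRangeSketch {

    /** First and last row key of one file. Hypothetical holder, not an HBase type. */
    static final class KeyRange {
        final String path;
        final String first;
        final String last;

        KeyRange(String path, String first, String last) {
            this.path = path;
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Runs the slow per-file lookup on a fixed-size pool so that up to
     * {@code threads} lookups are in flight at once.
     * Results are returned in the same order as {@code paths}.
     */
    static List<KeyRange> readAll(List<String> paths,
                                  Function<String, KeyRange> lookup,
                                  int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<KeyRange>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<KeyRange> task = () -> lookup.apply(path);
                pending.add(pool.submit(task));
            }
            List<KeyRange> results = new ArrayList<>();
            for (Future<KeyRange> f : pending) {
                results.add(f.get()); // blocks until this lookup has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work; already-submitted tasks still run
        }
    }
}
```

A pool like this only helps while each extra thread adds throughput. Once the time goes into re-creating aborted connections, as described above, adding threads inside one process stops paying off.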
After running the LoadIncrementalHFiles code in multiple processes using
map/reduce, I was able to achieve an overall parallelism higher than the
500-600 threads mentioned above. The connections are still getting aborted, but
the overall process scales horizontally better.
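
To illustrate the idea of spreading the work over several processes, the sketch below deals a list of HFile paths round-robin into one bucket per task. This is an assumption made for illustration only: `HFileSplitSketch` and `split` are hypothetical names, and the pull request may assign files to map tasks differently.

```java
import java.util.ArrayList;
import java.util.List;

final class HFileSplitSketch {

    /**
     * Distributes paths round-robin over numTasks buckets.
     * Buckets may be empty when there are fewer paths than tasks.
     */
    static List<List<String>> split(List<String> paths, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < paths.size(); i++) {
            buckets.get(i % numTasks).add(paths.get(i));
        }
        return buckets;
    }
}
```

Each bucket can then be handled by a separate process with its own thread pool and its own HTTP connection pool, so the total number of concurrent lookups is no longer capped by what a single JVM can sustain.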