[
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192993#comment-14192993
]
Lars Hofhansl commented on HDFS-6698:
-------------------------------------
Now... I am not saying that we do not have work to do in HBase:
* we're using one reader per HFile
* after a major compaction we have a single store file per column family (that
file can be up to 20GB in size)
* we allow one thread to use seek+read on that reader; other concurrent scanners
fall back to pread (see HBASE-7336).
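The policy in the last bullet can be sketched roughly as follows. This is only an illustration under stated assumptions, not HBase's code: `SharedReaderSketch`, `streamLock`, and the in-memory byte array standing in for a store file are all hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HBase code: one shared reader per "file".
class SharedReaderSketch {
    private final byte[] data;                      // stands in for the store file
    final ReentrantLock streamLock = new ReentrantLock();
    private int streamPos = 0;                      // cursor owned by the seek+read path

    SharedReaderSketch(byte[] data) {
        this.data = data;
    }

    /** Fills dst from position; returns which path served the request. */
    String read(int position, byte[] dst) {
        if (streamLock.tryLock()) {
            // Only one thread at a time gets the stateful stream.
            try {
                streamPos = position;                                   // seek
                System.arraycopy(data, streamPos, dst, 0, dst.length);  // read
                streamPos += dst.length;
                return "seek+read";
            } finally {
                streamLock.unlock();
            }
        }
        // The stream is busy: fall back to a positional read that does not
        // touch the shared cursor.
        System.arraycopy(data, position, dst, 0, dst.length);
        return "pread";
    }
}
```

The fallback only helps if the positional path really is independent of the streaming path, which is the point raised in this issue.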
For my test I did this:
* my test table had 2^25 (~32m) rows, in two regions, about 1GB on disk
* I tested this with Phoenix, which can break a query into parts and execute
scans for the parts (that's where the parallel scanning on the same readers
comes into play)
* I have short circuit reading enabled
* all data in the OS cache (HBase block cache not used)
This is not an uncommon scenario, though. The original poster cited
scans(seek+read) + gets(pread) as a problem.
In either case, I'll post an updated patch to HDFS-6735 and we can take it from
there.
> try to optimize DFSInputStream.getFileLength()
> ----------------------------------------------
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Affects Versions: 3.0.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt,
> HDFS-6698v2.txt, HDFS-6698v3.txt
>
>
> HBase prefers to invoke read() when serving scan requests and pread() when
> serving get requests, because pread() holds almost no lock.
> Let's imagine there's a read() running. Because the definition is:
> {code}
> public synchronized int read
> {code}
> so no other read() request can run concurrently. This is known, but pread()
> also cannot run, because:
> {code}
> public int read(long position, byte[] buffer, int offset, int length)
>     throws IOException {
>   // sanity checks
>   dfsClient.checkOpen();
>   if (closed) {
>     throw new IOException("Stream closed");
>   }
>   failures = 0;
>   long filelen = getFileLength();
> {code}
> getFileLength() also needs the lock, so we need to figure out a lock-free
> implementation of getFileLength() before the HBase multi-stream feature is done.
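
The contention described above, and one possible way around it, can be sketched as follows. This is a hedged illustration, not the attached patch and not DFSInputStream itself: `LengthSketch`, `slowRead`, and both getter names are hypothetical, and the sketch assumes the length can be published through a single volatile field. If the real length is derived from several fields, they would need to be published together.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; names are illustrative and not taken from DFSInputStream.
class LengthSketch {
    private long lockedLength;             // guarded by the instance monitor
    private volatile long volatileLength;  // readable without the monitor

    LengthSketch(long length) {
        this.lockedLength = length;
        this.volatileLength = length;
    }

    // Mirrors the problem: must wait for any synchronized method in progress.
    synchronized long getFileLengthLocked() {
        return lockedLength;
    }

    // Lock-free alternative: a volatile read never waits for the monitor.
    long getFileLengthLockFree() {
        return volatileLength;
    }

    // Stands in for a long-running synchronized read(): holds the monitor
    // until the caller counts down the release latch.
    synchronized void slowRead(CountDownLatch started, CountDownLatch release)
            throws InterruptedException {
        started.countDown();
        release.await();
    }
}
```

While one thread sits in `slowRead`, a call to `getFileLengthLocked()` from another thread blocks until the monitor is released, whereas `getFileLengthLockFree()` returns at once.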
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)