[
https://issues.apache.org/jira/browse/HBASE-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413325#comment-17413325
]
Tak-Lon (Stephen) Wu edited comment on HBASE-26273 at 9/10/21, 6:12 PM:
------------------------------------------------------------------------
Hi [~huaxiangsun], sorry for the confusion but let me clarify it.
{code:java}
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1046)
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:998)
org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1357)
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1321)
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.read(FileLink.java:171)
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:808)
{code}
Above is a partial stack trace showing that {{HFileBlock#positionalReadWithExtra}}
(PREAD) lands in HDFS's {{DFSInputStream#actualGetFromOneDataNode}}, which in
turn uses the {{BlockReader}} from {{DFSInputStream#getBlockReader}} to fetch
the block, and each time this {{BlockReader}} is a new object. By contrast, for
a STREAM read, [{{DFSInputStream#readWithStrategy}} reuses the same
{{BlockReader}} object from the
{{currentNode}}|https://github.com/apache/hadoop/blob/618c9218eeed2dc0388010e04349b3df8d6c5b70/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L893]
for the lifetime of reading the same HFile.
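As an aside, the stateless PREAD vs. stateful STREAM distinction can be
sketched with plain {{java.nio}} outside of HDFS (a simplified analogy of
mine, not code from the HDFS client): a positional read carries its own offset
and leaves no cursor behind, while a sequential read keeps reusable state.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadVsStream {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread-demo", ".bin");
        Files.write(tmp, "0123456789".getBytes());

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // PREAD-style: every call supplies its own offset and leaves the
            // channel position untouched -- analogous to
            // actualGetFromOneDataNode setting up fresh per-request state.
            ByteBuffer buf = ByteBuffer.allocate(4);
            ch.read(buf, 5);                             // 4 bytes at offset 5
            System.out.println(new String(buf.array())); // 5678
            System.out.println(ch.position());           // still 0

            // STREAM-style: stateful sequential read; the channel keeps a
            // current position -- analogous to readWithStrategy reusing the
            // BlockReader tied to currentNode.
            buf.clear();
            ch.read(buf);                                // from current position
            System.out.println(new String(buf.array())); // 0123
            System.out.println(ch.position());           // advanced to 4
        }
        Files.delete(tmp);
    }
}
{code}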
There are two levels here, as Josh points out:
1. That is the reason for this JIRA: the number of {{BlockReader}} objects
created when using PREAD is high because of how
{{DFSInputStream#actualGetFromOneDataNode}} works, which [keeps creating new
BlockReader objects|https://github.com/apache/hadoop/blob/618c9218eeed2dc0388010e04349b3df8d6c5b70/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1199].
In the TableSnapshotInputFormat use case we are scanning the entire HFile, so
we should not need to create that many {{BlockReader}} objects. When I said
`connection`, it's because we found a lot of log lines triggered by
{{DFSClient.LOG.debug("Connecting to datanode {}", dnAddr);}}, and when the
data block is not local it uses {{getRemoteBlockReaderFromTcp}} (where I think
it performs {{OP.READ_BLOCK}} once with the remote datanode to confirm whether
the block exists?). Sorry if `connection` was the wrong term.
2. The number of reads increases because of HBASE-26274: basically we keep
issuing PREADs down to the LEAF_INDEX blocks without a block cache inside the
MR container, causing many more `connections`/data reads to HDFS.
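For reference, a downstream job can already force STREAM explicitly through
the {{Scan}} it passes to the snapshot job setup. An untested sketch, where
{{MyMapper}}, {{snapshotName}}, {{job}}, and {{restoreDir}} are placeholders:
{code:java}
Scan scan = new Scan();
scan.setReadType(Scan.ReadType.STREAM); // avoid per-pread BlockReader churn

TableMapReduceUtil.initTableSnapshotMapperJob(
    snapshotName,                   // snapshot to read
    scan,                           // scan with explicit STREAM read type
    MyMapper.class,                 // mapper consuming the snapshot rows
    ImmutableBytesWritable.class,   // output key class
    Result.class,                   // output value class
    job,
    true,                           // addDependencyJars
    restoreDir);                    // HDFS dir to restore the snapshot into
{code}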
> TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use
> ReadType.STREAM for scanning HFiles
> -----------------------------------------------------------------------------------------------------
>
> Key: HBASE-26273
> URL: https://issues.apache.org/jira/browse/HBASE-26273
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 3.0.0-alpha-1, 2.4.6
> Reporter: Tak-Lon (Stephen) Wu
> Assignee: Josh Elser
> Priority: Major
>
> After the change in HBASE-17917, which uses PREAD ({{ReadType.DEFAULT}}) for
> all user scans, the behavior of TableSnapshotInputFormat changed from STREAM
> to PREAD.
> TableSnapshotInputFormat is supposed to be used with YARN/MR or another batch
> engine that reads the entire HFile within the container/executor. With the
> default always being PREAD, we execute many more {{DFSInputStream#seek}}
> calls just to read through the data block section of the HFile.
> The goal of this change is to have any downstream user of
> TableSnapshotInputFormat scan with STREAM.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)