[
https://issues.apache.org/jira/browse/HBASE-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413325#comment-17413325
]
Tak-Lon (Stephen) Wu edited comment on HBASE-26273 at 9/10/21, 6:12 PM:
------------------------------------------------------------------------
Hi [~huaxiangsun], sorry for the confusion but let me clarify it.
{code:java}
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1046)
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:998)
org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1357)
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1321)
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.read(FileLink.java:171)
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:808)
{code}
Above is a partial stack trace showing that {{HFileBlock#positionalReadWithExtra}}
(PREAD) lands in HDFS's {{DFSInputStream#actualGetFromOneDataNode}}, which in
turn uses the {{BlockReader}} from {{DFSInputStream#getBlockReader}} to fetch
the block, and each time this {{BlockReader}} is a new object. By contrast, for
a STREAM read, [{{DFSInputStream#readWithStrategy}} reuses the same
{{BlockReader}} object from the
{{currentNode}}|https://github.com/apache/hadoop/blob/618c9218eeed2dc0388010e04349b3df8d6c5b70/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L893]
for the lifetime of reading the same HFile.
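As an aside, the stateless PREAD vs. stateful STREAM distinction can be
sketched with plain {{java.nio}} outside of HDFS (a simplified analogy of
mine, not code from the HDFS client): a positional read carries its own offset
and leaves no cursor behind, while a sequential read keeps reusable state.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadVsStream {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread-demo", ".bin");
        Files.write(tmp, "0123456789".getBytes());

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // PREAD-style: every call supplies its own offset and leaves the
            // channel position untouched -- analogous to
            // actualGetFromOneDataNode setting up fresh per-request state.
            ByteBuffer buf = ByteBuffer.allocate(4);
            ch.read(buf, 5);                             // 4 bytes at offset 5
            System.out.println(new String(buf.array())); // 5678
            System.out.println(ch.position());           // still 0

            // STREAM-style: stateful sequential read; the channel keeps a
            // current position -- analogous to readWithStrategy reusing the
            // BlockReader tied to currentNode.
            buf.clear();
            ch.read(buf);                                // from current position
            System.out.println(new String(buf.array())); // 0123
            System.out.println(ch.position());           // advanced to 4
        }
        Files.delete(tmp);
    }
}
{code}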
There are two levels here, as Josh points out:
1. That is the reason for this JIRA: the number of {{BlockReader}} objects
created when using PREAD is high because of how
{{DFSInputStream#actualGetFromOneDataNode}} works, which [keeps creating new
BlockReader objects|https://github.com/apache/hadoop/blob/618c9218eeed2dc0388010e04349b3df8d6c5b70/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1199].
In the TableSnapshotInputFormat use case we are scanning the entire HFile, so
we should not need to create that many {{BlockReader}} objects. When I said
`connection`, it's because we found a lot of log lines triggered by
{{DFSClient.LOG.debug("Connecting to datanode {}", dnAddr);}}, and when the
data block is not local it uses {{getRemoteBlockReaderFromTcp}} (where I think
it performs {{OP.READ_BLOCK}} once with the remote datanode to confirm whether
the block exists?). Sorry if `connection` was the wrong term.
2. The number of reads increases because of HBASE-26274: basically we keep
issuing PREADs down to the LEAF_INDEX blocks without a block cache inside the
MR container, causing many more `connections`/data reads to HDFS.
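For reference, a downstream job can already force STREAM explicitly through
the {{Scan}} it passes to the snapshot job setup. An untested sketch, where
{{MyMapper}}, {{snapshotName}}, {{job}}, and {{restoreDir}} are placeholders:
{code:java}
Scan scan = new Scan();
scan.setReadType(Scan.ReadType.STREAM); // avoid per-pread BlockReader churn

TableMapReduceUtil.initTableSnapshotMapperJob(
    snapshotName,                   // snapshot to read
    scan,                           // scan with explicit STREAM read type
    MyMapper.class,                 // mapper consuming the snapshot rows
    ImmutableBytesWritable.class,   // output key class
    Result.class,                   // output value class
    job,
    true,                           // addDependencyJars
    restoreDir);                    // HDFS dir to restore the snapshot into
{code}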
> TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use
> ReadType.STREAM for scanning HFiles
> -----------------------------------------------------------------------------------------------------
>
> Key: HBASE-26273
> URL: https://issues.apache.org/jira/browse/HBASE-26273
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 3.0.0-alpha-1, 2.4.6
> Reporter: Tak-Lon (Stephen) Wu
> Assignee: Josh Elser
> Priority: Major
>
> After the change in HBASE-17917, which uses PREAD ({{ReadType.DEFAULT}}) for
> all user scans, the behavior of TableSnapshotInputFormat changed from STREAM
> to PREAD.
> TableSnapshotInputFormat is supposed to be used with YARN/MR or another batch
> engine that reads the entire HFile within the container/executor. With the
> default always being PREAD, we execute many more {{DFSInputStream#seek}}
> calls just to read through the data block section of the HFile.
> The goal of this change is to have any downstream user of
> TableSnapshotInputFormat scan with STREAM.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)