[ 
https://issues.apache.org/jira/browse/HBASE-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414634#comment-17414634
 ] 

Josh Elser commented on HBASE-26273:
------------------------------------

bq. One thing I am not sure about is that, in our case, most snapshot reads go 
through SCR (short-circuit local read). Even if they read LEAF_INDEX blocks a 
lot, these blocks are probably cached in the OS's buffer cache and should not 
cause excessive disk IO. I will spend some time to figure out what is going on 
there.

Hrm, yeah, that's an interesting point. I agree that the page cache should 
help. In our experiments, I believe we saw a bunch of DN connections happening 
(meaning no SCR), but we didn't look closely at why, as we thought it was a 
red herring.

I know we have some HDFS locality checks in this SnapshotInputFormat codebase, 
but I don't think Stephen or I (in the course of this work) dug deep into 
whether they were working correctly.

> TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use 
> ReadType.STREAM for scanning HFiles 
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26273
>                 URL: https://issues.apache.org/jira/browse/HBASE-26273
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 3.0.0-alpha-1, 2.4.6
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Josh Elser
>            Priority: Major
>
> After the change in HBASE-17917 that uses PREAD ({{ReadType.DEFAULT}}) for all 
> user scans, the behavior of TableSnapshotInputFormat changed from STREAM to 
> PREAD.
> TableSnapshotInputFormat is supposed to be used with YARN/MR or another batch 
> engine that reads the entire HFile in the container/executor. With the default 
> always being PREAD, we execute a lot more DFSInputStream#seek calls simply to 
> read through the data block section of the HFile.
> The goal of this change is to make any downstream user of 
> TableSnapshotInputFormat scan with STREAM.
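
To illustrate the seek argument in the issue description above, here is a toy 
model in plain Java. It is only a sketch under stated assumptions, not HBase or 
HDFS code: ReadModeSketch, CountingInput, and scanWholeFile are hypothetical 
names, and the model simply assumes that every positional read costs one 
repositioning, while a stateful stream read costs one only when the cursor is 
not already at the requested offset.

```java
/** Toy model only: counts repositionings for a full sequential scan of a file. */
final class ReadModeSketch {

    /** Hypothetical stand-in for an input stream that counts repositionings. */
    static final class CountingInput {
        private long position = 0;
        private int repositionings = 0;

        /** Stateful read: repositions only if the cursor is not already at offset. */
        void streamRead(long offset, int length) {
            if (offset != position) {
                repositionings++;
            }
            position = offset + length;
        }

        /** Positional read: assumed to reposition on every call (model assumption). */
        void positionalRead(long offset, int length) {
            repositionings++;
        }

        int repositionings() {
            return repositionings;
        }
    }

    /** Reads blockCount contiguous blocks from offset 0 and returns the count. */
    static int scanWholeFile(int blockCount, int blockSize, boolean stream) {
        CountingInput in = new CountingInput();
        for (int i = 0; i < blockCount; i++) {
            long offset = (long) i * blockSize;
            if (stream) {
                in.streamRead(offset, blockSize);
            } else {
                in.positionalRead(offset, blockSize);
            }
        }
        return in.repositionings();
    }

    public static void main(String[] args) {
        int blocks = 1000;
        int blockSize = 64 * 1024;
        System.out.println("positional: " + scanWholeFile(blocks, blockSize, false));
        System.out.println("stream: " + scanWholeFile(blocks, blockSize, true));
    }
}
```

In this model a 1000-block scan costs 1000 repositionings in positional mode 
and none in stream mode, which matches the direction of the effect described 
in the issue. The real cost depends on DFSInputStream behavior that the model 
does not capture.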



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
