[jira] [Updated] (HBASE-26273) TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use ReadType.STREAM for scanning HFiles

Josh Elser (Jira) Mon, 13 Sep 2021 16:40:06 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Josh Elser updated HBASE-26273:
-------------------------------
    Release Note: As a result of this change, HBase's MapReduce API for reading 
HBase snapshots now defaults to ReadType.STREAM instead of ReadType.DEFAULT 
(which is PREAD). HBase developers expect that STREAM will perform 
significantly better for average snapshot-based batch jobs. Users can restore 
the previous behavior (PREAD) either by updating their code to explicitly set 
`ReadType.PREAD` on the `Scan` object they provide to 
TableSnapshotInputFormat, or by setting the configuration property 
"hbase.TableSnapshotInputFormat.scanner.readtype" to "PREAD" in hbase-site.xml. 
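
For the configuration route, a minimal sketch of the hbase-site.xml entry might 
look like the following. The property name and value are the ones given in the 
note above. The enclosing <configuration> element is the usual Hadoop-style file 
layout and is shown only for context, on the assumption that the file used by 
the job follows that layout.

```xml
<configuration>
  <!-- Restore the pre-HBASE-26273 behavior for snapshot-based MapReduce scans -->
  <property>
    <name>hbase.TableSnapshotInputFormat.scanner.readtype</name>
    <value>PREAD</value>
  </property>
</configuration>
```

Jobs that leave this property unset and do not set a read type on their `Scan` 
get the new STREAM default.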

> TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use 
> ReadType.STREAM for scanning HFiles 
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26273
>                 URL: https://issues.apache.org/jira/browse/HBASE-26273
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 3.0.0-alpha-1, 2.4.6
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Josh Elser
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.7
>
>
> After the change in HBASE-17917, which uses PREAD ({{ReadType.DEFAULT}}) for 
> all user scans, the behavior of TableSnapshotInputFormat changed from STREAM 
> to PREAD.
> TableSnapshotInputFormat is meant to be used with YARN/MR or another batch 
> engine that reads the entire HFile in the container/executor. With the 
> default always set to PREAD, we execute many more DFSInputStream#seek calls 
> just to read through the data block section of the HFile.
> The goal of this change is to make any downstream user of 
> TableSnapshotInputFormat scan with STREAM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
