[ 
https://issues.apache.org/jira/browse/HBASE-29863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-29863:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add API to KeyValueScanner to retrieve the set of StoreFiles accessed during 
> a scan
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-29863
>                 URL: https://issues.apache.org/jira/browse/HBASE-29863
>             Project: HBase
>          Issue Type: New Feature
>          Components: API, regionserver, Scanners
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2
>
>
> *Goal:* Introduce a mechanism to track and expose the specific HFiles 
> involved in a scan operation.
> *Use case:* This is essential for client-side validation that the right set 
> of files was scanned (when a source of truth is available, for example the 
> snapshot data manifest during snapshot-based scans), for debugging 
> performance-related issues, and for analyzing data access patterns.
> *Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the 
> {{KeyValueScanner}} interface.
> *Implementation Details*
>  * *Capturing the list of files when the scanner is initialized.*
>  ** Leaf Scanners
>  *** StoreFileScanner: Returns a singleton set containing the path of the 
> associated {{HFile}}.
>  *** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner: 
> Returns empty set.
>  ** Composite Scanners
>  *** StoreScanner & ReversedStoreScanner: Aggregates files from all active 
> {{StoreFileScanners}}.
>  *** KeyValueHeap & ReversedKeyValueHeap: Aggregates files from its internal 
> priority queue of scanners.
>  ** Abstract Scanners
>  *** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Returns 
> empty set.
>  * *Exposing via RegionScanner & TableSnapshotRecordReader*
>  ** RegionScanner: Aggregates files from all underlying StoreScanners.
>  ** TableSnapshotRecordReader: Proxies the call through 
> ClientSideRegionScanner to allow MapReduce jobs to access this for 
> snapshot-based scans.
>  ** Note: Also 
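
The aggregation scheme in the implementation details (leaf scanners report their own file or nothing, composite scanners union their children, the region scanner unions its store scanners) can be sketched roughly as below. This is a minimal illustration, not the actual patch: {{FileTrackingScanner}}, {{StoreFileScannerSketch}}, {{SegmentScannerSketch}}, {{CompositeScannerSketch}} and the example paths are hypothetical, and {{java.nio.file.Path}} stands in for {{org.apache.hadoop.fs.Path}} so the sketch compiles without Hadoop on the classpath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScannerFilesSketch {

  /** Hypothetical stand-in for KeyValueScanner, reduced to the proposed method. */
  interface FileTrackingScanner {
    Set<Path> getScannerInitializedFiles();
  }

  /** Leaf scanner over one HFile: reports a singleton set with that file's path. */
  static final class StoreFileScannerSketch implements FileTrackingScanner {
    private final Path hfilePath;

    StoreFileScannerSketch(Path hfilePath) {
      this.hfilePath = hfilePath;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
      return Collections.singleton(hfilePath);
    }
  }

  /** Leaf scanner over memstore data: opens no HFile, so reports an empty set. */
  static final class SegmentScannerSketch implements FileTrackingScanner {
    @Override
    public Set<Path> getScannerInitializedFiles() {
      return Collections.emptySet();
    }
  }

  /** Composite scanner (StoreScanner / KeyValueHeap style): unions its children's files. */
  static final class CompositeScannerSketch implements FileTrackingScanner {
    private final List<FileTrackingScanner> children;

    CompositeScannerSketch(List<FileTrackingScanner> children) {
      this.children = children;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
      Set<Path> files = new HashSet<>();
      for (FileTrackingScanner child : children) {
        files.addAll(child.getScannerInitializedFiles()); // duplicates collapse in the set
      }
      return Collections.unmodifiableSet(files);
    }
  }

  public static void main(String[] args) {
    // Store-level composite: two HFiles plus one memstore segment (hypothetical paths).
    FileTrackingScanner store = new CompositeScannerSketch(List.of(
        new StoreFileScannerSketch(Paths.get("/hbase/data/default/t1/r1/cf/hfile-a")),
        new StoreFileScannerSketch(Paths.get("/hbase/data/default/t1/r1/cf/hfile-b")),
        new SegmentScannerSketch()));
    // Region-level composite nests store-level composites the same way.
    FileTrackingScanner region = new CompositeScannerSketch(List.of(store));
    System.out.println(region.getScannerInitializedFiles().size()); // prints 2
  }
}
```

Returning an unmodifiable union keeps callers from mutating a child's set; whether the real implementation copies or shares the underlying sets is not stated in the issue.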


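For the validation use case (checking the scanned files against a source of truth such as a snapshot data manifest), a client could diff the two sets roughly as below. This is a hedged sketch: {{ScannedFilesValidator}}, {{Report}} and {{compare}} are hypothetical names, it assumes both sides have already been reduced to comparable HFile names, and it does not call any real snapshot manifest API.

```java
import java.util.Set;
import java.util.TreeSet;

public class ScannedFilesValidator {

  /** Result of the comparison; hypothetical type for illustration only. */
  static final class Report {
    final Set<String> missing;    // expected but not opened by the scan
    final Set<String> unexpected; // opened by the scan but not expected

    Report(Set<String> missing, Set<String> unexpected) {
      this.missing = missing;
      this.unexpected = unexpected;
    }

    boolean ok() {
      return missing.isEmpty() && unexpected.isEmpty();
    }
  }

  /**
   * Compares the files a scan reported (e.g. file names taken from
   * getScannerInitializedFiles()) with an expected set (e.g. HFile names
   * listed in a snapshot manifest). TreeSet keeps the output sorted.
   */
  static Report compare(Set<String> expected, Set<String> scanned) {
    Set<String> missing = new TreeSet<>(expected);
    missing.removeAll(scanned);
    Set<String> unexpected = new TreeSet<>(scanned);
    unexpected.removeAll(expected);
    return new Report(missing, unexpected);
  }

  public static void main(String[] args) {
    Set<String> manifest = Set.of("hfile-a", "hfile-b", "hfile-c");
    Set<String> scanned = Set.of("hfile-a", "hfile-b", "hfile-d");
    Report r = compare(manifest, scanned);
    // prints missing=[hfile-c] unexpected=[hfile-d]
    System.out.println("missing=" + r.missing + " unexpected=" + r.unexpected);
  }
}
```

An "unexpected" file is the stronger signal here; a manifest file may legitimately be absent from the scanned set if the scan's key or time range lets the store skip it.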

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
