[
https://issues.apache.org/jira/browse/HBASE-29863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HBASE-29863:
-----------------------------------
Labels: pull-request-available (was: )
> Add API to KeyValueScanner to retrieve the set of StoreFiles accessed during
> a scan
> -----------------------------------------------------------------------------------
>
> Key: HBASE-29863
> URL: https://issues.apache.org/jira/browse/HBASE-29863
> Project: HBase
> Issue Type: New Feature
> Components: API, regionserver, Scanners
> Reporter: Himanshu Gwalani
> Assignee: Himanshu Gwalani
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2
>
>
> *Goal:* Introduce a mechanism to track and expose the specific HFiles
> involved in a scan operation.
> *Use-case:* Client-side validation that the right set of files was scanned
> (when a source of truth is available, for example the snapshot data manifest
> during snapshot-based scans), debugging performance issues, and analysis of
> data access patterns.
> *Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the
> {{KeyValueScanner}} interface.
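For illustration only, here is a minimal Java sketch of how the proposed method might look to a caller. Everything except the method name `getScannerInitializedFiles()` and its `Set<Path>` return type is an assumption: `FileAwareScanner`, `SingleFileScanner`, and `ProposedApiSketch` are hypothetical names, `java.nio.file.Path` stands in for Hadoop's `Path` so the example compiles without HBase on the classpath, and the file path is made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

// Hypothetical, trimmed-down stand-in for KeyValueScanner: only the proposed
// method is modeled. java.nio.file.Path substitutes for Hadoop's Path here.
interface FileAwareScanner {
    Set<Path> getScannerInitializedFiles();
}

// Assumed shape of a file-backed scanner: it remembers the one file it was opened on.
final class SingleFileScanner implements FileAwareScanner {
    private final Path file;

    SingleFileScanner(Path file) {
        this.file = file;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
        return Collections.singleton(file);
    }
}

final class ProposedApiSketch {
    // Caller-side helper: true if the scanner reports the given file.
    static boolean touched(FileAwareScanner scanner, Path file) {
        return scanner.getScannerInitializedFiles().contains(file);
    }

    public static void main(String[] args) {
        Path hfile = Paths.get("/hbase/data/ns/table/region/cf/hfile-1"); // made-up path
        FileAwareScanner scanner = new SingleFileScanner(hfile);
        System.out.println(touched(scanner, hfile));
    }
}
```

Returning a `Set` rather than a `List` fits the use-case above, since a caller comparing against a manifest cares about membership and not about order or duplicates.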
> *Implementation Details*
> * *Capturing the list of files when the scanner is initialized.*
> ** Leaf Scanners
> *** StoreFileScanner: Returns a singleton set containing the path of the
> associated {{HFile}}.
> *** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner:
> Returns empty set.
> ** Composite Scanners
> *** StoreScanner & ReversedStoreScanner: Aggregates files from all active
> {{StoreFileScanners}}
> *** KeyValueHeap & ReversedKeyValueHeap: Aggregates files from its internal
> priority queue of scanners.
> ** Abstract Scanners
> *** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Returns
> empty set.
> * *Exposing via RegionScanner & TableSnapshotRecordReader*
> ** RegionScanner: Aggregates files from all underlying StoreScanners
> ** TableSnapshotRecordReader: Proxies the call through
> ClientSideRegionScanner to allow MapReduce jobs to access this for
> snapshot-based scans.
> ** Note: Also
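The leaf/composite split described above could be sketched roughly as follows. This is not the patch itself: `TrackedScanner`, `LeafFileScanner`, `InMemoryScanner`, `CompositeScanner`, and `ScanFileAggregationSketch` are hypothetical names, `java.nio.file.Path` again stands in for Hadoop's `Path`, and the manifest is modeled as a plain set of paths. The sketch only shows the aggregation idea (a composite unions what its children report; memory-backed scanners contribute nothing) and the client-side check from the use-case.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the scanner interface carrying the proposed method.
interface TrackedScanner {
    Set<Path> getScannerInitializedFiles();
}

// Leaf scanner over a single file: reports exactly that file.
final class LeafFileScanner implements TrackedScanner {
    private final Path file;

    LeafFileScanner(Path file) {
        this.file = file;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
        return Collections.singleton(file);
    }
}

// Leaf scanner over in-memory data: no files involved, so an empty set.
final class InMemoryScanner implements TrackedScanner {
    @Override
    public Set<Path> getScannerInitializedFiles() {
        return Collections.emptySet();
    }
}

// Composite scanner: unions the files reported by each child scanner.
final class CompositeScanner implements TrackedScanner {
    private final List<TrackedScanner> children;

    CompositeScanner(List<TrackedScanner> children) {
        this.children = children;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
        Set<Path> files = new HashSet<>();
        for (TrackedScanner child : children) {
            files.addAll(child.getScannerInitializedFiles());
        }
        return Collections.unmodifiableSet(files);
    }
}

final class ScanFileAggregationSketch {
    // Client-side check: did the scan touch exactly the files the manifest lists?
    static boolean matchesManifest(Set<Path> scanned, Set<Path> manifest) {
        return scanned.equals(manifest);
    }

    public static void main(String[] args) {
        Path a = Paths.get("/snapshot/region/cf1/file-a"); // made-up paths
        Path b = Paths.get("/snapshot/region/cf2/file-b");

        // Two store-level composites nested under one region-level composite.
        TrackedScanner store1 = new CompositeScanner(
            Arrays.asList(new LeafFileScanner(a), new InMemoryScanner()));
        TrackedScanner store2 = new CompositeScanner(
            Arrays.asList(new LeafFileScanner(b)));
        TrackedScanner region = new CompositeScanner(Arrays.asList(store1, store2));

        Set<Path> manifest = new HashSet<>(Arrays.asList(a, b));
        System.out.println(matchesManifest(region.getScannerInitializedFiles(), manifest));
    }
}
```

Because composites nest, the same union step covers both the store level and the region level; whether the real implementation caches the set at initialization or computes it on demand is not stated in the description.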
--
This message was sent by Atlassian Jira
(v8.20.10#820010)