[ https://issues.apache.org/jira/browse/HBASE-29864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Himanshu Gwalani updated HBASE-29864:
-------------------------------------
Description:
Currently, StoreFileScanner uses
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
while the KeyValueScanner interface and all other implementations use
@InterfaceAudience.Private.
This inconsistency should be addressed for API clarity and maintainability.
**Current State:**
- KeyValueScanner (interface): @InterfaceAudience.Private ⚠️
- StoreFileScanner:
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX) ✓
- All other implementations: @InterfaceAudience.Private ⚠️
**Proposed Change:**
Change the KeyValueScanner interface and ALL implementations to:
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
**Rationale:**
1. Phoenix requires direct access to scanner implementations (StoreFileScanner
already has this)
2. Phoenix code references implementations through the KeyValueScanner
interface type, so the interface must also be LimitedPrivate(PHOENIX) for
consistency
3. All implementations should have the same audience annotation as the
interface for maintainability
4. Coprocessors access scanners through RegionScanner (LimitedPrivate for
COPROC), not directly through KeyValueScanner
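
As a rough illustration of the proposed change, the sketch below annotates an interface and its implementations with the same audience and checks the result by reflection. It is a self-contained approximation, not HBase code: the nested InterfaceAudience and HBaseInterfaceAudience types are local stand-ins for the real annotation classes, the "Phoenix" string is a placeholder value, and the helper isPhoenixVisible and the class PrivateOnlyScanner are hypothetical names introduced only for this example.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

public class AudienceSketch {

    // Local stand-in for the real audience annotations (assumption: simplified copy).
    public static final class InterfaceAudience {
        @Documented
        @Retention(RetentionPolicy.RUNTIME)
        public @interface LimitedPrivate {
            String[] value();
        }

        @Documented
        @Retention(RetentionPolicy.RUNTIME)
        public @interface Private {
        }
    }

    // Local stand-in for the audience constants (assumption: placeholder value).
    public static final class HBaseInterfaceAudience {
        public static final String PHOENIX = "Phoenix";
    }

    // Proposed state: the interface carries LimitedPrivate(PHOENIX).
    @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
    public interface KeyValueScanner {
    }

    // Already LimitedPrivate(PHOENIX) today, per the issue description.
    @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
    public static class StoreFileScanner implements KeyValueScanner {
    }

    // Hypothetical implementation still marked Private, i.e. the inconsistent case.
    @InterfaceAudience.Private
    public static class PrivateOnlyScanner implements KeyValueScanner {
    }

    // Hypothetical helper: true if the type itself declares LimitedPrivate(PHOENIX).
    public static boolean isPhoenixVisible(Class<?> type) {
        InterfaceAudience.LimitedPrivate audience =
            type.getAnnotation(InterfaceAudience.LimitedPrivate.class);
        return audience != null
            && Arrays.asList(audience.value()).contains(HBaseInterfaceAudience.PHOENIX);
    }

    public static void main(String[] args) {
        System.out.println("KeyValueScanner: " + isPhoenixVisible(KeyValueScanner.class));
        System.out.println("StoreFileScanner: " + isPhoenixVisible(StoreFileScanner.class));
        System.out.println("PrivateOnlyScanner: " + isPhoenixVisible(PrivateOnlyScanner.class));
    }
}
```

In Java, an annotation on an interface is not inherited by the classes that implement it, so annotating KeyValueScanner alone would leave each implementation with whatever audience it declares itself. That is why the proposal updates every implementation rather than only the interface.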
was:
*Goal:* Introduce a mechanism to track and expose the specific HFiles involved
in a scan operation.
*Use-case:* This is essential for client-side validation that the right set of
files is scanned (when a source of truth is available, for example the
snapshot data manifest during snapshot-based scans), for debugging
performance issues, and for analyzing data access patterns.
*Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the
{{KeyValueScanner}} interface.
*Implementation Details*
* *Capturing list of files when scanner is initialized.*
** Leaf Scanners
*** StoreFileScanner: Returns a singleton set containing the path of the
associated {{HFile}}.
*** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner: Returns
empty set.
** Composite Scanners
*** StoreScanner & ReversedStoreScanner: Aggregates files from all active
{{StoreFileScanners}}
*** KeyValueHeap & ReversedKeyValueHeap: Aggregates files from its internal
priority queue of scanners.
** Abstract Scanners
*** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Returns empty
set.
* *Exposing via RegionScanner & TableSnapshotRecordReader*
** RegionScanner: Aggregates files from all underlying StoreScanners
** TableSnapshotRecordReader: Proxies the call through ClientSideRegionScanner
to allow MapReduce jobs to access this for snapshot-based scans.
** Note: Also
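
The leaf and composite behaviour listed in the implementation details could look roughly like the following. This is a simplified sketch and not the actual HBase implementation: java.nio.file.Path stands in for the Hadoop Path type, the classes are minimal stand-ins that share only their names with the real scanners, and the composite keeps a plain list where the real KeyValueHeap uses a priority queue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ScannerFilesSketch {

    // Stand-in for the interface carrying the proposed method.
    public interface KeyValueScanner {
        Set<Path> getScannerInitializedFiles();
    }

    // Leaf scanner backed by one file: returns a singleton set with that file's path.
    public static final class StoreFileScanner implements KeyValueScanner {
        private final Path hfile;

        public StoreFileScanner(Path hfile) {
            this.hfile = hfile;
        }

        @Override
        public Set<Path> getScannerInitializedFiles() {
            return Collections.singleton(hfile);
        }
    }

    // Leaf scanner over in-memory data: no files involved, so an empty set.
    public static final class SegmentScanner implements KeyValueScanner {
        @Override
        public Set<Path> getScannerInitializedFiles() {
            return Collections.emptySet();
        }
    }

    // Composite scanner: the union of the files reported by its child scanners.
    public static final class KeyValueHeap implements KeyValueScanner {
        private final List<KeyValueScanner> scanners;

        public KeyValueHeap(List<KeyValueScanner> scanners) {
            this.scanners = scanners;
        }

        @Override
        public Set<Path> getScannerInitializedFiles() {
            Set<Path> files = new LinkedHashSet<>();
            for (KeyValueScanner scanner : scanners) {
                files.addAll(scanner.getScannerInitializedFiles());
            }
            return Collections.unmodifiableSet(files);
        }
    }

    public static void main(String[] args) {
        // Illustrative paths only; they do not refer to real files.
        KeyValueScanner heap = new KeyValueHeap(Arrays.asList(
            new StoreFileScanner(Paths.get("cf", "hfile-a")),
            new SegmentScanner(),
            new StoreFileScanner(Paths.get("cf", "hfile-b"))));
        System.out.println(heap.getScannerInitializedFiles());
    }
}
```

Because composites only merge what their children report, a region-level scanner could aggregate across its store scanners in the same way.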
> Standardize KeyValueScanner interface and all implementations to
> LimitedPrivate
> -------------------------------------------------------------------------------
>
> Key: HBASE-29864
> URL: https://issues.apache.org/jira/browse/HBASE-29864
> Project: HBase
> Issue Type: New Feature
> Components: API, regionserver, Scanners
> Reporter: Himanshu Gwalani
> Assignee: Himanshu Gwalani
> Priority: Major
> Fix For: 2.7.0, 3.0.0-beta-2
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)