[ https://issues.apache.org/jira/browse/HBASE-29864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Gwalani updated HBASE-29864:
-------------------------------------
    Description: 
Currently, StoreFileScanner uses @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX), while the KeyValueScanner interface and all of its other implementations use @InterfaceAudience.Private. This inconsistency should be addressed for API clarity and maintainability.

**Current State:**
- KeyValueScanner (interface): @InterfaceAudience.Private ⚠️
- StoreFileScanner: @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX) ✓
- All other implementations: @InterfaceAudience.Private ⚠️

**Proposed Change:**
Change the KeyValueScanner interface and ALL implementations to:
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
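The proposed end state might look roughly like this. It is a minimal illustration, not the actual patch: InterfaceAudience and HBaseInterfaceAudience are simplified stand-ins for the real Yetus and HBase classes so the sketch compiles on its own, the PHOENIX constant's value is assumed, and the scanner types are empty placeholders.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-in for org.apache.yetus.audience.InterfaceAudience (assumption).
final class InterfaceAudience {
  private InterfaceAudience() {}

  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @interface Private {}

  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @interface LimitedPrivate {
    String[] value();
  }
}

// Simplified stand-in for org.apache.hadoop.hbase.HBaseInterfaceAudience;
// the constant's value is assumed for illustration only.
final class HBaseInterfaceAudience {
  private HBaseInterfaceAudience() {}

  static final String PHOENIX = "Phoenix";
}

// Proposed: the interface and every implementation share one audience annotation.
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
interface KeyValueScanner {
  // scanner methods elided
}

@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
class StoreFileScanner implements KeyValueScanner {} // unchanged: already LimitedPrivate

@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
class SegmentScanner implements KeyValueScanner {} // previously Private

@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.PHOENIX)
class KeyValueHeap implements KeyValueScanner {} // previously Private
```

Because a single-element String[] annotation member accepts a lone constant, the same one-line annotation can be pasted onto every implementation unchanged.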

**Rationale:**
1. Phoenix requires direct access to scanner implementations (StoreFileScanner is already LimitedPrivate(PHOENIX)).
2. Phoenix code references implementations through the KeyValueScanner interface type, so for consistency the interface must also be LimitedPrivate(PHOENIX).
3. For maintainability, all implementations should carry the same audience annotation as the interface.
4. Coprocessors access scanners through RegionScanner (LimitedPrivate for COPROC), not directly through KeyValueScanner.
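Rationale point 3 could be checked mechanically. The sketch below is a hypothetical helper, not an existing HBase utility: it reflects over the interface and a list of implementations and reports any whose audience annotation differs. It assumes the annotations are retained at runtime (true of the simplified stand-ins here; for the real Yetus annotations this is an assumption), and the empty placeholder types model the Current State above.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the Yetus audience annotations (assumption: RUNTIME retention).
final class InterfaceAudience {
  private InterfaceAudience() {}

  @Retention(RetentionPolicy.RUNTIME)
  @interface Private {}

  @Retention(RetentionPolicy.RUNTIME)
  @interface LimitedPrivate {
    String[] value();
  }
}

// Empty placeholder types modeling the current state.
@InterfaceAudience.Private
interface KeyValueScanner {}

@InterfaceAudience.LimitedPrivate("Phoenix")
class StoreFileScanner implements KeyValueScanner {}

@InterfaceAudience.Private
class SegmentScanner implements KeyValueScanner {}

// Hypothetical checker; not part of HBase.
final class AudienceCheck {
  private AudienceCheck() {}

  // Renders a type's own audience annotation as a comparable string.
  static String audienceOf(Class<?> type) {
    InterfaceAudience.LimitedPrivate limited =
        type.getAnnotation(InterfaceAudience.LimitedPrivate.class);
    if (limited != null) {
      return "LimitedPrivate" + Arrays.toString(limited.value());
    }
    if (type.isAnnotationPresent(InterfaceAudience.Private.class)) {
      return "Private";
    }
    return "none";
  }

  // Lists each implementation whose audience differs from the interface's.
  static List<String> mismatches(Class<?> iface, List<Class<?>> impls) {
    String expected = audienceOf(iface);
    List<String> out = new ArrayList<>();
    for (Class<?> impl : impls) {
      String actual = audienceOf(impl);
      if (!actual.equals(expected)) {
        out.add(impl.getSimpleName() + ": " + actual + ", interface: " + expected);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Class<?>> impls = List.of(StoreFileScanner.class, SegmentScanner.class);
    // Prints: StoreFileScanner: LimitedPrivate[Phoenix], interface: Private
    for (String m : mismatches(KeyValueScanner.class, impls)) {
      System.out.println(m);
    }
  }
}
```

Run against the current state, only StoreFileScanner is flagged; once the interface and all implementations carry the same annotation, mismatches returns an empty list. Class annotations are not inherited from interfaces, so each implementation is judged on its own declaration.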

  was:
*Goal:* Introduce a mechanism to track and expose the specific HFiles involved in a scan operation.

*Use-case:* This is essential for client-side validation that the right set of files is scanned (when a source of truth is available, for example the snapshot data manifest during snapshot-based scans), for debugging performance-related issues, and for analyzing data access patterns.

*Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the {{KeyValueScanner}} interface.

*Implementation Details*
 * *Capturing the list of files when the scanner is initialized*
 ** Leaf Scanners
 *** StoreFileScanner: Returns a singleton set containing the path of the associated {{HFile}}.
 *** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner: Return an empty set.
 ** Composite Scanners
 *** StoreScanner & ReversedStoreScanner: Aggregate files from all active {{StoreFileScanners}}.
 *** KeyValueHeap & ReversedKeyValueHeap: Aggregate files from the internal priority queue of scanners.
 ** Abstract Scanners
 *** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Return an empty set.
 * *Exposing via RegionScanner & TableSnapshotRecordReader*
 ** RegionScanner: Aggregates files from all underlying StoreScanners.
 ** TableSnapshotRecordReader: Proxies the call through ClientSideRegionScanner to allow MapReduce jobs to access this for snapshot-based scans.
 ** Note: Also 
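The leaf/composite aggregation described in this earlier proposal can be sketched as follows. All type names are hypothetical simplifications (FileScanner plays the role of StoreFileScanner, MemoryScanner of SegmentScanner, CompositeScanner of StoreScanner/KeyValueHeap/RegionScanner), and java.nio.file.Path stands in for the proposal's Path type so the sketch is self-contained. Using a default method for the empty-set case is a design assumption; the proposal only states that those scanners return an empty set.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified scanner contract carrying the proposed method.
interface TrackingScanner {
  // Default: scanners not backed by an HFile report no files.
  default Set<Path> getScannerInitializedFiles() {
    return Collections.emptySet();
  }
}

// Leaf scanner over a single file (role of StoreFileScanner).
final class FileScanner implements TrackingScanner {
  private final Path file;

  FileScanner(Path file) {
    this.file = file;
  }

  @Override
  public Set<Path> getScannerInitializedFiles() {
    return Collections.singleton(file);
  }
}

// Leaf scanner over in-memory data (role of SegmentScanner); inherits the empty default.
final class MemoryScanner implements TrackingScanner {}

// Composite scanner: unions the files reported by its children, preserving order.
final class CompositeScanner implements TrackingScanner {
  private final List<TrackingScanner> children;

  CompositeScanner(List<TrackingScanner> children) {
    this.children = children;
  }

  @Override
  public Set<Path> getScannerInitializedFiles() {
    Set<Path> files = new LinkedHashSet<>();
    for (TrackingScanner child : children) {
      files.addAll(child.getScannerInitializedFiles());
    }
    return Collections.unmodifiableSet(files);
  }
}

final class TrackingDemo {
  public static void main(String[] args) {
    TrackingScanner store = new CompositeScanner(List.of(
        new FileScanner(Paths.get("cf", "hfile-a")),
        new MemoryScanner(),
        new FileScanner(Paths.get("cf", "hfile-b"))));
    // A region-level composite nests store-level composites.
    TrackingScanner region = new CompositeScanner(List.of(store, new MemoryScanner()));
    System.out.println(region.getScannerInitializedFiles());
  }
}
```

Because composites only ask their children, nesting works to any depth: a region-level scanner sees every file its store-level scanners opened, while memstore-backed scanners contribute nothing.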


> Standardize KeyValueScanner interface and all implementations to LimitedPrivate
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-29864
>                 URL: https://issues.apache.org/jira/browse/HBASE-29864
>             Project: HBase
>          Issue Type: New Feature
>          Components: API, regionserver, Scanners
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major
>             Fix For: 2.7.0, 3.0.0-beta-2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
