(skywalking-banyandb) 01/26: Enhance TODO for Secondary Index File System: Update Phase 6 with detailed implementation tasks for Query Path, including part and multi-part iterators, block scanner, and query result handling. Improve structure and clarity of tasks and test cases to align with stream architecture.

hanahmily Wed, 27 Aug 2025 18:09:28 -0700

This is an automated email from the ASF dual-hosted git repository.

hanahmily pushed a commit to branch sidx/query
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git


commit f525e92df0831953cc5ae27b687bdd47327ffd63
Author: Gao Hongtao <hanahm...@gmail.com>
AuthorDate: Sat Aug 23 22:06:21 2025 +0800

    Enhance TODO for Secondary Index File System: Update Phase 6 with detailed 
implementation tasks for Query Path, including part and multi-part iterators, 
block scanner, and query result handling. Improve structure and clarity of 
tasks and test cases to align with stream architecture.
---
 banyand/internal/sidx/TODO.md | 96 +++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 45 deletions(-)

diff --git a/banyand/internal/sidx/TODO.md b/banyand/internal/sidx/TODO.md
index 2e1fda97..1b3a69e7 100644
--- a/banyand/internal/sidx/TODO.md
+++ b/banyand/internal/sidx/TODO.md
@@ -23,61 +23,67 @@ This document tracks the implementation progress of the 
Secondary Index File Sys
 
 ---
 
-## Phase 6: Query Path
+## Phase 6: Query Path (Following Stream Architecture)
 
-### 6.1 Query Interface (`query.go`)
-- [ ] Key range queries with tag filters
-- [ ] Support projections and result limits
-- [ ] Query validation and optimization
+### 6.1 Part Iterator (`part_iter.go` or extend `part.go`)
+- [ ] **Implementation Tasks**:
+  - [ ] Create `partIter` struct for single part iteration
+  - [ ] Implement `init(part, seriesIDs, minKey, maxKey, filter)`
+  - [ ] Add `nextBlock() bool` method for block advancement
+  - [ ] Create `curBlock` access and `error()` handling
 - [ ] **Test Cases**:
-  - [ ] Query parsing handles all parameter types
-  - [ ] Validation rejects invalid queries
-  - [ ] Query optimization improves performance
-  - [ ] Complex queries return correct results
-
-### 6.2 Part Filtering (`query.go`)
-- [ ] Filter parts by key range overlap
-- [ ] Minimize I/O operations through smart filtering
-- [ ] Support inclusive/exclusive bounds
+  - [ ] Part iteration finds all matching blocks in correct order
+  - [ ] Key range filtering works at block level
+  - [ ] Error handling for corrupted parts
+  - [ ] Performance meets single-part iteration targets
+
+### 6.2 Multi-Part Iterator (`iter.go` - like stream's `tstIter`)
+- [ ] **Implementation Tasks**:
+  - [ ] Create `iter` struct with `partIterHeap` for ordering
+  - [ ] Implement `init(parts, seriesIDs, minKey, maxKey, filter)`
+  - [ ] Add `nextBlock() bool` with heap-based merge logic
+  - [ ] Create `Error()` method for aggregated error handling
 - [ ] **Test Cases**:
-  - [ ] Filtering accuracy eliminates non-overlapping parts
-  - [ ] Performance improvement through reduced I/O
-  - [ ] Boundary conditions handled correctly
-  - [ ] Empty result sets handled gracefully
+  - [ ] Multi-part ordering maintained across parts
+  - [ ] Heap operations preserve key ordering (ASC/DESC)
+  - [ ] Iterator handles empty parts gracefully
+  - [ ] Memory usage during multi-part iteration is controlled
 
-### 6.3 Block Scanner (`block_scanner.go`)
+### 6.3 Block Scanner (`block_scanner.go` - like stream's block_scanner)
 - [ ] **Implementation Tasks**:
-  - [ ] Create block_scanner.go with scanner structure
-  - [ ] Implement scanBlock() with filtering logic
-  - [ ] Add scanBlockElements() for element processing
-  - [ ] Create matchesTagFilters() for tag filtering
+  - [ ] Create `blockScanner` struct using `iter` for block access
+  - [ ] Implement `scan(ctx, blockCh chan *blockScanResultBatch)`
+  - [ ] Add batch processing with `blockScanResultBatch` pattern
+  - [ ] Create element-level filtering and tag matching
 - [ ] **Test Cases**:
-  - [ ] Scan completeness finds all matching data
-  - [ ] Filter effectiveness reduces false positives
-  - [ ] Block scanning performance meets targets
-  - [ ] Memory usage during scanning is controlled
-
-### 6.4 Result Iterator (`query.go`)
-- [ ] Stream results with proper ordering
-- [ ] Memory-efficient iteration patterns
-- [ ] Support both ASC and DESC ordering
+  - [ ] Batch processing maintains correct element ordering
+  - [ ] Memory quota management prevents OOM
+  - [ ] Tag filtering accuracy with bloom filters
+  - [ ] Worker coordination and error propagation
+
+### 6.4 Query Result (`query_result.go` - like stream's `tsResult`)
+- [ ] **Implementation Tasks**:
+  - [ ] Create `queryResult` struct implementing `QueryResult` interface
+  - [ ] Implement `Pull() *QueryResponse` with worker pool pattern
+  - [ ] Add `runBlockScanner(ctx)` for parallel processing
+  - [ ] Create `Release()` for resource cleanup
 - [ ] **Test Cases**:
-  - [ ] Iterator correctness for various query types
-  - [ ] Memory usage remains bounded
-  - [ ] Ordering is maintained across parts
-  - [ ] Iterator cleanup prevents resource leaks
+  - [ ] Pull/Release pattern prevents resource leaks
+  - [ ] Parallel workers maintain result ordering
+  - [ ] Result merging produces correct final output
+  - [ ] Performance scales with worker count
 
-### 6.5 Block Reader (`block_reader.go`)
+### 6.5 SIDX Query Interface (extend `sidx.go`)
 - [ ] **Implementation Tasks**:
-  - [ ] Create block_reader.go with reader structure
-  - [ ] Implement readUserKeys(), readData(), readTags()
-  - [ ] Add decompression support
-  - [ ] Create mustReadFrom() for block loading
+  - [ ] Implement `Query(ctx context.Context, req QueryRequest) (QueryResult, 
error)`
+  - [ ] Add query validation and snapshot acquisition
+  - [ ] Create part selection by key range overlap
+  - [ ] Integrate with existing sidx snapshot management
 - [ ] **Test Cases**:
-  - [ ] Block reading maintains data integrity
-  - [ ] Decompression works correctly
-  - [ ] Block structure reconstruction is accurate
-  - [ ] Read performance meets requirements
+  - [ ] Query validation catches invalid requests
+  - [ ] Part selection optimizes I/O by filtering non-overlapping parts
+  - [ ] Integration with snapshot reference counting works correctly
+  - [ ] End-to-end query performance meets targets
 
 ---

Reply via email to