luoyuxia opened a new issue, #128:
URL: https://github.com/apache/paimon-rust/issues/128

   ## Parent Issue
   Part of #124 (support partitioned table)
   Depends on #126 (BinaryRow deserialization), #127 (partition path generation)
   
   ## Background
   
   `TableScan::plan_snapshot()` currently discards partition information when building `DataSplit`s:
   
   ```rust
   // table_scan.rs:154
   for ((_partition, bucket), group_entries) in groups {
       // ...
       // table_scan.rs:171-173
       // todo: consider partitioned table
       let bucket_path = format!("{base_path}/bucket-{bucket}");
       let partition = BinaryRow::new(0);  // Always empty!
   }
   ```
   
   For partitioned tables, the correct path should be `{table_path}/{partition_path}/bucket-{bucket}`, e.g., `{table_path}/dt=2024-01-01/bucket-0/`.
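
As a rough illustration of the layout described above, the sketch below joins the three pieces into a bucket directory. The helper name `bucket_path` and its signature are hypothetical and not part of the current codebase; it assumes the partition segment has already been rendered (for example by the utility from #127) and falls back to the existing unpartitioned layout when that segment is empty.

```rust
/// Hypothetical helper (not an existing paimon-rust function): builds the
/// directory that holds the data files of one bucket.
///
/// `partition_path` is the already-rendered segment such as "dt=2024-01-01";
/// an empty string means the table is unpartitioned.
fn bucket_path(table_path: &str, partition_path: &str, bucket: i32) -> String {
    // Normalize stray slashes so the joined path never contains "//".
    let base = table_path.trim_end_matches('/');
    let partition = partition_path.trim_matches('/');
    if partition.is_empty() {
        // Same layout the current code produces.
        format!("{base}/bucket-{bucket}")
    } else {
        format!("{base}/{partition}/bucket-{bucket}")
    }
}

fn main() {
    // Partitioned table: the partition segment sits between table and bucket.
    println!("{}", bucket_path("/warehouse/t", "dt=2024-01-01", 0));
    // Unpartitioned table: no partition segment at all.
    println!("{}", bucket_path("/warehouse/t", "", 3));
}
```

Keeping the unpartitioned branch in the same helper is one possible design; it would let the planner use a single code path for both kinds of tables.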
   
   ## What needs to be done
   
   1. **Pass partition type info to `plan_snapshot()`**
      - Add partition keys (names) and partition field types (from 
`TableSchema`) as parameters, or pass the `TableSchema` itself
      - Alternatively, change `plan_snapshot()` from a static method to an 
instance method that can access `self.table.schema`
   
   2. **Decode partition bytes into `BinaryRow`**
      - For each group key `(partition_bytes, bucket)`, construct a `BinaryRow` 
from the raw bytes using `BinaryRow::from_bytes(arity, data)`
      - The arity is the number of partition keys
   
   3. **Generate partition path using `PartitionPathUtils`**
      - Call the partition path utility (from #127) to compute the partition 
path segment
      - Construct `bucket_path` as 
`{table_path}/{partition_path}/bucket-{bucket}`
   
   4. **Store actual partition data in `DataSplit`**
      - Pass the decoded `BinaryRow` (with real data) to `DataSplitBuilder::with_partition()` instead of the empty `BinaryRow::new(0)`
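
The four steps above could fit together roughly as in the following sketch. It is not the actual implementation: `RowStub` and `SplitStub` are stand-ins for `BinaryRow` and `DataSplit`, `plan_splits` is a hypothetical simplification of the grouping loop, and the `render_partition_path` parameter stands in for the utility from #127. The toy renderer in the example treats the partition bytes as one UTF-8 value, which is an assumption made only to keep the example small and is not the real `BinaryRow` layout.

```rust
use std::collections::HashMap;

/// Stand-in for `BinaryRow`: keeps only the arity and the raw bytes.
#[derive(Debug, Clone, PartialEq)]
struct RowStub {
    arity: usize,
    data: Vec<u8>,
}

impl RowStub {
    /// Mirrors the `from_bytes(arity, data)` shape named in step 2.
    fn from_bytes(arity: usize, data: Vec<u8>) -> Self {
        Self { arity, data }
    }
}

/// Stand-in for `DataSplit`, reduced to the fields this issue touches.
#[derive(Debug)]
struct SplitStub {
    partition: RowStub,
    bucket: i32,
    bucket_path: String,
}

/// Hypothetical shape of the grouping loop once partition info is available.
fn plan_splits<F>(
    table_path: &str,
    partition_keys: &[&str], // step 1: partition info passed in
    groups: HashMap<(Vec<u8>, i32), Vec<String>>,
    render_partition_path: F, // stand-in for the #127 utility
) -> Vec<SplitStub>
where
    F: Fn(&[&str], &RowStub) -> String,
{
    let mut splits = Vec::new();
    for ((partition_bytes, bucket), _entries) in groups {
        // Step 2: the arity is the number of partition keys.
        let partition = RowStub::from_bytes(partition_keys.len(), partition_bytes);
        // Step 3: only partitioned tables get a partition segment.
        let bucket_path = if partition_keys.is_empty() {
            format!("{table_path}/bucket-{bucket}")
        } else {
            let partition_path = render_partition_path(partition_keys, &partition);
            format!("{table_path}/{partition_path}/bucket-{bucket}")
        };
        // Step 4: keep the decoded partition instead of an empty row.
        splits.push(SplitStub { partition, bucket, bucket_path });
    }
    // HashMap iteration order is unspecified, so sort for stable output.
    splits.sort_by(|a, b| a.bucket_path.cmp(&b.bucket_path));
    splits
}

/// Toy renderer (assumption): a single key whose value is the bytes as UTF-8.
fn toy_render(keys: &[&str], row: &RowStub) -> String {
    format!("{}={}", keys[0], String::from_utf8_lossy(&row.data))
}

fn main() {
    let mut groups = HashMap::new();
    groups.insert((b"2024-01-01".to_vec(), 0), vec!["data-0.parquet".to_string()]);
    for split in plan_splits("/warehouse/t", &["dt"], groups, toy_render) {
        println!(
            "bucket {} (arity {}) -> {}",
            split.bucket, split.partition.arity, split.bucket_path
        );
    }
}
```

Passing the renderer as a parameter is only a convenience for the sketch; making `plan_snapshot()` an instance method with access to `self.table.schema`, as suggested in step 1, would remove the need for it.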
   
   ## Affected files
   - `crates/paimon/src/table/table_scan.rs` — `plan_snapshot()` method

