Shekharrajak opened a new issue, #3756:
URL: https://github.com/apache/datafusion-comet/issues/3756

   We should use the matrix below to check for any missing implementations that could accelerate the Spark Iceberg pipeline using Comet.
   
   ### READ PATH
   
   | Feature | Iceberg Java | iceberg-rust | datafusion-comet (via iceberg-rust) |
   |---|---|---|---|
   | Basic Parquet scan | Yes | Yes | Yes - IcebergScanExec |
   | Positional deletes (V2 MoR) | Yes | Yes - ArrowReader + DeleteVector (RoaringTreemap) | Yes - delegates to ArrowReader with row_selection_enabled(true) |
   | Equality deletes (V2 MoR) | Yes | Yes - ArrowReader builds equality delete predicates | Yes - delegates to ArrowReader |
   | Deletion vectors (V3) | Yes - DVUtil, DVFileWriter, DVIterator | Yes - DeleteVector + Puffin deletion-vector-v1 blob support | Not wired - Comet doesn't pass DV metadata via protobuf |
   | Schema evolution | Yes | Yes | Yes - IcebergStreamWrapper adapts batches to target schema |
   | Partition pruning (static) | Yes | Yes | Yes - partitions serialized in protobuf |
   | Dynamic partition pruning | Yes (Spark) | N/A (engine-level) | Yes - CometIcebergNativeScanExec defers serialization for DPP |
   | Row-group filtering (residuals) | Yes | Yes | Yes - residual predicates converted to iceberg::expr::BoundPredicate |
   | Identity partition columns | Yes | Yes | Yes |
   | Object stores (S3/GCS/OSS) | Yes (Hadoop FS) | Yes (OpenDAL) | Yes (OpenDAL via FileIOBuilder) |
   | V1 spec | Yes | Yes | Yes |
   | V2 spec | Yes | Yes | Yes |
   | V3 spec metadata | Yes | Yes (FormatVersion::V3, next_row_id, row lineage) | Not used - Comet doesn't handle V3-specific metadata |
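For context on the positional-delete rows above: iceberg-rust's DeleteVector stores the 64-bit positions of deleted rows in a data file (backed by a RoaringTreemap), and the reader turns it into a selection of row ranges to keep when row_selection_enabled(true) is set. A minimal stdlib-only sketch of that conversion, where `DeleteVector` and `keep_ranges` are illustrative stand-ins rather than the real iceberg-rust API:

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a deletion vector: the set of deleted row
/// positions in one data file. (A BTreeSet keeps this sketch stdlib-only;
/// the real type uses a roaring bitmap.)
struct DeleteVector {
    deleted: BTreeSet<u64>,
}

impl DeleteVector {
    fn new(positions: impl IntoIterator<Item = u64>) -> Self {
        Self { deleted: positions.into_iter().collect() }
    }

    /// Convert delete positions into contiguous half-open ranges of rows to
    /// KEEP, analogous to the row selection a Parquet reader can apply.
    fn keep_ranges(&self, total_rows: u64) -> Vec<(u64, u64)> {
        let mut ranges = Vec::new();
        let mut start = 0u64;
        for &pos in self.deleted.iter().filter(|&&p| p < total_rows) {
            if pos > start {
                ranges.push((start, pos)); // keep [start, pos)
            }
            start = pos + 1;
        }
        if start < total_rows {
            ranges.push((start, total_rows));
        }
        ranges
    }
}

fn main() {
    // Rows 2, 3 and 7 were positionally deleted out of a 10-row file.
    let dv = DeleteVector::new([2, 3, 7]);
    println!("{:?}", dv.keep_ranges(10)); // [(0, 2), (4, 7), (8, 10)]
}
```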
   
   ### WRITE PATH
   
   | Feature | Iceberg Java | iceberg-rust | datafusion-comet |
   |---|---|---|---|
   | Data file writing | Yes - DataWriter | Yes - DataFileWriter | No - uses raw parquet crate, not iceberg-rust |
   | Partitioned writes (sorted) | Yes - ClusteredDataWriter | Yes - ClusteredWriter | No - writes single file per Spark partition |
   | Partitioned writes (fanout) | Yes - FanoutDataWriter | Yes - FanoutWriter | No |
   | Rolling file writer | Yes | Yes - RollingFileWriter | No |
   | Equality delete writer | Yes | Yes - EqualityDeleteWriter | No |
   | Position delete writer | Yes | Partial | No |
   | Deletion vector writer | Yes - DVFileWriter, PartitioningDVWriter | No explicit DV writer | No |
   | AppendFiles / FastAppend | Yes - AppendFiles | Yes - FastAppendAction | No - commit done in Java |
   | OverwriteFiles | Yes - OverwriteFiles | Missing | No |
   | ReplacePartitions | Yes - ReplacePartitions | Missing | No |
   | DeleteFiles | Yes - DeleteFiles | Missing | No |
   | RowDelta | Yes - RowDelta | Missing | No |
   | RewriteFiles | Yes - RewriteFiles | Missing | No |
   | Transaction + commit | Yes - full atomic commit | Yes - Transaction::commit() with retry | No - commit is JVM-side |
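To illustrate the rolling-file-writer row above: the pattern is a writer that closes the current data file and opens a fresh one once a target size is reached, so no single Parquet file grows unbounded. A stdlib-only sketch of that pattern, where `RollingWriter` is a hypothetical stand-in (in-memory string "files" instead of real Parquet writers), not iceberg-rust's actual RollingFileWriter API:

```rust
/// Hypothetical rolling writer: rows accumulate in the current "file"
/// until a byte budget is reached, then the file is closed and a new one
/// begins. Vec<String> stands in for a real data-file writer.
struct RollingWriter {
    target_bytes: usize,
    current: Vec<String>,
    current_bytes: usize,
    closed_files: Vec<Vec<String>>,
}

impl RollingWriter {
    fn new(target_bytes: usize) -> Self {
        Self { target_bytes, current: Vec::new(), current_bytes: 0, closed_files: Vec::new() }
    }

    fn write(&mut self, row: &str) {
        self.current_bytes += row.len();
        self.current.push(row.to_string());
        // Roll once the current file reaches the target size.
        if self.current_bytes >= self.target_bytes {
            self.roll();
        }
    }

    fn roll(&mut self) {
        if !self.current.is_empty() {
            self.closed_files.push(std::mem::take(&mut self.current));
            self.current_bytes = 0;
        }
    }

    /// Close the writer and return every completed "data file".
    fn close(mut self) -> Vec<Vec<String>> {
        self.roll();
        self.closed_files
    }
}

fn main() {
    let mut w = RollingWriter::new(10);
    for row in ["aaaa", "bbbb", "cccc", "dd"] {
        w.write(row);
    }
    // Rolls after "cccc" pushes the file past 10 bytes; "dd" starts file 2.
    println!("{} files", w.close().len()); // 2 files
}
```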
   
   ### ROW-LEVEL OPERATIONS (DELETE/UPDATE/MERGE)
   
   | Feature | Iceberg Java + Spark | iceberg-rust | datafusion-comet |
   |---|---|---|---|
   | Copy-on-Write (CoW) scan | Yes - SparkCopyOnWriteScan | No CoW scan | No |
   | Copy-on-Write write | Yes - rewrite affected data files | Partial (rewrite manually) | No |
   | Merge-on-Read (MoR) scan | Yes - buildMergeOnReadScan() | Yes - ArrowReader applies deletes | Yes (read only) |
   | MoR position delta write | Yes - SparkPositionDeltaWrite | No | No |
   | DELETE FROM | Yes (CoW or MoR) | No action | No |
   | UPDATE | Yes (CoW or MoR) | No action | No |
   | MERGE INTO | Yes - SparkRowLevelOperationBuilder | No (issue #2201) | No |
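To make the CoW-vs-MoR distinction in the table concrete: Copy-on-Write rewrites each affected data file without the deleted rows, while Merge-on-Read leaves data files immutable and writes out only the positions of deleted rows (a position delete file or deletion vector) to be applied at scan time. A toy sketch over plain integers; both function names are illustrative, not any library's API:

```rust
/// Copy-on-Write DELETE: produce a rewritten data file containing only
/// the rows the predicate does NOT delete.
fn delete_cow(file: &[i32], pred: impl Fn(i32) -> bool) -> Vec<i32> {
    file.iter().copied().filter(|&v| !pred(v)).collect()
}

/// Merge-on-Read DELETE: keep the data file untouched and record the
/// positions of deleted rows for the reader to skip later.
fn delete_mor(file: &[i32], pred: impl Fn(i32) -> bool) -> Vec<u64> {
    file.iter()
        .enumerate()
        .filter(|&(_, &v)| pred(v))
        .map(|(pos, _)| pos as u64)
        .collect()
}

fn main() {
    let file = [10, 20, 30, 40];
    let is_deleted = |v: i32| v >= 30; // e.g. DELETE FROM t WHERE v >= 30

    println!("{:?}", delete_cow(&file, is_deleted)); // [10, 20]
    println!("{:?}", delete_mor(&file, is_deleted)); // [2, 3]
}
```

CoW pays the rewrite cost at write time and keeps scans cheap; MoR makes the delete cheap but pushes the merge work onto every subsequent scan, which is exactly the part Comet currently supports on the read side only.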


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

