rluvaton commented on issue #1028:
URL: 
https://github.com/apache/datafusion-comet/issues/1028#issuecomment-2912704996

   Looking at the discussion of Iceberg and Delta Lake support, it seems there 
should be a different solution than requiring the extension authors 
(Iceberg/Delta Lake) to implement the Comet support trait.
   
   Why not create a Maven package that has a transitive dependency on the 
iceberg/delta-lake/other popular third-party jars, implements Comet support, 
checks whether the conversion is supported, and converts to protobuf?
   
   
   If we had some kind of internal package that exposed just the basics, like 
`isSupported` and `convert`, to be used by the Scala serde file, we could 
implement:
   
   ```
   ./:
     iceberg-support/:
       pom.xml - transitive dependency on the Iceberg Maven package
       ...     - code that marks the reader as supported and adds conversion 
code for QueryPlanSerde

     delta-lake-support/:
       pom.xml - transitive dependency on the Delta Lake Maven package
       ...     - code that marks the reader as supported and adds conversion 
code for QueryPlanSerde

     avro-support/:
       pom.xml - transitive dependency on the spark-avro Maven package
       ...     - code that marks the Avro file format as supported and adds 
conversion code for QueryPlanSerde
   
   ```
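   A minimal sketch of what that internal package could expose. All names here 
(`CometScanSupport`, `CsvScanSupport`) are hypothetical, not real Comet APIs, 
and a `String` stands in for the serialized protobuf just to keep the sketch 
self-contained:

   ```scala
   // Hypothetical internal SPI that the serde layer would call into.
   // T is the plan/scan node type the support package knows how to handle.
   trait CometScanSupport[T] {
     // True when this node can be replaced by a native scan
     def isSupported(plan: T): Boolean
     // Serialized form for QueryPlanSerde (String stands in for protobuf bytes)
     def convert(plan: T): Option[String]
   }

   // Toy implementation showing how a support package would plug in
   object CsvScanSupport extends CometScanSupport[String] {
     def isSupported(plan: String): Boolean = plan.endsWith(".csv")
     def convert(plan: String): Option[String] =
       if (isSupported(plan)) Some(s"NativeScan($plan)") else None
   }
   ```

   Each `*-support` module would ship its own implementation of such a trait, 
and the serde would only depend on the trait itself.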
   
   # Examples
   
   ## Delta Lake
   If we want to replace the Delta Lake Java reader with the delta-rs reader 
and avoid serializing to Java: Delta Lake has a Rust implementation, but in 
order to match the Delta Lake file format and all of its configuration so we 
can replace it with a native scan, we need the Delta Lake jar. To avoid Comet 
itself depending on Delta Lake (or on Iceberg), we can create our own 
delta-lake-support package with a transitive dependency on the Delta Lake Spark 
extension. This package would attach the support trait to the Delta Lake class, 
check whether the current schema is supported, and handle the conversion to 
protobuf.
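   The schema check could look something like this. Everything here is a 
hypothetical sketch (`DeltaLakeSupport`, the `Field` stand-in for Spark's 
`StructField`, and the set of native types are all illustrative):

   ```scala
   // Toy entry point for a hypothetical delta-lake-support module.
   object DeltaLakeSupport {
     // Stand-in for Spark's StructField(name, dataType)
     final case class Field(name: String, dataType: String)

     // Illustrative set of types assumed to have a native delta-rs equivalent
     private val nativeTypes = Set("int", "long", "string", "double")

     // Supported only when every column type can be read natively
     def isSupported(schema: Seq[Field]): Boolean =
       schema.forall(f => nativeTypes.contains(f.dataType))
   }
   ```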
   
   ## Avro Reader 
   
   I want to read an Avro file. Unlike Parquet, the Avro package no longer 
comes built in with Spark, which means we will not be able to match 
`AvroFileFormat` here:
   
https://github.com/apache/datafusion-comet/blob/6663245d73c79b4547605e1f68840bc0c2a4d22d/spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala#L527-L529
   
   So if we had, like for Delta Lake, a separate package with a transitive 
dependency on the Avro package, we could match on that and add support for 
reading Avro.
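   As a complementary sketch: even without a compile-time dependency, the serde 
could dispatch on the runtime class name. The fully-qualified class name below 
is the real Spark Avro format class; the helper itself is hypothetical:

   ```scala
   // Matching by FQCN so Comet core needs no spark-avro dependency.
   object AvroSupport {
     val AvroFormatClassName = "org.apache.spark.sql.avro.AvroFileFormat"

     // True when the scan's file format is Spark's Avro format,
     // checked by class name rather than by a compile-time type
     def isAvroFormat(fileFormat: AnyRef): Boolean =
       fileFormat.getClass.getName == AvroFormatClassName
   }
   ```

   The dedicated avro-support package could instead pattern match on the type 
directly, since it would have the spark-avro jar on its compile classpath.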
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
