rluvaton commented on issue #1028: URL: https://github.com/apache/datafusion-comet/issues/1028#issuecomment-2912704996
Looking at the discussion of Iceberg and Delta Lake support, it seems there should be a different solution than requiring the extension authors (Iceberg/Delta Lake) to implement a Comet support trait.

Why not create a Maven package that has a transitive dependency on Iceberg/Delta Lake/other popular third-party jars, implements Comet support, checks whether conversion is supported, and converts to protobuf?

If we had some kind of internal package that exposes just the basics, like `isSupported` and `convert`, to be used in the Scala serde file, we could implement:

```
./:
  iceberg-support/:
    pom.xml - which will have a transitive dependency on the Iceberg Maven package
    ... - the code that will mark the reader as supported and will add convert code for QueryPlanSerde
  delta-lake-support/:
    pom.xml - which will have a transitive dependency on the Delta Maven package
    ... - the code that will mark the reader as supported and will add convert code for QueryPlanSerde
  avro-support/:
    pom.xml - which will have a transitive dependency on the Spark Avro Maven package
    ... - the code that will mark the Avro file format as supported and will add conversion code for QueryPlanSerde
```

# Examples

## Delta Lake

We want to replace the Delta Lake Java reader with the delta-lake rs reader and avoid serializing to Java.

Delta Lake has an implementation in Rust. To match the Delta Lake file format and all of its config, so that we can replace it with a native scan, we need to have the Delta Lake jar. But to avoid Comet depending directly on Delta Lake (and likewise on Iceberg), we can create our own delta-lake support package that has a transitive dependency on the Delta Lake Spark extension. That package would add the support trait for the Delta Lake class, check whether the current schema is supported, and handle the conversion to protobuf.

## Avro Reader

I want to read an Avro file.
Unlike Parquet, the Avro package no longer comes built in with Spark, which means we will not be able to match `AvroFileFormat` here:

https://github.com/apache/datafusion-comet/blob/6663245d73c79b4547605e1f68840bc0c2a4d22d/spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala#L527-L529

So if we had, as with Delta Lake, a separate package with a transitive dependency on the Avro package, we could match on that and add support for reading Avro.
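
To make the `isSupported` / `convert` idea a bit more concrete, here is a rough sketch of what such an internal package might expose. Everything in it is hypothetical: `CometFormatSupport`, `ScanInfo`, `NativeScanProto`, `AvroSupport` and `SupportRegistry` are made-up names, not existing Comet APIs, and the two case classes are only stand-ins for the real Spark plan and protobuf types so that the snippet compiles on its own. The list of supported column types is also invented for illustration.

```scala
// Hypothetical sketch only: none of these names exist in Comet today.

// Stand-in for the information Comet would read from a Spark scan node.
final case class ScanInfo(formatClass: String, schema: Map[String, String])

// Stand-in for the protobuf message produced by the serde layer.
final case class NativeScanProto(kind: String, columns: Seq[String])

// The small surface the internal package would expose to the Scala serde code.
trait CometFormatSupport {
  // Whether this module can handle the given scan (format and schema).
  def isSupported(scan: ScanInfo): Boolean
  // Convert the scan to its native representation; call only if isSupported.
  def convert(scan: ScanInfo): NativeScanProto
}

// Would live in avro-support/, the module that depends on the Spark Avro jar.
object AvroSupport extends CometFormatSupport {
  // Assumed set of column types, purely for illustration.
  private val supportedTypes = Set("int", "long", "string")

  def isSupported(scan: ScanInfo): Boolean =
    scan.formatClass == "org.apache.spark.sql.avro.AvroFileFormat" &&
      scan.schema.values.forall(supportedTypes.contains)

  def convert(scan: ScanInfo): NativeScanProto =
    NativeScanProto("avro", scan.schema.keys.toSeq.sorted)
}

// What the serde side could do: ask each registered module in turn.
object SupportRegistry {
  private val modules: Seq[CometFormatSupport] = Seq(AvroSupport)

  def tryConvert(scan: ScanInfo): Option[NativeScanProto] =
    modules.find(_.isSupported(scan)).map(_.convert(scan))
}
```

In this sketch the module matches on the class name as a string only to stay self-contained. A real `avro-support` module would have the Avro jar on its classpath, so it could presumably match on the class itself, and the core serde code would only ever see the trait.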