Re: [PR] [GLUTEN-5471][VL]feat: Support read Hudi COW table [incubator-gluten]

via GitHub Tue, 11 Jun 2024 18:39:09 -0700


yma11 commented on code in PR #6049:
URL: https://github.com/apache/incubator-gluten/pull/6049#discussion_r1635671447



##########
gluten-core/src/main/scala/org/apache/gluten/execution/DataSourceScanTransformerRegister.scala:
##########
@@ -30,13 +30,13 @@ trait DataSourceScanTransformerRegister {
   /**
    * The class name that used to identify what kind of datasource this is。
    *
-   * For DataSource V1, it should be the child class name of
-   * [[org.apache.spark.sql.execution.datasources.FileIndex]].
+   * For DataSource V1, it should be relation.fileFormat like

Review Comment:
   @YannByron `org.apache.spark.sql.execution.datasources.FileIndex` can be
used to distinguish different datasources, but it is too general: every kind of
file read passes the check, including the metadata/log files read during query
plan analysis. Matching those reads is unnecessary and may trigger failures in
some corner cases. So here we limit it to the Parquet format. Are you okay with
this change?
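To illustrate the difference between the two matching keys, here is a minimal, self-contained sketch. Every type in it (`HudiLikeFileIndex`, `HudiLikeParquetFormat`, `HudiLikeLogFormat`, `Relation`, `Register`, `RegisterLookup`) is a hypothetical stand-in, not Spark's, Hudi's, or Gluten's real API. It assumes, for illustration only, that data files and metadata/log files are listed through the same index type but read with different file formats.

```scala
// Sketch only: all types below are hypothetical stand-ins, not real Spark/Hudi/Gluten classes.
trait FileIndex
trait FileFormat

// Assumption: data reads and metadata/log reads share the same index type.
class HudiLikeFileIndex extends FileIndex
class HudiLikeParquetFormat extends FileFormat // the Parquet data files we want to offload
class HudiLikeLogFormat extends FileFormat     // e.g. metadata/log files read during planning

final case class Relation(location: FileIndex, fileFormat: FileFormat)

// Hypothetical register: a datasource identified by a single class name.
final case class Register(name: String, scanClassName: String)

object RegisterLookup {
  // Match on the relation's file format class (the narrower key proposed in the diff).
  def byFileFormat(regs: Seq[Register], rel: Relation): Option[Register] =
    regs.find(_.scanClassName == rel.fileFormat.getClass.getName)

  // Match on the relation's FileIndex class (the more general key).
  def byFileIndex(regs: Seq[Register], rel: Relation): Option[Register] =
    regs.find(_.scanClassName == rel.location.getClass.getName)

  def main(args: Array[String]): Unit = {
    val dataRead = Relation(new HudiLikeFileIndex, new HudiLikeParquetFormat)
    val logRead  = Relation(new HudiLikeFileIndex, new HudiLikeLogFormat)

    val keyedOnIndex  = Seq(Register("hudi", classOf[HudiLikeFileIndex].getName))
    val keyedOnFormat = Seq(Register("hudi", classOf[HudiLikeParquetFormat].getName))

    // Keyed on FileIndex: both reads are claimed, including the log read.
    println(byFileIndex(keyedOnIndex, dataRead).isDefined) // true
    println(byFileIndex(keyedOnIndex, logRead).isDefined)  // true
    // Keyed on fileFormat: only the Parquet data read is claimed.
    println(byFileFormat(keyedOnFormat, dataRead).isDefined) // true
    println(byFileFormat(keyedOnFormat, logRead).isDefined)  // false
  }
}
```

With the FileIndex key, the log read is claimed as well, which is the over-matching described above; keying on the file format leaves such reads to vanilla Spark.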



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

