teeyog commented on pull request #2475: URL: https://github.com/apache/hudi/pull/2475#issuecomment-774571361
> > You only need to add tabproperties to the hive table metadata: spark.sql.sources.provider= hudi, you can automatically convert the hive table to the hudi table.
>
> @teeyog can you please expand on this. is this related to this PR or a general comment?

If the Hive table's metadata (its table properties) contains `spark.sql.sources.provider=hudi`, Spark SQL resolves a read of that Hive table as follows.

First step: because a provider is set, Spark treats the table as a data source table instead of a plain Hive table: [DataSourceStrategy.scala#L302](https://github.com/apache/spark/blob/62be2483d7d78e61fd2f77929cf41c76eff17869/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L302)

Second step: Spark builds the relation by calling `resolveRelation` for that provider: [DataSourceStrategy.scala#L261](https://github.com/apache/spark/blob/62be2483d7d78e61fd2f77929cf41c76eff17869/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L261)

The `resolveRelation` call in the second step goes directly to Hudi's `DefaultSource`, so a read of the Hive table is automatically converted into a read of the Hudi table.
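The two steps can be modelled, very roughly, in plain Scala. This is a hypothetical sketch, not Spark's code. `CatalogTableModel`, `isDataSourceTable`, `resolveReadPath` and the hard-coded `registry` map are illustrative stand-ins. Real Spark goes through its session catalog and data source registration instead of a map. The sketch only shows the decision: a table whose `spark.sql.sources.provider` property names a non-Hive provider is handed to that provider's source, and a table without it stays on the Hive read path.

```scala
import java.util.Locale

// Hypothetical model of the two-step resolution; none of these names are Spark APIs.
object ProviderResolutionSketch {

  // Table property key that carries the data source provider.
  val ProviderKey = "spark.sql.sources.provider"

  // Minimal stand-in for a table entry read from the Hive metastore.
  final case class CatalogTableModel(name: String, properties: Map[String, String]) {
    // The provider, if any, is taken from the table properties.
    def provider: Option[String] = properties.get(ProviderKey)
  }

  // Step 1 (modelled): a provider is set and it is not "hive".
  def isDataSourceTable(t: CatalogTableModel): Boolean =
    t.provider.exists(p => p.toLowerCase(Locale.ROOT) != "hive")

  // Hypothetical registry from provider short name to the source that handles it.
  val registry: Map[String, String] = Map(
    "hudi"    -> "Hudi DefaultSource",
    "parquet" -> "Parquet source"
  )

  // Step 2 (modelled): data source tables are resolved through their provider;
  // everything else stays on the plain Hive path (labelled "HiveTableRelation").
  def resolveReadPath(t: CatalogTableModel): String =
    if (!isDataSourceTable(t)) "HiveTableRelation"
    else {
      // Safe: isDataSourceTable guarantees a provider is present.
      val key = t.provider.get.toLowerCase(Locale.ROOT)
      registry.getOrElse(key, throw new IllegalArgumentException(s"Unknown provider: $key"))
    }

  def main(args: Array[String]): Unit = {
    val hudiBacked = CatalogTableModel("db.trips", Map(ProviderKey -> "hudi"))
    val plainHive  = CatalogTableModel("db.logs", Map.empty)
    println(resolveReadPath(hudiBacked)) // prints "Hudi DefaultSource"
    println(resolveReadPath(plainHive))  // prints "HiveTableRelation"
  }
}
```

The point of the model is that nothing in the Hive table itself changes. Only the provider property decides which source ends up serving the read.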
