[GitHub] [hudi] teeyog commented on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

GitBox Sat, 06 Feb 2021 17:16:56 -0800


teeyog commented on pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#issuecomment-774571361



   > > You only need to add one entry to the Hive table's tblproperties: 
spark.sql.sources.provider=hudi, and the Hive table is then automatically read 
as a Hudi table.
   > 
   > @teeyog can you please expand on this. is this related to this PR or a 
general comment?
   
   If the Hive table's tblproperties contain 
```spark.sql.sources.provider=hudi```, Spark SQL resolves a read of that Hive 
table as follows:
   First step
   
[https://github.com/apache/spark/blob/62be2483d7d78e61fd2f77929cf41c76eff17869/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L302](https://github.com/apache/spark/blob/62be2483d7d78e61fd2f77929cf41c76eff17869/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L302)
   
   Second step
   
[https://github.com/apache/spark/blob/62be2483d7d78e61fd2f77929cf41c76eff17869/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L261](https://github.com/apache/spark/blob/62be2483d7d78e61fd2f77929cf41c76eff17869/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L261)
   
   The resolveRelation call in the second step goes directly to Hudi's 
DefaultSource, so a read of the Hive table is automatically handled as a read 
of the Hudi table.
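
   To make the dispatch concrete, here is a minimal Python sketch of the decision described above. It is a toy model, not Spark's actual code: `TableMeta`, `is_datasource_table`, `resolve`, the table names and the paths are all hypothetical, and the rule that a provider other than `hive` selects the data source path is my reading of the two linked locations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Property key taken from the comment above; everything else is illustrative.
PROVIDER_KEY = "spark.sql.sources.provider"
HIVE_PROVIDER = "hive"


@dataclass
class TableMeta:
    """Hypothetical stand-in for the catalog metadata of a Hive table."""
    name: str
    location: str
    properties: Dict[str, str] = field(default_factory=dict)

    @property
    def provider(self) -> Optional[str]:
        # The provider is read straight from the table properties.
        return self.properties.get(PROVIDER_KEY)


def is_datasource_table(table: TableMeta) -> bool:
    # Assumption: a table counts as a data source table when a provider
    # is set and it is not the plain Hive provider.
    provider = table.provider
    return provider is not None and provider.lower() != HIVE_PROVIDER


def resolve(table: TableMeta) -> str:
    # First step: decide which read path the table takes.
    if is_datasource_table(table):
        # Second step: hand off to the provider's relation
        # (for "hudi" this would be Hudi's DefaultSource).
        return f"datasource:{table.provider.lower()}:{table.location}"
    # No provider set: fall back to the ordinary Hive read path.
    return f"hive:{table.name}"


hudi_table = TableMeta("db.trips", "/warehouse/trips", {PROVIDER_KEY: "hudi"})
plain_table = TableMeta("db.plain", "/warehouse/plain")

print(resolve(hudi_table))
print(resolve(plain_table))
```

   In this model the only difference between the two tables is the single property, which is the point of the comment: no change to the query is needed, only to the table metadata.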


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
