[GitHub] [hudi] alexeykudinkin commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

GitBox Tue, 13 Dec 2022 19:23:34 -0800


alexeykudinkin commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350336358


   > @alexeykudinkin I think the query engine should not restrict how data is 
written in order to query it. Even for tables created by Spark SQL, the query 
engine should be able to query new data regardless of how that data was 
written (Spark datasource, Spark SQL, Java client, Flink SQL, or the Flink 
stream API), without requiring users to perform additional operations for 
different writing methods when using the query engine.
   
   This is not a limitation of the query engine; it is a limitation of how 
you're using the query engine. When writing to a table specified as a path, 
the following issues are at play:
   
   1. Spark SQL caches the Relation within the session cache when the table 
is queried.
   2. When you write to a table identified by a full path rather than by name, 
Spark has no way to invalidate the SQL session cache (since it doesn't have the 
table identifier).
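
   A minimal sketch of the manual workaround mentioned in the issue title, 
assuming a PySpark session. The helper name `write_then_refresh` and its 
parameters (`base_path`, `table_name`, `hudi_options`) are hypothetical and 
not taken from the issue. The only Spark calls used are the standard 
`DataFrameWriter` chain and `spark.catalog.refreshTable`, which is the 
programmatic counterpart of `REFRESH TABLE`.

```python
def write_then_refresh(spark, df, base_path, table_name, hudi_options):
    """Write df to a Hudi table by path, then refresh the cached relation.

    Hypothetical helper: all parameter names are illustrative assumptions.
    """
    # Path-based write: Spark receives only a path here, not a table
    # identifier, so it cannot invalidate the session cache on its own.
    (df.write
       .format("hudi")
       .options(**hudi_options)
       .mode("append")
       .save(base_path))

    # Explicitly invalidate the cached relation for the catalog table so
    # that the next query in this session re-reads the table state.
    spark.catalog.refreshTable(table_name)
```

   Writing through the table name instead of the path would avoid the 
explicit refresh, since Spark then has the identifier it needs to invalidate 
the cache itself.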


