stackfun commented on issue #1860: URL: https://github.com/apache/hudi/issues/1860#issuecomment-663176816

I used the setting you recommended and still get similar results. In this run, I was inserting 200 records in the writer job:

```
Hive Query: 600
Spark Query: 777
Hive Query: 800
Spark Query: 800
Hive Query: 800
Spark Query: 800
Hive Query: 800
Spark Query: 800
Hive Query: 800
Spark Query: 851
Hive Query: 1000
Spark Query: 1000
```

I'm refreshing the table before each query, so the table metadata in Spark should be cleared. Does this seem like a bug to you, or is there some other setting I should try?

I was stress testing Hudi's atomic write feature, as our team is determining whether we can use Hudi for an efficient data lake. Directly querying the Hive table using Spark SQL seems to work flawlessly, so we're not blocked.
