[GitHub] [hudi] WTa-hash commented on issue #2255: [SUPPORT] Global Bloom and partition update not working correctly in MOR table

GitBox Sat, 21 Nov 2020 20:52:04 -0800


WTa-hash commented on issue #2255:
URL: https://github.com/apache/hudi/issues/2255#issuecomment-731698177



   > Got you. But can you confirm: with Athena, have you tried skipping the
   > option DataSourceReadOptions.QUERY_TYPE_OPT_KEY while reading?
   > By the way, a clarification: a single dataset may not be able to serve
   > both purposes. You can either set hoodie.compact.inline.max.delta.commits=1
   > or set some higher value. These are per-dataset configs, so I'm not sure
   > how you can achieve both use cases with a single MOR dataset.
   > That is why we have snapshot and read-optimized queries to cater to this
   > scenario.
   
   Using Athena to query this MOR table, I get exactly the same result set as
   the MOR screenshots in the original post. Athena can only read the
   read-optimized view, so I didn't include DataSourceReadOptions.QUERY_TYPE_OPT_KEY.
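For the Spark side of the comparison, the query type is just a read option. The sketch below is a hypothetical helper that assumes the Hudi 0.6-era string key `hoodie.datasource.query.type` (the value behind DataSourceReadOptions.QUERY_TYPE_OPT_KEY) and limits itself to the two query types discussed here. Leaving the option out falls back to Hudi's default, which is a snapshot query.

```python
# Hypothetical helper: builds Hudi datasource read options for one query type.
# Assumes "hoodie.datasource.query.type" is the key behind
# DataSourceReadOptions.QUERY_TYPE_OPT_KEY; only the two types discussed are allowed.
QUERY_TYPE_KEY = "hoodie.datasource.query.type"
SUPPORTED_QUERY_TYPES = {"snapshot", "read_optimized"}


def hudi_read_options(query_type=None):
    """Return read options; None means "skip the option" (Hudi defaults to snapshot)."""
    if query_type is None:
        return {}
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"unsupported query type: {query_type}")
    return {QUERY_TYPE_KEY: query_type}


if __name__ == "__main__":
    print(hudi_read_options())                  # skipped, as with Athena
    print(hudi_read_options("read_optimized"))  # what Athena effectively sees
```

In PySpark the resulting dict would be passed as `spark.read.format("hudi").options(**opts)`, so a read-optimized Spark query can be compared directly against the Athena result.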
   
   Our Spark application is set up so that teams are free to decide how they
   want to use hoodie.compact.inline.max.delta.commits. Each team has its own
   databases/streams, so one team may run the application on stream X with
   hoodie.compact.inline.max.delta.commits=1 if data latency is not important,
   while another team may run a separate application instance on stream Y with
   hoodie.compact.inline.max.delta.commits=20 if data latency is important.
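Because the compaction setting is a per-table writer option, each application instance can pass its own value. Below is a minimal sketch of building those options per stream. The stream names X and Y and their values mirror the example above, and the helper name is hypothetical. Only the compaction-related keys are shown. Other required writer options, such as the record key and precombine field, are omitted.

```python
# Hypothetical per-stream setting, mirroring the X / Y example above.
STREAM_MAX_DELTA_COMMITS = {"X": 1, "Y": 20}


def hudi_write_options(table_name, max_delta_commits):
    """Build MOR writer options that run inline compaction every N delta commits.

    Only compaction-related keys are included; record key, precombine field,
    partition path, etc. must still be supplied by the caller.
    """
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be >= 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.compact.inline": "true",
        # Hudi options are passed as strings.
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


if __name__ == "__main__":
    for stream, n in STREAM_MAX_DELTA_COMMITS.items():
        print(stream, hudi_write_options(f"table_{stream}", n))
```

Each instance writes to its own table, so the two values never conflict. The trade-off the reply points to still holds within a single table, though: with a value of 20, the read-optimized view seen by Athena lags until compaction runs.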


