WTa-hash commented on issue #2255:
URL: https://github.com/apache/hudi/issues/2255#issuecomment-731698177
> Got it. But can you confirm: with Athena, have you tried skipping the DataSourceReadOptions.QUERY_TYPE_OPT_KEY option while reading?
>
> By the way, a clarification: a single dataset may not be able to serve both purposes. You can set either hoodie.compact.inline.max.delta.commits=1 or some higher value, and these are per-dataset configs. So I'm not sure how you can achieve both use cases with a single MOR dataset.
>
> That is why we have snapshot and read-optimized queries, to cater to this scenario.

When I query this MOR table with Athena, I get exactly the same result set as in the MOR screenshots from the original post. Athena can only read the read-optimized table, so I didn't include DataSourceReadOptions.QUERY_TYPE_OPT_KEY.

Our Spark application is set up so that teams are free to decide how they use hoodie.compact.inline.max.delta.commits. Each team has its own databases/streams, so one team may run the application on stream X with hoodie.compact.inline.max.delta.commits=1 if data latency is not important, and another team may run a separate application instance on stream Y with hoodie.compact.inline.max.delta.commits=20 if data latency is important.
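To illustrate the per-dataset point, here is a minimal Python sketch that assumes PySpark with the Hudi Spark bundle would consume these option maps. The helpers `mor_write_options` and `read_options`, the table names, and the field names `id`/`ts` are hypothetical. The option keys are Hudi's string config keys; `hoodie.datasource.query.type` is, as far as I know, the key behind DataSourceReadOptions.QUERY_TYPE_OPT_KEY in Hudi releases of that era. Each stream's table carries its own compaction cadence, and snapshot vs read-optimized is chosen per read.

```python
"""Hypothetical helpers that build Hudi option maps per stream (illustrative sketch)."""

# Query types this sketch accepts. Per the thread, Athena only serves
# read-optimized reads of a MOR table, while Spark can request either.
QUERY_TYPES = {"snapshot", "read_optimized"}


def mor_write_options(table_name, record_key, precombine_field, max_delta_commits):
    """Return write options for a MERGE_ON_READ table with inline compaction.

    max_delta_commits is stored with this table's writer config: inline
    compaction is triggered after this many delta commits.
    """
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be >= 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


def read_options(query_type="snapshot"):
    """Return read options selecting a snapshot or read-optimized query."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unsupported query type: {query_type}")
    return {"hoodie.datasource.query.type": query_type}


# Two independent tables (one per team/stream), each with its own cadence.
stream_x_opts = mor_write_options("stream_x", "id", "ts", max_delta_commits=1)
stream_y_opts = mor_write_options("stream_y", "id", "ts", max_delta_commits=20)

# Spark usage (not executed here; assumes a SparkSession `spark`, a DataFrame
# `df`, and a hypothetical base path `stream_y_path`; older Hudi releases may
# need a partition glob on the load path):
# df.write.format("hudi").options(**stream_y_opts).mode("append").save(stream_y_path)
# spark.read.format("hudi").options(**read_options("read_optimized")).load(stream_y_path)
```

In this sketch the compaction setting belongs to each table's writer, while the query type is a reader-side choice, which matches the maintainer's point: one MOR table gets one compaction cadence, but it can still be read either way.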
