[GitHub] [hudi] nsivabalan commented on issue #3324: [SUPPORT]Slow Performance With Spark Structured Streaming

GitBox Fri, 06 Aug 2021 13:08:30 -0700


nsivabalan commented on issue #3324:
URL: https://github.com/apache/hudi/issues/3324#issuecomment-894492641



With MOR (Merge-On-Read) tables, there are three types of queries that could benefit you.
Config: https://hudi.apache.org/docs/configurations#query_type_opt_key

[Snapshot/Realtime read](https://hudi.apache.org/docs/quick-start-guide#query-data): reads the entire data set as of the latest snapshot.
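A minimal PySpark-oriented sketch of selecting the query type on read. It assumes the Hudi 0.x option key `hoodie.datasource.query.type` (the key behind `query_type_opt_key` in the config docs) with the values `snapshot`, `read_optimized` and `incremental`. The helper `hudi_read_options` and the table path are hypothetical. `snapshot` is the default, so passing it explicitly is optional.

```python
# Hypothetical helper: builds reader options that select a Hudi query type.
# Assumes the Hudi 0.x option key below; verify it for your Hudi version.
from typing import Dict

QUERY_TYPE_KEY = "hoodie.datasource.query.type"
VALID_QUERY_TYPES = {"snapshot", "read_optimized", "incremental"}


def hudi_read_options(query_type: str = "snapshot") -> Dict[str, str]:
    """Return the reader options for the given Hudi query type."""
    if query_type not in VALID_QUERY_TYPES:
        raise ValueError(f"unknown Hudi query type: {query_type!r}")
    return {QUERY_TYPE_KEY: query_type}


# Usage with an existing SparkSession `spark` (not executed here):
#   df = (spark.read.format("hudi")
#           .options(**hudi_read_options("read_optimized"))
#           .load("/path/to/hudi_table"))   # hypothetical path
```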
   
ReadOptimized query (`read_optimized`): as I mentioned earlier, a given base data file may have some delta log files alongside it, depending on your compaction schedule. For snapshot reads, these log files are merged with the base data files before the results are served, whereas a ReadOptimized query reads only the base data files.
   
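If you can give up some freshness, your queries will be much faster, since no real-time merge is involved. The trade-off is that updates still sitting in log files that have not yet been compacted are not visible to ReadOptimized queries.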
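To make the cost difference concrete, here is a toy model in plain Python. It is not Hudi's actual reader: it treats a file group as a dict of base-file records plus a list of uncompacted delta logs keyed by record key, and it ignores deletes, precombine/ordering fields and file formats. The snapshot path pays for a merge on every read; the read-optimized path skips it and therefore misses the newer values.

```python
# Toy model (NOT Hudi's implementation): a file group is a base file plus
# delta logs that have not been compacted yet, all keyed by record key.
from typing import Dict, List

Record = Dict[str, str]


def read_optimized(base: Dict[str, Record]) -> Dict[str, Record]:
    """Read only the base file: cheap, but misses uncompacted updates."""
    return dict(base)


def snapshot(base: Dict[str, Record],
             logs: List[Dict[str, Record]]) -> Dict[str, Record]:
    """Merge the base file with every delta log, in order, at read time."""
    merged = dict(base)
    for log in logs:          # this per-query merge is the extra cost
        merged.update(log)    # upsert: the newer value for a key wins
    return merged


base_file = {"id1": {"v": "a"}, "id2": {"v": "b"}}
delta_logs = [{"id2": {"v": "b2"}}, {"id3": {"v": "c"}}]

print(sorted(read_optimized(base_file)))            # ['id1', 'id2']
print(snapshot(base_file, delta_logs)["id2"]["v"])  # b2
```

Compaction, in this model, would amount to writing `snapshot(...)` back as the new base file and clearing the logs, after which both reads return the same data.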
   
And then there is the [incremental read](https://hudi.apache.org/docs/quick-start-guide#incremental-query), which gives you only the records that changed between commits.
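A sketch of the reader options for an incremental read, assuming the Hudi 0.x keys `hoodie.datasource.query.type`, `hoodie.datasource.read.begin.instanttime` and (optionally) `hoodie.datasource.read.end.instanttime`. The helper name and the example instant value are hypothetical. Per the configuration docs, records written with an instant time later than the begin instant are returned.

```python
# Hypothetical helper for a Hudi incremental read. Assumes the Hudi 0.x
# option keys below; verify them against the docs for your version.
from typing import Dict, Optional


def hudi_incremental_options(begin_instant: str,
                             end_instant: Optional[str] = None) -> Dict[str, str]:
    """Options for reading only records committed after `begin_instant`."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        # Commit instant to start from, e.g. "20210806130000" (example value).
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound on the commit instants that are read.
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


# Usage with an existing SparkSession `spark` (not executed here):
#   changes = (spark.read.format("hudi")
#                .options(**hudi_incremental_options("20210806130000"))
#                .load("/path/to/hudi_table"))   # hypothetical path
```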
   
   

