govorunov commented on issue #3756:
URL: https://github.com/apache/hudi/issues/3756#issuecomment-940832604


   Sorry, I'm quite new to big data, so I may ask some stupid questions. Let's
forget about temporal storage, database backups, etc. for a minute. Can we use
Hudi to store all database events without significant write amplification and
without making assumptions about the nature of the data itself?
   
   I mean: imagine a raw stream of data change events from CDC or some other
source, i.e. a transaction log stored as a long, append-only table. Can Hudi
handle this efficiently? From what I've seen while experimenting with Hudi, it
creates a complete copy of the entire partition (gigabytes to terabytes,
depending on how we partition) every time a few rows are added or modified.
It does not matter whether the table is COW or MOR: the former creates the
copy immediately, while the latter does so every few minutes at the compaction
step. What I need is the ability to append records to the table indefinitely
without write amplification, partition the data by creation date, and schedule
compaction only after all the data for the current day has been ingested.
Historical querying is not needed here, as the data is append-only. Can we
schedule MOR compactions to run once a day instead of every few minutes, as
they do now, to reduce write amplification? Once we have tables storing the
complete transaction log, we may think about derived tables.
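   For concreteness, here is a minimal sketch of the kind of writer options I
have in mind for such a table. It assumes a Hudi release that supports the
`hoodie.compact.inline.trigger.strategy` option with a `TIME_ELAPSED` value
(option names should be checked against the configuration reference for the
version in use), and the table name and the `event_id` / `created_date`
columns are hypothetical placeholders.

```python
# Sketch only: hypothetical writer options for an append-only MERGE_ON_READ
# Hudi table, partitioned by creation date, with compaction triggered by
# elapsed time (about once a day) instead of by delta-commit count.
# Assumes a Hudi version that supports the time-based compaction trigger;
# the column names below are placeholders, not a real schema.

ONE_DAY_SECONDS = 24 * 60 * 60


def append_only_mor_options(table_name: str) -> dict:
    """Return Hudi datasource options for an append-only MOR table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        # "insert" skips the index lookup that "upsert" performs,
        # so incoming records are not merged with existing ones by key.
        "hoodie.datasource.write.operation": "insert",
        "hoodie.datasource.write.recordkey.field": "event_id",        # placeholder
        "hoodie.datasource.write.partitionpath.field": "created_date",  # placeholder
        # Compact inline, but only once a day has elapsed since the last
        # compaction, rather than every few delta commits.
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.trigger.strategy": "TIME_ELAPSED",
        "hoodie.compact.inline.max.delta.seconds": str(ONE_DAY_SECONDS),
    }
```

   It would then be used with something like
`df.write.format("hudi").options(**append_only_mor_options("cdc_events")).mode("append").save(base_path)`,
where `df` and `base_path` are placeholders. As I understand it, a time-based
trigger counts from the last compaction rather than from midnight, so it only
approximates "compact after the day's data is in".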
   
   Thanks!


