Reposting a discussion from Slack as an FYI. "Jaimin [3:10 AM] Hi, we have a use case where we have a set of MOR base tables and flattened entities built on them. For example, we have order, customer, and seller tables and a flattened entity based on joining these 3 tables. To create the flattened entity, I think we need to fetch changes from each of these tables incrementally (incremental pull) and join them with the rest of the complete tables. So there will be n joins (equal to the number of tables involved in the flattened entity). Is there a more efficient way to do this? Also, for the join, do we need our own Spark job, or does Hudi provide these capabilities as well? Also, our data can have deletes; I am using the empty payload implementation to delete data. I tried this out with sample data (deleting data from a base table, compacting, and then using incremental pull to fetch changes), but I didn't see the deletes as part of the incremental pull. Am I missing something? Thanks"
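The n-joins approach from the question can be sketched in pure Python under stated assumptions. Each table is modeled as an in-memory list of row dicts carrying a hypothetical integer `commit_time` (standing in for a Hudi commit instant) and sharing the join key `order_id`; `incremental_pull` and `snapshot_as_of` are hypothetical helpers, not Hudi APIs. Taking the union of the n joins (each table's changes joined with the full snapshots of the other tables) and deduplicating on the key is equivalent to collecting every key changed in any table during the window and inner-joining the snapshots as of the window's end, which is what this sketch does. Deletes are not modeled.

```python
from typing import Dict, List

Row = Dict[str, object]

def incremental_pull(rows: List[Row], begin: int, end: int) -> List[Row]:
    # Hypothetical stand-in for an incremental query: rows committed in (begin, end].
    return [r for r in rows if begin < r["commit_time"] <= end]

def snapshot_as_of(rows: List[Row], end: int, key: str) -> Dict[object, Row]:
    # Latest version of each key committed at or before `end` (upsert semantics).
    snap: Dict[object, Row] = {}
    for r in sorted(rows, key=lambda r: r["commit_time"]):
        if r["commit_time"] <= end:
            snap[r[key]] = r
    return snap

def flatten_incremental(tables: Dict[str, List[Row]], begin: int, end: int,
                        key: str = "order_id") -> Dict[object, Row]:
    # Keys touched in any table during the window: the union of the n incremental pulls.
    changed = set()
    for rows in tables.values():
        changed |= {r[key] for r in incremental_pull(rows, begin, end)}
    snapshots = {name: snapshot_as_of(rows, end, key) for name, rows in tables.items()}
    flattened: Dict[object, Row] = {}
    for k in changed:
        # Inner join: skip keys not yet present in every table as of `end`.
        if not all(k in snap for snap in snapshots.values()):
            continue
        merged: Row = {key: k}
        for name, snap in snapshots.items():
            for col, val in snap[k].items():
                if col not in (key, "commit_time"):
                    merged[f"{name}.{col}"] = val
        flattened[k] = merged
    return flattened
```

In a real pipeline each pull would be a Spark DataFrame read using Hudi's incremental query and the joins would be DataFrame joins; the in-memory dicts only illustrate the logic.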
And my response: "You can pull the 3 tables and join them in a custom Spark job; that should be fine. (Yes, you need your own Spark job. The DeltaStreamer tool supports transforms, but limits itself to 1 table pulled incrementally.) What Nishith is alluding to is being able to 'safely' align windows between the 3 tables, which needs more business context to determine. For example, if you are joining the 3 tables based on order_id, then you need to be sure that the order shows up in the customer/seller/order tables within the same time range you are pulling for. @Jaimin This is such an interesting topic. I will start a thread on the mailing list. Please join and we can continue there, so others can also jump in: https://hudi.apache.org/community.html"
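The window-alignment concern can be made concrete with a second minimal sketch, assuming the same hypothetical in-memory model (row dicts with an integer `commit_time` and the shared key `order_id`). If a job joins only the incremental pulls with each other, rather than joining against full snapshots, any order whose rows landed in different windows across the tables is silently dropped. `misaligned_keys` is a hypothetical helper that lists those keys for a given window; it is one simple check, and what counts as a safe window remains a business decision.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

def keys_changed(rows: List[Row], begin: int, end: int, key: str) -> Set[object]:
    # Keys with at least one commit in (begin, end]; hypothetical stand-in for an incremental pull.
    return {r[key] for r in rows if begin < r["commit_time"] <= end}

def misaligned_keys(tables: Dict[str, List[Row]], begin: int, end: int,
                    key: str = "order_id") -> Set[object]:
    # Keys that changed in some but not all tables during the window.
    # A join of only the incremental pulls would drop exactly these keys.
    changed = [keys_changed(rows, begin, end, key) for rows in tables.values()]
    if not changed:
        return set()
    return set().union(*changed) - set.intersection(*changed)
```

An empty result means a delta-only join over that window loses nothing; a non-empty result means the window should be widened or those keys joined against full snapshots instead.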
