Hi

Thanks for the quick response.
As we discussed, we will pull changes incrementally from each table and join
them with the MOR read-optimized view of the others. For example, order will be
pulled incrementally and joined with the read-optimized views of seller and
customer. Likewise, seller will be pulled incrementally and joined with order
and customer, and the same process applies to customer.

Regarding "safely" aligning windows, I don't think we need to worry about
this, as the data will be corrected while processing a subsequent batch. For
example, if an order arrives before the insert for its seller is reflected,
the seller data will be missing from the first batch. In the next batch, the
seller insert will be picked up by the incremental pull and joined with
customer and order, so the row will be corrected. We are fine with eventual
consistency of the data. Please correct me if I am missing anything.
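
To make the scenario concrete, here is a rough sketch in plain Python rather
than Spark/Hudi. Dicts stand in for the read-optimized views and sets of keys
stand in for the incremental pulls. The function name, column names and sample
values are made up for illustration, and deletes are not handled here.

```python
# Plain-Python stand-in for the Spark job. Nothing here is a Hudi API;
# flatten_batch and all column names are hypothetical.

def flatten_batch(flat, orders, sellers, customers,
                  d_orders, d_sellers, d_customers):
    """Upsert into `flat` every order touched by this batch's deltas.

    orders/sellers/customers: full snapshots, keyed by id.
    d_*: sets of ids that changed in this batch (the incremental pull).
    """
    # One "join" per table: each delta selects the orders it affects.
    touched = set(d_orders)
    touched |= {oid for oid, o in orders.items() if o["seller_id"] in d_sellers}
    touched |= {oid for oid, o in orders.items() if o["customer_id"] in d_customers}

    for oid in touched:
        o = orders[oid]
        s = sellers.get(o["seller_id"])
        c = customers.get(o["customer_id"])
        if s is None or c is None:
            continue  # inner join: the row is missed in this batch
        flat[oid] = {"order_id": oid, "seller": s["name"], "customer": c["name"]}
    return flat


orders = {1: {"seller_id": "s1", "customer_id": "c1"}}
customers = {"c1": {"name": "Ann"}}
flat = {}

# Batch 1: the order and customer are visible, the seller insert is not yet.
flatten_batch(flat, orders, {}, customers,
              d_orders={1}, d_sellers=set(), d_customers={"c1"})
after_first = dict(flat)

# Batch 2: the seller insert shows up in the incremental pull.
sellers = {"s1": {"name": "Acme"}}
flatten_batch(flat, orders, sellers, customers,
              d_orders=set(), d_sellers={"s1"}, d_customers=set())
```

After the first batch the flattened row is missing; after the second batch it
is present, which is the eventual consistency we are relying on.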

On Mon, 6 May 2019 at 23:55, Vinoth Chandar <[email protected]> wrote:

> Reposting a discussion on slack as FYI.
>
> "Jaimin [3:10 AM]
> Hi
> We have a use case where we have a set of MOR base tables and flattened
> entities based on them. For example, we have order, customer and seller
> tables and a flattened entity based on joining these 3 tables.
> To create a flattened entity, I think we need to fetch changes from each of
> these tables incrementally (incremental pull) and join them with the rest of
> the complete tables. So there will be n joins (equal to the number of
> tables involved in the flattened entity). Is there a more efficient way to
> do this? Also, for the join, will we need our own Spark job, or does Hudi
> provide this capability as well?
> Our data can also have deletes; I am using the empty payload implementation
> to delete data. I tried this out with sample data: deleting data from the
> base table, compacting, and then using incremental pull to fetch changes,
> but I didn't see the deletes as part of the incremental pull. Am I missing
> something?
> Thanks "
>
> and my response
>
> "
> you can pull 3 tables and join them in a custom Spark job, that should be
> fine. (yes you need your own spark job.. DeltaStreamer tool supports
> transforms.. but limits itself to 1 table pulled incrementally)..  What
> Nishith is alluding to is being able to "safely" align windows between
> the 3 tables, which needs more business context to determine.. For e.g,
> if you are joining the 3 tables based on order_id, then you need to be sure
> that the order shows up on customer/seller/order tables in the same time
> range you are pulling for..
>
> @Jaimin This is such an interesting topic.. I will start a thread on the
> mailing list. Please join and we can continue there, so others can also
> jump in.. https://hudi.apache.org/community.html
> "
>
