Hey Shayan,

That actually seems like a very good approach. I'm just curious about the
glue metastore you mentioned: would it be an external metastore for Spark to
query over? External in the sense of not being managed by Hudi?

That would be my only concern: how to keep the metadata of all the
partitions in sync. But, again, a very promising approach!
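
To make the sync question concrete, here is a rough sketch of how I picture
it. It assumes the single metastore table is an external Hive-style table
and that each per-partition Hudi table is registered with an
ALTER TABLE ... ADD PARTITION ... LOCATION statement. The table name, the
partition column, the paths, and both helper functions are hypothetical;
I don't know how your sync actually works.

```python
from typing import Dict, Iterable, List


def partition_sync_ddl(table: str, partition_col: str,
                       locations: Dict[str, str]) -> List[str]:
    """Build one Hive ADD PARTITION statement per per-partition Hudi table.

    `locations` maps a partition value to the (hypothetical) storage path
    that the single metastore table should point at for that partition.
    """
    statements = []
    for value, path in sorted(locations.items()):
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({partition_col}='{value}') LOCATION '{path}'"
        )
    return statements


def missing_partitions(locations: Dict[str, str],
                       registered: Iterable[str]) -> List[str]:
    """Return partition values that exist on storage but are not yet
    registered in the metastore, i.e. where the sync has drifted."""
    return sorted(set(locations) - set(registered))


# Hypothetical layout: one Hudi table (with its own timeline) per partition.
locations = {
    "2020-07-07": "s3://my-bucket/events/dt=2020-07-07",
    "2020-07-08": "s3://my-bucket/events/dt=2020-07-08",
}

for statement in partition_sync_ddl("events", "dt", locations):
    print(statement)

# Suppose only the first partition made it into the metastore so far.
print(missing_partitions(locations, registered=["2020-07-07"]))
```

A periodic check like missing_partitions is the part I would worry about:
with 1000 independent writers, something has to notice when a partition
was written but never registered.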



On Wed, Jul 8, 2020 at 15:20, Shayan Hati <shayanh...@gmail.com> wrote:

> Hi folks,
> We have a use-case where we want to ingest data concurrently for different
> partitions. Currently Hudi doesn't support concurrent writes on the same
> Hudi table.
> One of the approaches we were thinking was to use one hudi table per
> partition of data. So let us say we have 1000 partitions, we will have 1000
> Hudi tables which will enable us to write concurrently on each partition.
> And the metadata for each partition will be synced to a single metastore
> table (the assumption here is that the schema is the same for all
> partitions). So this single metastore table can be used for all the Spark
> and Hive queries when querying data. Basically, this metastore glues all
> the different Hudi table data together in a single table.
> We already tested this approach and it's working fine; each partition
> will have its own timeline and Hudi table.
> We wanted to know if there are some gotchas or any other issues with this
> approach to enable concurrent writes? Or if there are any other approaches
> we can take?
> Thanks,
> Shayan
