Re: [DISCUSS] Decouple Hudi and Spark

Suneel Marthi Sat, 03 Aug 2019 19:45:08 -0700

+1 for Beam -- agree with Semantic Beeng's analysis.

On Sat, Aug 3, 2019 at 10:30 PM taher koitawala <[email protected]> wrote:


> So the way to go about this is to file a HIP, chalk out all the classes, and
> start moving towards a pure client.
>
> Secondly, should we try Beam?
>
> I think there is too much going on here and I'm not able to follow. If we
> want to use Beam throughout, I don't think it makes sense to do anything
> on Flink then.
>
> On Sun, Aug 4, 2019, 2:30 AM Semantic Beeng <[email protected]>
> wrote:
>
>> +1 My money is on this approach.
>>
>> The existing abstractions from Beam seem enough for the use cases as I
>> imagine them.
>>
>> Flink also has "dynamic table", "table source" and "table sink", which
>> seem to be very useful abstractions where Hudi might fit nicely.
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
>>
>>
>> Attached is a screenshot.
>>
>> This seems to fit with the original premise of Hudi as well.
>>
>> I am exploring this avenue with a use case that involves "temporal joins on
>> streams", which I need for feature extraction.
>>
>> If anyone is interested in this or has concrete enough needs and use cases,
>> please let me know.
>>
>> It is best to start from an agreed-upon set of 2-3 use cases.
>>
>> Cheers
>>
>> Nick
>>
>>
>> > Also, we do have some Beam experts on the mailing list. Can you please
>> > weigh in on the viability of using Beam as the intermediate abstraction
>> > here between Spark and Flink?
>> > Hudi uses RDD APIs like groupBy, mapToPair, sortAndRepartition,
>> > reduceByKey, countByKey and also does custom partitioning a lot.
>>
>> >
>>
>
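
The quoted question lists the RDD operations that any Spark-independent layer would have to express (mapToPair, groupBy, reduceByKey, countByKey, custom partitioning). As a rough illustration of what "coding Hudi against an intermediate abstraction" could look like, here is a minimal Python sketch. It assumes a hypothetical EngineDataset interface with an in-memory LocalDataset backend; these names are illustrative only and are not Hudi, Spark, Flink or Beam APIs. The record layout (partition path, record key, timestamp) is likewise made up for the example, and sorting (sortAndRepartition) is left out. A real Spark, Flink or Beam backend would map each method onto that engine's own primitives (in Beam, for instance, transforms such as GroupByKey, Combine.perKey and Count.perKey).

```python
# Hypothetical sketch: EngineDataset and LocalDataset are illustrative names,
# not Hudi, Spark, Flink or Beam APIs.
from abc import ABC, abstractmethod
from collections import Counter, defaultdict


class EngineDataset(ABC):
    """Engine-neutral view of the RDD-style operations listed above."""

    @abstractmethod
    def map_to_pair(self, fn):
        """Turn each element into a (key, value) pair (cf. mapToPair)."""

    @abstractmethod
    def group_by_key(self):
        """Collect all values per key (cf. groupBy / groupByKey)."""

    @abstractmethod
    def reduce_by_key(self, fn):
        """Merge values per key with a binary function (cf. reduceByKey)."""

    @abstractmethod
    def count_by_key(self):
        """Return a {key: count} dict (cf. countByKey)."""

    @abstractmethod
    def partition_by(self, num_partitions, partitioner):
        """Route pairs to partitions using a caller-supplied function."""

    @abstractmethod
    def collect(self):
        """Materialize all elements as a list."""


class LocalDataset(EngineDataset):
    """Single-process backend, e.g. for unit tests; not a real engine."""

    def __init__(self, items):
        self._items = list(items)

    def map_to_pair(self, fn):
        return LocalDataset(fn(x) for x in self._items)

    def group_by_key(self):
        groups = defaultdict(list)
        for k, v in self._items:
            groups[k].append(v)
        return LocalDataset(groups.items())

    def reduce_by_key(self, fn):
        acc = {}
        for k, v in self._items:
            acc[k] = fn(acc[k], v) if k in acc else v
        return LocalDataset(acc.items())

    def count_by_key(self):
        return dict(Counter(k for k, _ in self._items))

    def partition_by(self, num_partitions, partitioner):
        parts = [[] for _ in range(num_partitions)]
        for k, v in self._items:
            parts[partitioner(k) % num_partitions].append((k, v))
        return parts

    def collect(self):
        return list(self._items)


# Hypothetical records: (partition_path, record_key, timestamp)
records = LocalDataset([
    ("2019/08/01", "key1", 1),
    ("2019/08/01", "key2", 2),
    ("2019/08/02", "key1", 3),
])

# Deduplicate by record key, keeping the record with the larger timestamp
latest = (records
          .map_to_pair(lambda r: (r[1], r))
          .reduce_by_key(lambda a, b: a if a[2] >= b[2] else b))
print(sorted(latest.collect()))

# Records per partition path
counts = records.map_to_pair(lambda r: (r[0], 1)).count_by_key()
print(counts)

# Custom partitioning with a deterministic function of the key
parts = (records
         .map_to_pair(lambda r: (r[1], r))
         .partition_by(2, lambda k: int(k[-1])))
print([[k for k, _ in p] for p in parts])
```

The partitioner is passed in explicitly rather than relying on Python's built-in hash(), because string hashing is randomized per process; any real backend likewise needs a partitioning function that gives the same answer on every worker.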
