Please keep all discussions on the mailing lists here; no offline
discussions, please.

On Fri, Aug 2, 2019 at 10:22 AM vino yang <[email protected]> wrote:

> Hi guys,
>
> Currently, Taher, Vinay and I are working on issue HUDI-184 [1].
>
> As a first step, we are discussing the design doc.
>
> After diving into the code, we listed some classes relevant to the Spark
> delta writer.
>
>    - module: hoodie-utilities
>
> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
> com.uber.hoodie.utilities.deltastreamer.DeltaSyncService
> com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter
> com.uber.hoodie.utilities.schema.SchemaProvider
> com.uber.hoodie.utilities.transform.Transformer
>
>    - module: hoodie-client
>
> com.uber.hoodie.HoodieWriteClient (to commit compaction)
>
>
> The fact is that *hoodie-utilities* depends on *hoodie-client*. However,
> *hoodie-client* is not a pure Hudi component either: it also depends on the
> Spark libraries.
>
> So I propose that Hudi provide a pure hoodie-client that is decoupled from
> Spark. The Flink and Spark modules would then depend on it.
>
> Moreover, based on the old discussion [2], we all agree that Spark is not
> the only choice for Hudi; it could also be Flink or Beam.
>
> IMO, we should decouple Hudi from Spark at the project level, including
> but not limited to module splitting and renaming.
>
> I am not sure whether this requires a HIP to drive it.
>
> We should first listen to the opinions of the community. Any ideas and
> suggestions are welcome and appreciated.
>
> Best,
> Vino
>
> [1]: https://issues.apache.org/jira/browse/HUDI-184?filter=-1
> [2]: https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E
>

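For readers of the thread, the module split proposed in the quoted message
could look roughly like the sketch below. It is only an illustration under
stated assumptions: every type name in it (EngineContext, CoreWriteClient,
SparkLikeContext, FlinkLikeContext) is hypothetical and does not exist in
Hudi, and plain Java lists stand in for Spark or Flink datasets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical engine abstraction; this is NOT an existing Hudi interface.
// A "pure" hoodie-client module would depend only on something like this.
interface EngineContext {
    String engineName();

    // Apply fn to every element; a real engine would distribute this work.
    <I, O> List<O> map(List<I> input, Function<I, O> fn);
}

// Hypothetical engine-neutral write client: it has no Spark or Flink imports.
final class CoreWriteClient {
    private final EngineContext context;

    CoreWriteClient(EngineContext context) {
        this.context = context;
    }

    // Tags each record with the engine name as a stand-in for a real write.
    List<String> write(List<String> records) {
        return context.map(records, r -> context.engineName() + ":" + r);
    }
}

// Would live in a separate Spark module; here it just loops locally.
final class SparkLikeContext implements EngineContext {
    public String engineName() { return "spark"; }

    public <I, O> List<O> map(List<I> input, Function<I, O> fn) {
        List<O> out = new ArrayList<>();
        for (I item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }
}

// Would live in a separate Flink module; same local stand-in behaviour.
final class FlinkLikeContext implements EngineContext {
    public String engineName() { return "flink"; }

    public <I, O> List<O> map(List<I> input, Function<I, O> fn) {
        List<O> out = new ArrayList<>();
        for (I item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }
}
```

With this shape, the core client is constructed the same way for either
engine, e.g. new CoreWriteClient(new FlinkLikeContext()), and only the
engine modules would carry the Spark or Flink dependency.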