Please keep all discussions on the mailing list here; no offline discussions, please.
On Fri, Aug 2, 2019 at 10:22 AM vino yang <[email protected]> wrote:

> Hi guys,
>
> Currently, Taher, Vinay and I are working on issue HUDI-184.[1]
>
> As a first step, we are discussing the design doc.
>
> After diving into the code, we listed some classes relevant to the Spark
> delta writer.
>
> - module: hoodie-utilities
>
> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
> com.uber.hoodie.utilities.deltastreamer.DeltaSyncService
> com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter
> com.uber.hoodie.utilities.schema.SchemaProvider
> com.uber.hoodie.utilities.transform.Transformer
>
> - module: hoodie-client
>
> com.uber.hoodie.HoodieWriteClient (to commit compaction)
>
> The fact is that *hoodie-utilities* depends on *hoodie-client*; however,
> *hoodie-client* is not a pure Hudi component either, since it also depends
> on the Spark libraries.
>
> So I propose that hoodie should provide a pure hoodie-client decoupled from
> Spark. The Flink and Spark modules should then depend on it.
>
> Moreover, based on the old discussion[2], we all agree that Spark is not
> the only choice for Hudi; it could also be Flink/Beam.
>
> IMO, we should decouple Hudi from Spark at the project level,
> including but not limited to module splitting and renaming.
>
> Not sure if this requires a HIP to drive.
>
> We should first listen to the opinions of the community. Any ideas and
> suggestions are welcome and appreciated.
>
> Best,
> Vino
>
> [1]: https://issues.apache.org/jira/browse/HUDI-184?filter=-1
> [2]:
> https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E
