Yes. What I am trying to get across is that we will continue to invest in both. Agree 100% that the Spark datasource is a primary entry point for users.
On Mon, Jan 27, 2020 at 1:59 AM hmatu <[email protected]> wrote:

> Thanks.
>
> IMO, we should focus more on SparkDatasource level, not the compatibility
> with the HoodieClient level.
>
> Thanks,
> Hmatu
>
> ------------------ Original ------------------
> From: "Vinoth Chandar"<[email protected]>;
> Date: Mon, Jan 27, 2020 02:45 AM
> To: "dev"<[email protected]>;
> Subject: Re: [DISCUSS] Remove HoodieWriteClient
>
> The datasource and deltastreamer are all built on top of the
> HoodieWriteClient.. So, we cannot remove it. Plus, the RDD level API is
> actually more efficient for ingesting data from say Kafka. We can go from
> avro to parquet or avro to avro directly (as opposed to avro -> row ->
> parquet, or avro -> row -> avro). This is one of the reasons for Hudi's
> design even.. RFC-13 will change a bunch of things here..
>
> But we do need the RDD api IMO
>
> On Sun, Jan 26, 2020 at 8:13 AM hmatu <[email protected]> wrote:
>
> > Hi guys,
> >
> > As we know, hudi project contains HoodieWriteClient and HoodieSparkSource
> > level framework. But may 99% user just use HoodieSparkSource except for
> > uber. So I suggest remove HoodieWriteClient. WDYT?
> >
> > Thanks
> > Hmatu
