Re: [DISCUSS] Remove HoodieWriteClient

Vinoth Chandar Mon, 03 Feb 2020 11:17:33 -0800

Yes. What I am trying to get across is that we will continue to invest in both.
Agree 100% that the Spark datasource is a primary entry point for users.


On Mon, Jan 27, 2020 at 1:59 AM hmatu <[email protected]> wrote:

> Thanks.
>
> IMO, we should focus more on the SparkDatasource level, not on compatibility
> with the HoodieClient level.
>
> Thanks,
> Hmatu
>
> ------------------ Original ------------------
> From: "Vinoth Chandar" <[email protected]>;
> Date: Mon, Jan 27, 2020 02:45 AM
> To: "dev" <[email protected]>;
> Subject: Re: [DISCUSS] Remove HoodieWriteClient
>
>
>
> The datasource and deltastreamer are all built on top of the
> HoodieWriteClient, so we cannot remove it. Plus, the RDD-level API is
> actually more efficient for ingesting data from, say, Kafka. We can go from
> avro to parquet or avro to avro directly (as opposed to avro -> row ->
> parquet, or avro -> row -> avro). This is even one of the reasons for Hudi's
> design. RFC-13 will change a bunch of things here.
>
> But we do need the RDD api IMO
>
> On Sun, Jan 26, 2020 at 8:13 AM hmatu <[email protected]> wrote:
>
> > Hi guys,
> >
> > As we know, the Hudi project contains the HoodieWriteClient and
> > HoodieSparkSource level frameworks. But maybe 99% of users just use
> > HoodieSparkSource, except for Uber. So I suggest removing
> > HoodieWriteClient. WDYT?
> >
> > Thanks
> > Hmatu
