Hi songj,

DeltaStreamer can be understood as a packaged Spark DataSource. You only need to set the required parameters, which makes data ingestion more convenient.
Best,
Trevor
[email protected]

From: songj songj
Date: 2020-12-01 16:48
To: dev
Subject: Re: why not use spark datasource in DeltaStreamer

Spark Structured Streaming can consume Kafka using the Kafka data source and use foreachBatch to do insert/upsert/... into Hudi. Is that similar to DeltaStreamer?

On Tue, Dec 1, 2020 at 4:28 PM, songj songj <[email protected]> wrote:
> Hi, I have some questions:
>
> 1. DeltaStreamer has its own Source<JavaRDD<String>> to consume source data,
> such as Kafka. Why not use the Spark DataSource directly?
>
> 2. Hudi has a lot of logic that uses RDD. Why not use Spark DataFrame?
>
> I just want to know the background of the above implementation, thanks!
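
For reference, the foreachBatch approach asked about in the thread could look roughly like the PySpark sketch below. It is only an illustration under stated assumptions, not how DeltaStreamer is implemented: the helper names, the table name, the field names (id, ts, dt), the topic, and the paths are all hypothetical, and the Hudi and Kafka Spark bundles are assumed to be on the classpath. The option keys are the standard Hudi DataSource write configs.

```python
from typing import Any, Callable, Dict


def hudi_write_options(table_name: str, record_key: str,
                       precombine_field: str, partition_field: str,
                       operation: str = "upsert") -> Dict[str, str]:
    """Build Hudi DataSource write options (hypothetical helper)."""
    if operation not in {"insert", "upsert", "bulk_insert"}:
        raise ValueError(f"unsupported operation: {operation}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": operation,
    }


def make_batch_writer(base_path: str,
                      options: Dict[str, str]) -> Callable[[Any, int], None]:
    """Return a function with the (DataFrame, batch_id) signature that
    DataStreamWriter.foreachBatch expects (hypothetical helper)."""
    def write_batch(batch_df: Any, batch_id: int) -> None:
        # Each micro-batch arrives as an ordinary DataFrame, so the
        # regular Hudi DataSource writer can be used on it.
        (batch_df.write.format("hudi")
            .options(**options)
            .mode("append")
            .save(base_path))
    return write_batch


# Usage sketch (needs a running SparkSession named `spark`; the Kafka
# value column still has to be decoded into id/ts/dt columns first):
#
#   raw = (spark.readStream.format("kafka")
#          .option("kafka.bootstrap.servers", "localhost:9092")
#          .option("subscribe", "events")
#          .load())
#   parsed = ...  # decode raw.value into columns id, ts, dt
#   opts = hudi_write_options("events", "id", "ts", "dt")
#   (parsed.writeStream
#    .foreachBatch(make_batch_writer("/tmp/hudi/events", opts))
#    .option("checkpointLocation", "/tmp/checkpoints/events")
#    .start())
```

With this approach the job itself has to handle source decoding, checkpointing, and write configuration, which is the part DeltaStreamer packages behind its parameters.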
