I looked at the DataStreamWriter in Spark (https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/streaming/DataStreamWriter.html) and its implementation seems to be different from DataSource. I haven't looked into what other classes would need to be extended to support the hudi format type for the DataStreamWriter (as we have done for DataSource).
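In the meantime, one workaround that should let you keep a streaming query while still going through the spark datasource is foreachBatch (available in Spark 2.4+). This is only a rough, untested sketch: the record key / precombine / partition-path fields ("uuid", "ts", "partition") are placeholders you would replace with actual columns from your Kafka payload, and `data` / `fileLocation` are the same variables from your snippet below.

import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.streaming.Trigger

// Writes one micro-batch through the regular Hudi datasource (batch) writer.
def writeHudiBatch(batchDF: DataFrame, batchId: Long): Unit = {
  batchDF.write
    .format("org.apache.hudi")
    .option("hoodie.table.name", "hudi_ro_table")
    .option("hoodie.datasource.write.recordkey.field", "uuid")          // placeholder
    .option("hoodie.datasource.write.precombine.field", "ts")           // placeholder
    .option("hoodie.datasource.write.partitionpath.field", "partition") // placeholder
    .mode(SaveMode.Append)
    .save(fileLocation)
}

val query = data
  .writeStream
  .trigger(Trigger.ProcessingTime("300 seconds"))
  .option("checkpointLocation", s"${fileLocation}_chpk")
  .foreachBatch(writeHudiBatch _)
  .start()

That way each trigger interval becomes a plain batch write into the Hudi table, so the datasource-writer example on the writing_data page should apply as-is, while the streaming query still handles checkpointing.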
Does the datasource writer work for you?

Thanks,
Nishith

On Mon, Oct 28, 2019 at 9:34 PM Qian Wang <qwang1...@gmail.com> wrote:

> Hi Nishith,
>
> Thanks for the reply.
>
> I did use the Datasource Writer to write instead of using
> DataStreamWriter. I think the Datasource Writer can also support writing
> streaming data, correct?
>
> Best,
> Qian
>
> On Oct 28, 2019, 9:31 PM -0700, nishith agarwal <n3.nas...@gmail.com>, wrote:
> > Qian,
> >
> > It seems like you are using the
> > https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/streaming/DataStreamWriter.html
> > and not the spark DataSource. To use the spark datasource, look at the
> > example here: https://hudi.apache.org/writing_data.html#datasource-writer
> >
> > DataStreamWriters are a different set of APIs which, IIUC, don't work
> > interchangeably with DataSource.
> >
> > Thanks,
> > Nishith
> >
> > On Mon, Oct 28, 2019 at 3:24 PM Qian Wang <qwang1...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I tried to use the Datasource Writer to read streaming data from a
> > > Kafka topic and write to a Hudi dataset on HDFS. I used the following
> > > code:
> > >
> > > val output = data
> > >   .writeStream
> > >   .trigger(Trigger.ProcessingTime("300 seconds"))
> > >   .format("org.apache.hudi")
> > >   .option("hoodie.table.name", "hudi_ro_table")
> > >   .outputMode("append")
> > >   .option("path", fileLocation)
> > >   .option("checkpointLocation", s"${fileLocation}_chpk")
> > >   .start()
> > >
> > > However, when I run this Spark job it does not write anything to HDFS.
> > > Can anyone tell me how to do this? Thanks.
> > >
> > > Best,
> > > Eric