[ https://issues.apache.org/jira/browse/HUDI-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-4374:
---------------------------------
Labels: pull-request-available streaming (was: streaming)
> Support BULK_INSERT row-writing on streaming Dataset/DataFrame
> ---------------------------------------------------------------
>
> Key: HUDI-4374
> URL: https://issues.apache.org/jira/browse/HUDI-4374
> Project: Apache Hudi
> Issue Type: Task
> Components: spark, writer-core
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Blocker
> Labels: pull-request-available, streaming
> Fix For: 0.12.0
>
>
> In a Structured Streaming setup, when a Hudi table is written from a
> streaming source, HoodieStreamingSink calls HoodieSparkSqlWriter.write(). If
> the BULK_INSERT operation type is set, HoodieSparkSqlWriter.write()
> internally calls HoodieSparkSqlWriter.bulkInsertAsRow(), which does a simple
> df.write.format("hudi").options(...).save(). The 'write' call is not
> supported on a streaming Dataset/DataFrame and fails with:
> {code:java}
> org.apache.spark.sql.AnalysisException: 'write' can not be called on streaming Dataset/DataFrame
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.Dataset.write(Dataset.scala:3377)
>   at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:557)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:178)
>   at org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$2(HoodieStreamingSink.scala:91)
>   at scala.util.Try$.apply(Try.scala:213)
>   at org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$1(HoodieStreamingSink.scala:90)
>   at org.apache.hudi.HoodieStreamingSink.retry(HoodieStreamingSink.scala:166)
>   at org.apache.hudi.HoodieStreamingSink.addBatch(HoodieStreamingSink.scala:89)
> {code}
> Bulk insert can still be done by bypassing the row-writing path. However, we
> need to fix HoodieStreamingSink so that it supports bulk insert via
> row-writing.
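
Until the sink is fixed, the workaround mentioned above (bulk insert without the row-writing path) can be sketched as a set of writer options. This is a minimal illustration, not code from the Hudi project: the class and method names are hypothetical, and it assumes that setting `hoodie.datasource.write.row.writer.enable` to `false` is what keeps HoodieSparkSqlWriter.write() from taking the bulkInsertAsRow() branch. A real job would also need the usual record key, precombine, and checkpoint settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hudi): collects the writer options for a
// streaming bulk insert that avoids the row-writing path.
public class StreamingBulkInsertOptions {

    public static Map<String, String> nonRowWriterBulkInsert(String tableName) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.table.name", tableName);
        // Operation type that triggers the failure when row-writing is on.
        opts.put("hoodie.datasource.write.operation", "bulk_insert");
        // Assumption: disabling the row writer skips bulkInsertAsRow(), so
        // 'write' is never called on the streaming Dataset/DataFrame.
        opts.put("hoodie.datasource.write.row.writer.enable", "false");
        return opts;
    }

    public static void main(String[] args) {
        // Print the options in insertion order for inspection.
        nonRowWriterBulkInsert("example_table")
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting map would be passed to the streaming writer, for example via `writeStream.format("hudi").options(...)`, in place of the default options that enable row-writing.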
--
This message was sent by Atlassian Jira
(v8.20.10#820010)