[ https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881530#comment-17881530 ]
Ranga Reddy commented on HUDI-7711:
-----------------------------------
I encountered a similar exception while running a Spark application using
HoodieDeltaStreamer with Spark 3.5.2 and Hudi 0.15.
{code:java}
24/09/13 16:26:56 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Cannot read properties from dfs from file file:/Users/test/spark_app/src/test/resources/streamer-config/dfs-source.properties
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:214)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:606)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
As a temporary workaround, I created a dfs-source.properties file at the expected location using the following commands:
{code:java}
# PWD is /Users/test/spark_app
mkdir -p $PWD/src/test/resources/streamer-config
vi $PWD/src/test/resources/streamer-config/dfs-source.properties
{code}
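A cleaner alternative is to point the job at an explicit properties file with the --props option, so it never falls back to the default src/test/resources path. A sketch (the bundle jar version, table names, and paths below are placeholders for your environment, and the remaining source options are assumed to live in the properties file):
{code:java}
# Sketch: pass an explicit properties file via --props so the default
# src/test/resources/streamer-config/dfs-source.properties lookup is skipped.
# Jar version, paths, and table names are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  hudi-utilities-bundle_2.12-0.15.0.jar \
  --props file:///Users/test/spark_app/config/dfs-source.properties \
  --table-type COPY_ON_WRITE \
  --target-base-path file:///tmp/hudi/tbl \
  --target-table tbl
{code}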
> Fix MultiTableStreamer to handle the properties file path for each streamer
> ----------------------------------------------------------------------------
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
> Issue Type: Bug
> Components: hudi-utilities
> Environment: Hudi 0.14.1, Spark 3.2
> Reporter: Jihwan Lee
> Assignee: Jihwan Lee
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0, 0.15.1
>
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the
> related fields into each stream's config.
> Because _propsFilePath_ is not set on each streamer, every streamer falls back
> to the default value, which points at a test-resources path.
>
> Also, when MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer
> should inherit these configs (a form of inheritance); see the sketch below.
>
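> A minimal sketch of the intended handling (illustrative only, not the actual
> patch: `copyCommonConfig`, `commonCfg`, `tableProps`, `database`, and `table`
> are assumed names):
>
> {code:java}
> // Sketch: when building each table's config in HoodieMultiTableStreamer,
> // carry the table-specific properties file path into the copied config
> // instead of leaving the default (src/test/resources/...).
> static HoodieStreamer.Config buildTableConfig(
>     HoodieStreamer.Config commonCfg, TypedProperties tableProps,
>     String database, String table) {
>   HoodieStreamer.Config tableCfg = copyCommonConfig(commonCfg); // deep copy, as today
>   String key = "hoodie.streamer.ingestion." + database + "." + table + ".configFile";
>   String configFile = tableProps.getString(key, null);
>   if (configFile != null) {
>     tableCfg.propsFilePath = configFile; // otherwise the default test path is used
>   }
>   return tableCfg;
> }
> {code}
>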
> MultiTable configs (kafka-source.properties):
>
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tbl2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>
>
> /tmp/config_1.properties:
>
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>
>
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>
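> For reference, a hedged sketch of how such a multi-table run is typically
> launched (class name as of Hudi 0.14+; the jar version, source class,
> ordering field, and paths are placeholders):
>
> {code:java}
> # Sketch of a HoodieMultiTableStreamer launch; adjust jar/paths/source options.
> spark-submit \
>   --class org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer \
>   hudi-utilities-bundle_2.12-0.14.1.jar \
>   --props hdfs:///tmp/kafka-source.properties \
>   --config-folder hdfs:///tmp/hudi-ingestion-config \
>   --base-path-prefix hdfs:///tmp/multitable \
>   --table-type COPY_ON_WRITE \
>   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
>   --source-ordering-field ts \
>   --target-table dummy_table \
>   --op UPSERT
> {code}
>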
> error log (the workspace path is replaced with \{RUNNING_PATH}):
>
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in properties from dfs from file file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while running MultiTableDeltaStreamer for table: {TABLE}
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:87)
>     at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>     at org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>     at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>     at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>     at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>     at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>     at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>     at org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>     at org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.FileNotFoundException: File file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
>     at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.open(HoodieHadoopStorage.java:97)
>     at org.apache.hudi.common.config.DFSPro
> {code}
>
>