gsudhanshu opened a new issue, #10503:
URL: https://github.com/apache/hudi/issues/10503
@ad1happy2go
I have set up Hudi in cluster mode:
App server (driver)
DB server (master, worker, executors)
using the local filesystem at /var/<base_folder>.
Writes work fine. Reading that file also works for some time, but after
roughly 12 hours the read fails with the error: 'An error occurred while calling
o1748.load.\n: java.io.FileNotFoundException'
spark config:
```
spark = SparkSession.builder \
    .appName("dataHudi") \
    .master('spark://DBServer:7077') \
    .config('spark.driver.bindAddress', '0.0.0.0') \
    .config('spark.driver.host', 'App server') \
    .config('spark.driver.port', '37077') \
    .config('spark.driver.blockManager.port', '37078') \
    .config('spark.executor.host', 'DBServer') \
    .config("spark.executor.port", "37079") \
    .config('spark.fileserver.host', 'DBServer') \
    .config("spark.fileserver.port", "37080") \
    .config('spark.replClassServer.host', 'DBServer') \
    .config("spark.replClassServer.port", "37081") \
    .config('spark.broadcast.host', 'DBServer') \
    .config("spark.broadcast.port", "37082") \
    .config('spark.driver.memory', '7g') \
    .config('spark.executor.memory', '4g') \
    .config('spark.jars.packages',
            'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0') \
    .config('spark.serializer',
            'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.catalog.spark_catalog',
            'org.apache.spark.sql.hudi.catalog.HoodieCatalog') \
    .config('spark.sql.extensions',
            'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .getOrCreate()
```
read command (pyspark)
```
hudi_df_org = spark.read \
    .format("hudi") \
    .option("hoodie.datasource.read.table.name", unique_filename) \
    .option("hoodie.metadata.enable", "false") \
    .load(basePath_for_visualiser_table)
```
write command (pyspark)
```
spark_df.write \
    .format("org.apache.hudi") \
    .options(**hudi_options) \
    .mode("append") \
    .save(basePath_ID + "/" + f"{unique_filename}")
```
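To help narrow this down, a small check like the one below could be run on both the App server and the DB server once the read starts failing. It is only a sketch: `summarize_table_dir` is a hypothetical helper, not part of Hudi or Spark. It assumes the usual on-disk layout, with a `.hoodie` directory holding the timeline files and parquet data files elsewhere under the table path.

```python
import os
from collections import Counter


def summarize_table_dir(table_path):
    """Hypothetical helper (not a Hudi API): report what this host sees on its local disk."""
    hoodie_dir = os.path.join(table_path, ".hoodie")
    summary = {
        "table_path_exists": os.path.isdir(table_path),
        "hoodie_dir_exists": os.path.isdir(hoodie_dir),
        "timeline_files": Counter(),  # file count per extension inside .hoodie
        "parquet_files": 0,           # data files found outside .hoodie
    }
    if not summary["table_path_exists"]:
        return summary

    if summary["hoodie_dir_exists"]:
        for name in os.listdir(hoodie_dir):
            if os.path.isfile(os.path.join(hoodie_dir, name)):
                # Group by extension, e.g. ".commit", ".clean", ".inflight".
                ext = os.path.splitext(name)[1] or name
                summary["timeline_files"][ext] += 1

    for _root, dirs, files in os.walk(table_path):
        # Do not descend into the metadata directory when counting data files.
        dirs[:] = [d for d in dirs if d != ".hoodie"]
        summary["parquet_files"] += sum(f.endswith(".parquet") for f in files)

    return summary
```

Calling it with the same table path on each host and comparing the two results would show whether the driver and the executors see the same files on their local disks.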
I am not able to figure out why reads work for some time and then stop working.
Kindly advise.