matthiasdg opened a new issue, #5242:
URL: https://github.com/apache/hudi/issues/5242
**Describe the problem you faced**
With Hudi 0.9, if I load a number of dataframes and then loop over them, writing each one with Hudi's Spark datasource writer, I can see the embedded timeline server being started and used on every iteration (off-topic: `hoodie.embed.timeline.server.reuse.enabled` does not seem to have any effect).
If I do the same using Hudi 0.10 or 0.10.1, the first write succeeds, but after that I get the following (log pasted from after the second iteration/timeline server start):
```
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
[INFO] [18:56:25.608]
[pool-1-thread-1-ScalaTest-running-TmlDataQualityDeequSpec] 139 Starting
Javalin ...
[INFO] [18:56:25.613]
[pool-1-thread-1-ScalaTest-running-TmlDataQualityDeequSpec] 113 Listening on
http://localhost:27055/
[INFO] [18:56:25.613]
[pool-1-thread-1-ScalaTest-running-TmlDataQualityDeequSpec] 149 Javalin started
in 6ms \o/
[info] - should partitioned into smaller dataframes - have 2 values per 120
minutes for each deviceId *** FAILED ***
[info] org.apache.hudi.exception.HoodieRemoteException:
192.168.0.215:27055 failed to respond
[info] at
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:418)
[info] at
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.reset(RemoteHoodieTableFileSystemView.java:453)
[info] at
org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:256)
[info] at
org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:492)
[info] at
org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:447)
[info] at
org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:179)
[info] at
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:212)
[info] at
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:275)
[info] at
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
[info] at
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
[info] ...
[info] Cause: org.apache.http.NoHttpResponseException: 192.168.0.215:27055
failed to respond
[info] at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
[info] at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
[info] at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
[info] at
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
[info] at
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
[info] at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
[info] at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
[info] at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
[info] at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
[info] at
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
[info] ...
```
We set the port used by the timeline service ourselves through `hoodie.embed.timeline.server.port` in the write config (we also work with remote drivers where some ports are blocked). This works fine in 0.9, but starting from 0.10 we get the above error. If we don't specify the port, everything still works.
**To Reproduce**
Steps to reproduce the behavior:
1. Load a number of dataframes.
2. Write them successively with the embedded timeline server enabled and a fixed `hoodie.embed.timeline.server.port` in the write config.
3. The first write succeeds; the subsequent ones fail.
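To make the repro concrete, here is a minimal PySpark-style sketch of the failing loop. Table name, paths, and the non-port option values are hypothetical; the one setting that matters is the fixed `hoodie.embed.timeline.server.port`, which works on 0.9 but breaks every write after the first on 0.10/0.10.1:

```python
# Sketch of the failing write loop. All option values except the fixed
# timeline server port are placeholders, not the real job's config.
hudi_options = {
    "hoodie.table.name": "my_table",                  # hypothetical
    "hoodie.datasource.write.operation": "insert",
    "hoodie.embed.timeline.server": "true",
    # Fixing this port is what triggers the failure on 0.10/0.10.1;
    # leaving it unset (a random free port) still works.
    "hoodie.embed.timeline.server.port": "27055",
}

def write_all(dfs_with_paths, options=hudi_options):
    """Write each (dataframe, path) pair with the Hudi Spark datasource
    writer. On 0.10/0.10.1 the first iteration succeeds; later ones fail
    with HoodieRemoteException ("failed to respond") when the embedded
    timeline server is restarted on the same fixed port."""
    for df, path in dfs_with_paths:
        df.write.format("hudi").options(**options).mode("append").save(path)
```

With the port line removed from `hudi_options`, the same loop runs cleanly on 0.10/0.10.1 in our tests.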
**Expected behavior**
I think successive writes with a fixed embedded timeline server port should still work. As workarounds, I can disable the embedded timeline server, leave the port unspecified, or use a remote timeline service.
**Environment Description**
* Hudi version : 0.10/0.10.1 (fails); 0.9 (works)
* Spark version : 3.1.2
* Hive version :
* Hadoop version : 3.2.0
* Storage (HDFS/S3/GCS..) : Azure Data Lake Gen 2
* Running on Docker? (yes/no) : happens with driver + executors in local
mode, but also on k8s.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]