Ma77Ball opened a new issue, #5646:
URL: https://github.com/apache/texera/issues/5646
### What happened?
The file-service exits during startup with a fatal error when its LakeFS
health check runs before LakeFS has finished bringing up its HTTP listener.
Because services boot concurrently, the file-service can reach
`LakeFSStorageClient.healthCheck()` a fraction of a second before LakeFS is
ready, receive a transient `java.net.SocketException: Connection reset`, and
crash. With no file-service running, all dataset operations (including dataset
upload) fail. Observed log:
```
java.lang.RuntimeException: Failed to connect to lake fs server: java.net.SocketException: Connection reset
    at LakeFSStorageClient$.healthCheck(LakeFSStorageClient.scala:76)
    at FileService.run(FileService.scala:83)
```
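One possible mitigation is to retry the startup health check a bounded number of times instead of exiting on the first failure. The sketch below is illustrative only and is not taken from the Texera codebase: `HealthCheckRetry`, `retryWithBackoff`, and its parameters are hypothetical names, and it assumes the health check signals failure by throwing a non-fatal exception, as the stack trace above suggests.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Texera: retries a startup check with
// capped exponential backoff before giving up.
object HealthCheckRetry {

  /** Runs `check` up to `maxAttempts` times.
    * After each failed attempt it sleeps, doubling the delay up to `maxDelayMs`.
    * If every attempt fails, the last exception is rethrown to the caller.
    */
  def retryWithBackoff(
      maxAttempts: Int,
      initialDelayMs: Long,
      maxDelayMs: Long
  )(check: () => Unit): Unit = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def loop(attempt: Int, delayMs: Long): Unit =
      Try(check()) match {
        case Success(_) => () // dependency is reachable, continue startup
        case Failure(e) if attempt >= maxAttempts => throw e // out of attempts
        case Failure(_) =>
          Thread.sleep(delayMs)
          loop(attempt + 1, math.min(delayMs * 2, maxDelayMs))
      }

    loop(1, initialDelayMs)
  }
}
```

Under these assumptions, the call in `FileService.run` could be wrapped as `HealthCheckRetry.retryWithBackoff(maxAttempts = 10, initialDelayMs = 200L, maxDelayMs = 5000L)(() => LakeFSStorageClient.healthCheck())`; the numeric values are placeholders, not tuned recommendations. Bounding the attempts keeps a genuinely unreachable LakeFS a fatal startup error while tolerating the brief window in which its HTTP listener is not yet ready.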
### How to reproduce?
1. Start the full stack so file-service and LakeFS boot at the same time
   (e.g. local dev compose, or any deployment where ordering is not enforced).
2. If file-service reaches its startup health check before LakeFS's HTTP
   server is accepting requests, file-service exits with the error above.
3. Attempt to upload a dataset in the UI; the request fails because the
   file-service is not running.
### Version/Branch
main