Limess opened a new issue, #9814: URL: https://github.com/apache/hudi/issues/9814
**Describe the problem you faced**

After upgrading Hudi from 0.12.1 to 0.13.1 as part of an EMR upgrade, I'm seeing a lot of these:

```
23/09/25 16:51:57 INFO RemoteHoodieTableFileSystemView: Sending request : (http://ip-10-0-107-14.eu-west-1.compute.internal:38427/v1/hoodie/view/datafiles/beforeoron/latest/?partition=story_published_partition_date%3D2023-08-26&maxinstant=20230925101228159&basepath=s3%3A%2F%2Fprod-signal-articles-store%2Farticles_hudi_copy_on_write&lastinstantts=20230925142837150&timelinehash=839a7f3760bd309b411eecb46f32635c0eb8d06daac3fba349cb7713a6a698c7)
23/09/25 16:52:36 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-107-14.eu-west-1.compute.internal:38427/: The target server failed to respond
23/09/25 16:52:36 INFO RetryExec: Retrying request to {}->http://ip-10-0-107-14.eu-west-1.compute.internal:38427/
23/09/25 16:53:06 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-107-14.eu-west-1.compute.internal:38427/: The target server failed to respond
23/09/25 16:53:06 INFO RetryExec: Retrying request to {}->http://ip-10-0-107-14.eu-west-1.compute.internal:38427/
23/09/25 16:53:36 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-107-14.eu-west-1.compute.internal:38427/: The target server failed to respond
23/09/25 16:53:36 INFO RetryExec: Retrying request to {}->http://ip-10-0-107-14.eu-west-1.compute.internal:38427/
23/09/25 16:54:07 WARN RetryHelper: Catch Exception for Sending request, will retry after 100 ms.
org.apache.hudi.org.apache.http.NoHttpResponseException: ip-10-0-107-14.eu-west-1.compute.internal:38427 failed to respond
```

I've enabled retries, but they seem to slow down many write tasks considerably as the tasks retry or fall back to secondary methods.
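The timestamps in the excerpt above give a rough measure of the cost of each failed lookup. The following is a minimal Python sketch of that arithmetic. It assumes the log lines shown all belong to the same request. The helper name `gaps_seconds` is hypothetical and is not part of Hudi.

```python
from datetime import datetime

# Timestamps copied from the executor log excerpt (format yy/MM/dd HH:mm:ss).
# Assumption: all of these lines belong to the same file-system-view request.
log_times = [
    "23/09/25 16:51:57",  # RemoteHoodieTableFileSystemView: Sending request
    "23/09/25 16:52:36",  # 1st NoHttpResponseException, RetryExec retries
    "23/09/25 16:53:06",  # 2nd NoHttpResponseException, RetryExec retries
    "23/09/25 16:53:36",  # 3rd NoHttpResponseException, RetryExec retries
    "23/09/25 16:54:07",  # RetryHelper WARN, "will retry after 100 ms"
]


def gaps_seconds(stamps):
    """Hypothetical helper: elapsed whole seconds between consecutive log timestamps."""
    parsed = [datetime.strptime(s, "%y/%m/%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_seconds(log_times)
print(gaps)       # [39, 30, 30, 31]
print(sum(gaps))  # 130
```

Read this way, the HTTP client's `RetryExec` layer waits roughly 30 seconds per attempt. A single view request therefore spends about 130 seconds before the separate `RetryHelper` layer reports the failure and schedules its own retry. If many tasks take this path, that could plausibly account for much of the slowdown.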
Why would this be happening? Between these errors and what seem to be slower bloom filter lookups, jobs are taking 2x longer or more. I'm unsure whether these correspond to the following warnings in the driver logs:

```
WARN RequestHandler: Bad request response due to client view behind server view. Last known instant from client was 20230925142837150 but server has the following timeline [[20230405172930640__rollback__COMPLETED], [20230405220408317__rollback__COMPLETED], [20230405230726307__rollback__COMPLETED], [20230406004821619__rollback__COMPLETED], [20230406022626456__rollback__COMPLETED], [20230406040217179__rollback__COMPLETED], [20230406053604634__rollback__COMPLETED], [20230406071500195__rollback__COMPLETED], [20230406085932605__rollback__COMPLETED], [20230406091145473__rollback__COMPLETED], [20230904040946183__rollback__COMPLETED], [20230904200935082__rollback__COMPLETED], [20230905102904696__rollback__COMPLETED], [20230920120910043__commit__COMPLETED], [20230920161015352__commit__COMPLETED], [20230920200916636__commit__COMPLETED], [20230921000922099__commit__COMPLETED], [20230921040951133__commit__COMPLETED], [20230921081133533__commit__COMPLETED], [20230921081136531__clean__COMPLETED], [20230921120938905__commit__COMPLETED], [20230921120941970__clean__COMPLETED], [20230921161019209__commit__COMPLETED], [20230921161022485__clean__COMPLETED], [20230921200920596__commit__COMPLETED], [20230921200923858__clean__COMPLETED], [20230922001011936__commit__COMPLETED], [20230922001014953__clean__COMPLETED], [20230922040943645__commit__COMPLETED], [20230922040946795__clean__COMPLETED], [20230922080911829__commit__COMPLETED], [20230922080915209__clean__COMPLETED], [20230922120928185__commit__COMPLETED], [20230922120931568__clean__COMPLETED], [20230922161014635__commit__COMPLETED], [20230922161017634__clean__COMPLETED], [20230922200911764__commit__COMPLETED], [20230922200914501__clean__COMPLETED], [20230923000928118__commit__COMPLETED], [20230923000931194__clean__COMPLETED], [20230923040937860__commit__COMPLETED], [20230923040940748__clean__COMPLETED], [20230923080919659__commit__COMPLETED], [20230923080922740__clean__COMPLETED], [20230923120913393__commit__COMPLETED], [20230923120916656__clean__COMPLETED], [20230923160937358__commit__COMPLETED], [20230923160940858__clean__COMPLETED], [20230923200914761__commit__COMPLETED], [20230923200917719__clean__COMPLETED], [20230924000958223__commit__COMPLETED], [20230924001001271__clean__COMPLETED], [20230924040915658__commit__COMPLETED], [20230924040918676__clean__COMPLETED], [20230924080919687__commit__COMPLETED], [20230924080922913__clean__COMPLETED], [20230924120907571__commit__COMPLETED], [20230924120910946__clean__COMPLETED], [20230924160910339__commit__COMPLETED], [20230924160913410__clean__COMPLETED], [20230924200912759__commit__COMPLETED], [20230924200915964__clean__COMPLETED], [20230925000926377__commit__COMPLETED], [20230925000931547__clean__COMPLETED], [20230925041024449__commit__COMPLETED], [20230925041027798__clean__COMPLETED], [20230925080953746__commit__COMPLETED], [20230925080957003__clean__COMPLETED], [20230925101228159__commit__COMPLETED], [20230925101231993__clean__COMPLETED], [20230925114607821__clean__COMPLETED], [20230925142837150__rollback__COMPLETED], [20230925161210335__rollback__COMPLETED]]
23/09/25 17:12:41 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20230925161210335__rollback__COMPLETED]}
```

I'm also seeing similar errors on writes:

```
Caused by: org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file story_published_partition_date=2023-01-06/47d20ede-bbbe-4cd9-91d1-41993c76752a-0_668-25-96261_20230925161205373.parquet.marker.MERGE
ip-10-0-107-14.eu-west-1.compute.internal:38427 failed to respond
```
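The driver warning compares the client's last known instant with the server's timeline. The following is a minimal Python sketch of that comparison, using only values from the logs above (the `lastinstantts` query parameter and the tail of the server timeline). The regex and the helper name `instants_after` are hypothetical illustrations. This is not Hudi's own check.

```python
import re

# Client's last known instant, from the WARN line (also lastinstantts in the request URL).
client_last_instant = "20230925142837150"

# Tail of the server timeline copied from the same WARN line (abridged).
server_timeline = (
    "[20230925101228159__commit__COMPLETED], [20230925101231993__clean__COMPLETED], "
    "[20230925114607821__clean__COMPLETED], [20230925142837150__rollback__COMPLETED], "
    "[20230925161210335__rollback__COMPLETED]"
)

# Each entry looks like [<17-digit timestamp>__<action>__<STATE>].
INSTANT_RE = re.compile(r"\[(\d{17})__([a-z]+)__([A-Z]+)\]")


def instants_after(timeline, last_known):
    """Hypothetical helper: (timestamp, action) pairs newer than last_known.

    Every instant in these logs is a fixed-width 17-digit yyyyMMddHHmmssSSS
    string, so plain string comparison orders them chronologically.
    """
    return [
        (ts, action)
        for ts, action, _state in INSTANT_RE.findall(timeline)
        if ts > last_known
    ]


newer = instants_after(server_timeline, client_last_instant)
print(newer)  # [('20230925161210335', 'rollback')]
```

On this reading, the server had already recorded a rollback (`20230925161210335`) that the client's view did not yet include. That is the mismatch the "client view behind server view" warning reports.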
