suddendust opened a new issue, #10273:
URL: https://github.com/apache/pinot/issues/10273

   Certain segments on our servers can't come online due to this exception:
   
   ```
   Caused by: java.io.IOException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   Caused by: java.nio.file.FileAlreadyExistsException: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   2023/02/13 09:35:49.630 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread_27] Copy s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to local /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   2023/02/13 09:35:50.317 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread_27] Caught exception while fetching segment from: s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   software.amazon.awssdk.core.exception.SdkClientException: Unable to unmarshall response (Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz). Response Code: 200, Response Text: OK
   Caused by: software.amazon.awssdk.core.exception.NonRetryableException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   Caused by: java.io.IOException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   Caused by: java.nio.file.FileAlreadyExistsException: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   2023/02/13 09:35:51.300 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread_27] Copy s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to local /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   2023/02/13 09:35:52.058 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread_27] Caught exception while fetching segment from: s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   software.amazon.awssdk.core.exception.SdkClientException: Unable to unmarshall response (Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz). Response Code: 200, Response Text: OK
   Caused by: software.amazon.awssdk.core.exception.NonRetryableException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   Caused by: java.io.IOException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   Caused by: java.nio.file.FileAlreadyExistsException: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
   2023/02/13 09:35:52.059 WARN [service_call_view_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_27] Failed to download segment service_call_view__43__350__20230211T0514Z from deep store:
   2023/02/13 09:35:52.060 WARN [service_call_view_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_27] Download segment service_call_view__43__350__20230211T0514Z from deepstore uri s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z failed.
   2023/02/13 09:35:52.060 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_27] Caught exception in state transition from OFFLINE -> ONLINE for resource: service_call_view_REALTIME, partition: service_call_view__43__350__20230211T0514Z
   2023/02/13 09:35:52.061 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread_27] Exception while executing a state transition task service_call_view__43__350__20230211T0514Z
   2023/02/13 09:35:56.848 INFO [HelixTask] [HelixTaskExecutor-message_handle_thread_27] Message: 18143b00-938e-472e-b81e-1469b00a5d72 (parent: null) handling task for service_call_view_REALTIME:service_call_view__43__350__20230211T0514Z completed at: 1676280956848, results: false. FrameworkTime: 15 ms; HandlerTime: 8675 ms.
   ```
   
   As a result, the segment moves to the `ERROR` state and rebalancing keeps
failing. Could we enable overwrites in this case?
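
To illustrate what "enable overwrites" could mean here, below is a minimal sketch using only `java.nio`. It is not Pinot's or the AWS SDK's actual code: the class `OverwriteSketch` and both method names are hypothetical, and it assumes the failure comes from a leftover `.tar.gz` at the download target (for example from an earlier interrupted fetch). Writing to an existing path without `REPLACE_EXISTING` fails with the same `FileAlreadyExistsException` seen in the log, while opting in to replacement (or deleting the stale file first) lets the write succeed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not taken from Pinot or the AWS SDK.
public class OverwriteSketch {

    // Fails with FileAlreadyExistsException if the target already exists.
    static void copyNoOverwrite(InputStream in, Path target) throws IOException {
        Files.copy(in, target);
    }

    // Replaces any existing file at the target path.
    static void copyWithOverwrite(InputStream in, Path target) throws IOException {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // createTempFile leaves an (empty) file on disk, standing in for the
        // stale segment tarball assumed above.
        Path target = Files.createTempFile("segment", ".tar.gz");
        byte[] payload = {1, 2, 3};

        try {
            copyNoOverwrite(new ByteArrayInputStream(payload), target);
        } catch (FileAlreadyExistsException e) {
            System.out.println("Without overwrite: " + e.getClass().getSimpleName());
        }

        copyWithOverwrite(new ByteArrayInputStream(payload), target);
        System.out.println("With overwrite: wrote " + Files.size(target) + " bytes");

        Files.delete(target);
    }
}
```

An alternative with the same effect is calling `Files.deleteIfExists(target)` before the download starts, which avoids changing how the file is written.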


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

