suddendust opened a new issue, #10273: URL: https://github.com/apache/pinot/issues/10273
Certain segments on our servers can't come online because of this exception:

```
Caused by: java.io.IOException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
Caused by: java.nio.file.FileAlreadyExistsException: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
2023/02/13 09:35:49.630 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread_27] Copy s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to local /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
2023/02/13 09:35:50.317 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread_27] Caught exception while fetching segment from: s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
software.amazon.awssdk.core.exception.SdkClientException: Unable to unmarshall response (Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz). Response Code: 200, Response Text: OK
Caused by: software.amazon.awssdk.core.exception.NonRetryableException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
Caused by: java.io.IOException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
Caused by: java.nio.file.FileAlreadyExistsException: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
2023/02/13 09:35:51.300 INFO [S3PinotFS] [HelixTaskExecutor-message_handle_thread_27] Copy s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to local /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
2023/02/13 09:35:52.058 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread_27] Caught exception while fetching segment from: s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z to: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
software.amazon.awssdk.core.exception.SdkClientException: Unable to unmarshall response (Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz). Response Code: 200, Response Text: OK
Caused by: software.amazon.awssdk.core.exception.NonRetryableException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
Caused by: java.io.IOException: Failed to read response into file: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
Caused by: java.nio.file.FileAlreadyExistsException: /var/pinot/server/data/index/service_call_view_REALTIME/service_call_view__43__350__20230211T0514Z.tar.gz
2023/02/13 09:35:52.059 WARN [service_call_view_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_27] Failed to download segment service_call_view__43__350__20230211T0514Z from deep store:
2023/02/13 09:35:52.060 WARN [service_call_view_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_27] Download segment service_call_view__43__350__20230211T0514Z from deepstore uri s3://my-s3-bucket-pinot/controller/data/service_call_view/service_call_view__43__350__20230211T0514Z failed.
2023/02/13 09:35:52.060 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_27] Caught exception in state transition from OFFLINE -> ONLINE for resource: service_call_view_REALTIME, partition: service_call_view__43__350__20230211T0514Z
2023/02/13 09:35:52.061 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread_27] Exception while executing a state transition task service_call_view__43__350__20230211T0514Z
2023/02/13 09:35:56.848 INFO [HelixTask] [HelixTaskExecutor-message_handle_thread_27] Message: 18143b00-938e-472e-b81e-1469b00a5d72 (parent: null) handling task for service_call_view_REALTIME:service_call_view__43__350__20230211T0514Z completed at: 1676280956848, results: false. FrameworkTime: 15 ms; HandlerTime: 8675 ms.
```

Because of this, the segment moves to the `ERROR` state and rebalancing keeps failing. Can we enable overwrites in this case?
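The exception chain shows each retry failing because a file already exists at the local destination path (the S3 response itself comes back `200 OK`), so the stale `.tar.gz` from an earlier attempt blocks every subsequent attempt. Below is a minimal sketch of what "enabling overwrites" could look like, assuming the download can be written to a temp file in the same directory and then moved over the destination with `REPLACE_EXISTING`. The class and method names (`OverwritingDownload`, `downloadTo`) are hypothetical, and a plain `InputStream` stands in for the S3 response body; this is not Pinot's `S3PinotFS` code or an AWS SDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not part of Pinot or the AWS SDK.
public final class OverwritingDownload {

    private OverwritingDownload() {}

    /**
     * Streams `body` into `dest`, replacing any file already there.
     * The body is first written to a temp file in the same directory, so a
     * failed download never leaves `dest` half-written.
     */
    public static void downloadTo(InputStream body, Path dest) throws IOException {
        Path dir = dest.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        Path tmp = Files.createTempFile(dir, dest.getFileName().toString(), ".part");
        try {
            // Files.copy(InputStream, Path) throws FileAlreadyExistsException
            // unless REPLACE_EXISTING is passed; createTempFile already created
            // the temp file, so the option is needed here as well.
            Files.copy(body, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Overwrite any stale file left behind by an earlier attempt.
            Files.move(tmp, dest, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up the temp file on failure.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment-demo");
        Path dest = dir.resolve("segment.tar.gz");
        Files.writeString(dest, "stale"); // simulate a leftover file from a failed attempt
        downloadTo(new ByteArrayInputStream("fresh".getBytes(StandardCharsets.UTF_8)), dest);
        System.out.println(Files.readString(dest)); // prints "fresh"
    }
}
```

A simpler alternative is calling `Files.deleteIfExists(dest)` right before the download; the temp-file-and-move variant additionally guarantees that a crash mid-download never leaves a truncated archive at the final path.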
