dang-stripe commented on issue #8948:
URL: https://github.com/apache/pinot/issues/8948#issuecomment-1272044875

   This issue happened again, but we were able to get more logs this time. It 
seems like:
   
   1. Job sends segment upload request controller
   2. Controller processes upload, adds segment to property store
   3. Controller starts shutting down including ZK conn
   4. Controller fails to update ideal state due to ZK conn closed
   5. Upload request fails
   
   The batch job that performed the push seemed to succeed surprisingly. When 
we try to perform the upload again by re-running the push job, it does a no-op 
because it finds the segment already, but nothing updates the ideal state.
   
   ```
   [2022-10-07 01:49:04.335934] INFO 
[PinotSegmentUploadDownloadRestletResource:324] 
[jersey-server-managed-async-executor-6:224] HTTP Header: DOWNLOAD_URI is: 
s3://...
   [2022-10-07 01:49:04.338695] INFO 
[PinotSegmentUploadDownloadRestletResource:246] 
[jersey-server-managed-async-executor-6:224] Uploading a segment 
table1_1_2022-09-24-21_2022-10-06-23_0 to table: table1, push type METADATA, 
(Derived from API parameter)
   [2022-10-07 01:49:04.338807] INFO 
[PinotSegmentUploadDownloadRestletResource:267] 
[jersey-server-managed-async-executor-6:224] Processing upload request for 
segment: table1_1_2022-09-24-21_2022-10-06-23_0 of table: table1_OFFLINE from 
client: localhost, ingestion descriptor: null
   [2022-10-07 01:49:04.338867] INFO 
[PinotSegmentUploadDownloadRestletResource:295] 
[jersey-server-managed-async-executor-6:224] Setting zkDownloadUri: to s3://... 
for segment: table1_1_2022-09-24-21_2022-10-06-23_0 of table: table1_OFFLINE, 
skipping move
   [2022-10-07 01:49:04.340471] INFO [ZKOperator:76] 
[jersey-server-managed-async-executor-6:224] Adding new segment 
table1_1_2022-09-24-21_2022-10-06-23_0 from table table1_OFFLINE
   [2022-10-07 01:49:04.340565] INFO [ZKOperator:240] 
[jersey-server-managed-async-executor-6:224] Skipping segment move, keeping 
segment table1_1_2022-09-24-21_2022-10-06-23_0 from table table1_OFFLINE at 
s3://...
   [2022-10-07 01:49:04.646401] INFO [PinotHelixResourceManager:1799] 
[jersey-server-managed-async-executor-6:224] Added segment: 
table1_1_2022-09-24-21_2022-10-06-23_0 of table: table1_OFFLINE to property 
store
   [2022-10-07 01:51:52.018327] ERROR [PinotHelixResourceManager:1834] 
[jersey-server-managed-async-executor-6:224] Caught exception while adding 
segment: table1_1_2022-09-24-21_2022-10-06-23_0 to IdealState for table: 
table1_OFFLINE, deleting segment ZK metadata
   [2022-10-07 01:51:52.018343] java.lang.RuntimeException: Caught exception 
while updating ideal state for resource: table1_OFFLINE
   [2022-10-07 01:51:52.018368]         at 
org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:160)
 
~[pinot-all-0.9.0-2022-09-15-148e6c38e-SNAPSHOT-jar-with-dependencies.jar:0.9.0-2022-09-15-148e6c38e-SNAPSHOT-148e6c38ec4e797319fa1b66f0f212a4c2b2f171]
   [2022-10-07 01:51:52.018392]         at 
org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:180)
 
~[pinot-all-0.9.0-2022-09-15-148e6c38e-SNAPSHOT-jar-with-dependencies.jar:0.9.0-2022-09-15-148e6c38e-SNAPSHOT-148e6c38ec4e797319fa1b66f0f212a4c2b2f171]
   [2022-10-07 01:51:52.018418]         at 
org.apache.pinot.controller.helix.core.PinotHelixResourceManager.assignTableSegment(PinotHelixResourceManager.java:1814)
 
~[pinot-all-0.9.0-2022-09-15-148e6c38e-SNAPSHOT-jar-with-dependencies.jar:0.9.0-2022-09-15-148e6c38e-SNAPSHOT-148e6c38ec4e797319fa1b66f0f212a4c2b2f171]
   Show all 46 lines
   [2022-10-07 01:51:52.025769] WARN [ZkClient:1273] 
[jersey-server-managed-async-executor-6:224] Failed to delete path 
/pinot_cluster/PROPERTYSTORE/SEGMENTS/table1_OFFLINE/table1_1_2022-09-24-21_2022-10-06-23_0!
 java.lang.IllegalStateException: ZkClient already closed!
   [2022-10-07 01:51:52.028874] ERROR 
[PinotSegmentUploadDownloadRestletResource:53] 
[jersey-server-managed-async-executor-6:224] Caught internal server exception 
while uploading segment [2022-10-07 01:51:52.028892] 
java.lang.IllegalStateException: ZkClient already closed!
   ```
   
   Shutdown logs on the same host:
   ```
   [2022-10-07 01:51:46.631185] INFO [PinotServiceManager:233] [Thread-1:30] 
Shutting down Pinot Service Manager with all running Pinot instances...
   --
     | [2022-10-07 01:51:51.679919] INFO [HelixTaskExecutor:1148] [Thread-1:30] 
Shutting down HelixTaskExecutor
     | [2022-10-07 01:51:51.680457] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@6aed4a23[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.680633] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@404566e7[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.680777] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@2559f65c[Running, 
pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 56]
     | [2022-10-07 01:51:51.692292] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@7d5cbcc9[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.821168] INFO [BaseControllerStarter:718] 
[Thread-1:30] Shutting down executor service
     | [2022-10-07 01:51:51.821300] INFO [HelixTaskExecutor:1148] [Thread-1:30] 
Shutting down HelixTaskExecutor
     | [2022-10-07 01:51:51.821982] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@4f482051[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.822040] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@294804bd[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.822095] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@51573738[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:51:51.822145] INFO [HelixTaskExecutor:535] [Thread-1:30] 
Shutting down pool: java.util.concurrent.ThreadPoolExecutor@2474df51[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
     | [2022-10-07 01:52:04.331357] INFO [GenericHelixController:1082] 
[Thread-1:30] Shutting down DEFAULT pipeline
     | [2022-10-07 01:52:05.013192] INFO [GenericHelixController:1085] 
[Thread-1:30] Shutting down TASK pipeline
     | [2022-10-07 01:52:05.127604] INFO [BaseControllerStarter:679] 
[Thread-1:30] Finish shutting down Pinot controller for 
Controller_pinot-controller-fw-3.service.consul_9000
     | [2022-10-07 01:52:05.128153] INFO [PinotServiceManager:224] 
[Thread-1:30] Shutting down Pinot Service Manager admin application...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to