vpeack opened a new issue #11478:
URL: https://github.com/apache/druid/issues/11478


   Hi everyone,
   
   Following a [post on ASF slack](https://the-asf.slack.com/archives/CJ8D1JTB8/p1626879127422700?thread_ts=1626868492.418800&cid=CJ8D1JTB8), I am opening a new issue here on the advice of someone from Imply.
   We are running compaction tasks through indexers that randomly fail during phase 3 (partial_index_generic_merge) with the following error message (more details below): "error in opening zip file"
   
   The reply we received on Slack:
   > As to the specific error, I'm not sure if it's exactly the same as what's going on in https://github.com/apache/druid/issues/9993, but that issue does point out an important thing: if the shuffle server returns an error, the shuffle client will not actually log that error; it will just log this sort of obtuse zip decompression error (because it is trying to unzip the error message). This isn't good error behavior, so we should adjust it to log the actual server error instead of trying to unzip the error message. Which is silly!
   > This seems to be an indexer bug. Could you please create a bug report in the Druid GitHub project with all the details?
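
For illustration only, the failure mode described in that reply can be sketched as follows. This is hypothetical Python, not Druid's actual code (which is Java): a client that blindly unzips whatever body the shuffle server returned turns an HTTP error message into an opaque "not a zip file" failure, and the real server error is lost.

```python
# Hypothetical sketch (NOT Druid's actual code) of the failure mode described
# in the Slack reply: the client saves the HTTP response body to disk and
# unzips it without checking whether the response was actually a zip payload.
# If the server returned an error, the body is an error message, and the unzip
# step fails with an opaque zip error instead of surfacing the server's error.
import os
import tempfile
import zipfile

def fetch_and_unzip(body: bytes, dest_dir: str) -> str:
    """Save `body` (assumed to be a zip archive) and extract it.

    Mirrors the buggy pattern: the caller never verifies that the
    response body is a valid archive before unzipping it.
    """
    archive = os.path.join(dest_dir, "partition.zip")
    with open(archive, "wb") as f:
        f.write(body)
    with zipfile.ZipFile(archive) as zf:  # raises BadZipFile on an error body
        zf.extractall(dest_dir)
    return archive

# Simulate the server returning an error message instead of zip data.
with tempfile.TemporaryDirectory() as d:
    try:
        fetch_and_unzip(b"500 Internal Server Error: partition not found", d)
    except zipfile.BadZipFile as e:
        print(f"opaque failure: {e}")  # the real server error is lost
```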
   
   ### Affected Version
   
   0.21.0
   
   ### Description
   - Cluster size
     - 1 master (coordinator/overlord)
     - 2 routers/brokers
     - ~10 historicals
     - ~20 indexers (dedicated to these tasks) + ~5 indexers for realtime ingestion (Kafka)
     - ~30 TB of data
   
   - Configurations in use
   The spec we are using:
   ```json
   {
     "type": "index_parallel",
     "spec": {
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "druid",
           "dataSource": "events",
           "interval": "2021-07-13T00:00:00/2021-07-14T00:00:00"
         }
       },
       "tuningConfig": {
         "type": "index_parallel",
         "partitionsSpec": {
           "type": "hashed",
           "maxRowsPerSegment": 800000
         },
         "forceGuaranteedRollup": true,
         "maxNumConcurrentSubTasks": 40,
         "totalNumMergeTasks": 20,
         "maxRetry": 10,
         "maxPendingPersists": 1,
         "maxRowsPerSegment": 800000
       },
       "dataSchema": {
         "dataSource": "events",
         "granularitySpec": {
           "type": "uniform",
           "queryGranularity": "HOUR",
           "segmentGranularity": "HOUR",
           "rollup": true
         },
         "timestampSpec": {
           "column": "__time",
           "format": "iso"
         },
         "dimensionsSpec": {},
         "metricsSpec": []
       }
     }
   }
   ```
   - Steps to reproduce the problem
   Happens randomly 
   - The error message or stack traces encountered. Providing more context, 
such as nearby log messages or even entire logs, can be helpful.
   ```text
   {"severity": "INFO", "message": "[[partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z]-threading-task-runner-executor-0] org.apache.druid.utils.CompressionUtils - Unzipping file[/opt/druid-data/task/partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z/work/indexing-tmp/2021-07-20T08:00:00.000Z/2021-07-20T09:00:00.000Z/10/temp_partial_index_generate_events_ooikmkan_2021-07-21T11:00:25.016Z] to [/opt/druid-data/task/partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z/work/indexing-tmp/2021-07-20T08:00:00.000Z/2021-07-20T09:00:00.000Z/10/unzipped_partial_index_generate_events_ooikmkan_2021-07-21T11:00:25.016Z]"}
   {"severity": "ERROR", "message": "[[partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z]-threading-task-runner-executor-0] org.apache.druid.indexing.overlord.ThreadingTaskRunner - Exception caught while running the task."}
   java.util.zip.ZipException: error in opening zip file
           at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_292]
           at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_292]
           at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_292]
           at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_292]
           at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:235) ~[druid-core-0.21.0.jar:0.21.0]
           at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:224) ~[druid-indexing-service-0.21.0.jar:0.21.0]
           at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:162) ~[druid-indexing-service-0.21.0.jar:0.21.0]
           at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:41) ~[druid-indexing-service-0.21.0.jar:0.21.0]
           at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) ~[druid-indexing-service-0.21.0.jar:0.21.0]
           at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:211) [druid-indexing-service-0.21.0.jar:0.21.0]
           at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:151) [druid-indexing-service-0.21.0.jar:0.21.0]
           at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_292]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
   ```
   - Any debugging that you have already done
   N/A
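
If it helps confirm the hypothesis from the Slack reply, one possible next step would be to inspect the leftover temp file (e.g. the temp_partial_index_generate_* path from the log above) before the task cleans it up. A minimal check, assuming Python is available on the indexer host (the `inspect_file` helper below is hypothetical, not part of Druid):

```python
# Diagnostic sketch (hypothetical helper, not part of Druid): check whether a
# fetched shuffle file is a real zip archive; if not, print its first bytes,
# which should contain the server's actual error message.
import os
import tempfile
import zipfile

def inspect_file(path: str) -> bool:
    """Return True if `path` is a valid zip archive; otherwise dump its head."""
    if zipfile.is_zipfile(path):
        print(f"{path}: valid zip archive")
        return True
    with open(path, "rb") as f:
        head = f.read(256)
    print(f"{path}: NOT a zip; first bytes: {head!r}")
    return False

# Example: a file holding an error message instead of zip data.
work_dir = tempfile.mkdtemp()
fake = os.path.join(work_dir, "temp_partial_index_generate")
with open(fake, "wb") as f:
    f.write(b"500 Internal Server Error")
inspect_file(fake)  # reports NOT a zip and shows the error text
```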
   
   Any ideas on how we can resolve this?
   Feel free to ask if you need anything else.
   
   Thanks a lot
   

