BartolomeD opened a new issue #10206:
URL: https://github.com/apache/druid/issues/10206


   ### Affected Version
   0.19.0-rc1
   
   ### Description
   Immediately after successfully completing both the 
`partial_dimension_distribution` and `partial_range_index_generate` phases of 
`index_parallel` ingestion all `partial_index_generic_merge` tasks fail without 
producing logs. Below is the log for the main `index_parallel` task. As you can 
see, the inputSplit is _massive_ and has almost 3 million characters. Could 
this be a reason those tasks fail? In this log we see the tasks fail without a 
reason being provided: 
   ```
   2020-07-21T14:23:54,024 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_hjlgleak_2020-07-21T14:23:51.446Z]
 failed!
   ```
   
   
`druid/indexing-logs/index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z/log`:
   
   ```
   2020-07-21T14:23:52,753 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - 
Submit a new task for 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_5] and 
inputSplit[InputSplit{split=[PartitionLocation{host='ip-10-0-2-94.ec2.internal',
 .......
   
   < THIS LINE HAS 2920968 characters, I HAVE TRUNCATED THIS >
   
   ......, secondaryPartition=BuildingSingleDimensionShardSpec{bucketId=15, 
dimension='H3_HEX_RES_5', start='851f40c3fffffff', end='851f8d47fffffff', 
partitionNum=54}}]}]
   2020-07-21T14:23:52,915 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_bpjmkhkj_2020-07-21T14:23:52.698Z]
 for retrying 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_2]
   2020-07-21T14:23:53,320 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_gkbimbgc_2020-07-21T14:23:51.045Z]
 failed!
   2020-07-21T14:23:53,320 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still 
have more chances[2/3] to process the 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_0].
   2020-07-21T14:23:53,334 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_lcohkdeb_2020-07-21T14:23:53.087Z]
 for spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_5]
   2020-07-21T14:23:53,511 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - 
Waiting for subTasks to be completed
   2020-07-21T14:23:53,687 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_hgedboii_2020-07-21T14:23:53.511Z]
 for retrying 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_0]
   2020-07-21T14:23:54,024 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_hjlgleak_2020-07-21T14:23:51.446Z]
 failed!
   2020-07-21T14:23:54,024 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still 
have more chances[1/3] to process the 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_3].
   2020-07-21T14:23:54,195 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_ibkdpjng_2020-07-21T14:23:54.024Z]
 for retrying 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_3]
   2020-07-21T14:23:54,532 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_kjlmjglj_2020-07-21T14:23:52.261Z]
 failed!
   2020-07-21T14:23:54,532 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still 
have more chances[1/3] to process the 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_4].
   2020-07-21T14:23:54,726 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_dginpfip_2020-07-21T14:23:54.532Z]
 for retrying 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_4]
   2020-07-21T14:23:55,066 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_emlcfcoj_2020-07-21T14:23:51.895Z]
 failed!
   2020-07-21T14:23:55,066 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still 
have more chances[2/3] to process the 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_1].
   2020-07-21T14:23:55,235 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_omoemikj_2020-07-21T14:23:55.066Z]
 for retrying 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_1]
   2020-07-21T14:23:55,684 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_bpjmkhkj_2020-07-21T14:23:52.698Z]
 failed!
   2020-07-21T14:23:55,684 INFO [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still 
have more chances[2/3] to process the 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_2].
   2020-07-21T14:23:55,853 INFO [task-monport=8091, useHttps=false, 
subTaskId='partial_range_index_generate_listing-stats_bpeglhfn_2020-07-20T07:43:00.320Z',
 interval=2019-03-01T00:00:00.000Z/2019-04-01T00:00:00.000Z, 
secondaryPartition=BuildingSingleDimensionShardSpec{bucketId=20, 
dimension='H3_HEX_RES_5', start='8530a127fffffff', end='85390ca3fffffff', 
partitionNum=3}}, PartitionLocation{host='ip-10-0-2-94.ec2.internal', 
port=8091, useHttps=false, 
subTaskId='partial_range_index_generate_listing-stats_cbpldlfd_2020-07-19T21:45:35.321Z',itor-0]
 org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a 
new 
task[partial_index_generic_merge_listing-stats_difjpham_2020-07-21T14:23:55.684Z]
 for retrying 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_2]
   2020-07-21T14:23:56,186 WARN [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
task[partial_index_generic_merge_listing-stats_hgedboii_2020-07-21T14:23:53.511Z]
 failed!
   2020-07-21T14:23:56,186 ERROR [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - 
spec[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z_0] failed 
after [3] tries
   2020-07-21T14:23:56,187 ERROR [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - 
Failed because of the failed sub 
task[partial_index_generic_merge_listing-stats_hgedboii_2020-07-21T14:23:53.511Z]
   2020-07-21T14:23:56,187 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - 
Cleaning up resources
   2020-07-21T14:23:56,187 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Request to 
cancel 
subtask[partial_index_generic_merge_listing-stats_omoemikj_2020-07-21T14:23:55.066Z]
   2020-07-21T14:23:56,188 ERROR [task-monitor-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Error while 
monitoring
   java.lang.RuntimeException: java.lang.InterruptedException
        at 
org.apache.druid.client.indexing.HttpIndexingServiceClient.getTaskStatus(HttpIndexingServiceClient.java:268)
 ~[druid-server-0.19.0.jar:0.19.0]
        at 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor.lambda$start$0(TaskMonitor.java:128)
 ~[druid-indexing-service-0.19.0.jar:0.19.0]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_252]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[?:1.8.0_252]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_252]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [?:1.8.0_252]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_252]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
   Caused by: java.lang.InterruptedException
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
 ~[?:1.8.0_252]
        at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
 ~[guava-16.0.1.jar:?]
        at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
~[guava-16.0.1.jar:?]
        at 
org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:173) 
~[druid-server-0.19.0.jar:0.19.0]
        at 
org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:145) 
~[druid-server-0.19.0.jar:0.19.0]
        at 
org.apache.druid.client.indexing.HttpIndexingServiceClient.getTaskStatus(HttpIndexingServiceClient.java:253)
 ~[druid-server-0.19.0.jar:0.19.0]
        ... 8 more
   2020-07-21T14:23:56,347 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Request to 
cancel 
subtask[partial_index_generic_merge_listing-stats_difjpham_2020-07-21T14:23:55.684Z]
   2020-07-21T14:23:56,475 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Request to 
cancel 
subtask[partial_index_generic_merge_listing-stats_lcohkdeb_2020-07-21T14:23:53.087Z]
   2020-07-21T14:23:56,568 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Request to 
cancel 
subtask[partial_index_generic_merge_listing-stats_ibkdpjng_2020-07-21T14:23:54.024Z]
   2020-07-21T14:23:56,662 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Request to 
cancel 
subtask[partial_index_generic_merge_listing-stats_dginpfip_2020-07-21T14:23:54.532Z]
   2020-07-21T14:23:56,758 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Stopped 
taskMonitor
   2020-07-21T14:23:56,762 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed 
with status: {
     "id" : "index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z",
     "status" : "FAILED",
     "duration" : 221801819,
     "errorMsg" : null,
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   2020-07-21T14:23:56,770 INFO [main] 
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle 
[module] stage [ANNOUNCEMENTS]
   2020-07-21T14:23:56,774 INFO [main] 
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle 
[module] stage [SERVER]
   2020-07-21T14:23:56,778 INFO [main] 
org.eclipse.jetty.server.AbstractConnector - Stopped 
ServerConnector@bb50de6{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
   2020-07-21T14:23:56,778 INFO [main] org.eclipse.jetty.server.session - node0 
Stopped scavenging
   2020-07-21T14:23:56,780 INFO [main] 
org.eclipse.jetty.server.handler.ContextHandler - Stopped 
o.e.j.s.ServletContextHandler@2a54c92e{/,null,UNAVAILABLE}
   2020-07-21T14:23:56,786 INFO [main] 
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle 
[module] stage [NORMAL]
   2020-07-21T14:23:56,787 INFO [main] 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting 
graceful shutdown of 
task[index_parallel_listing-stats_ompohlkj_2020-07-19T00:47:06.225Z].
   2020-07-21T14:23:56,787 INFO [main] 
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - 
Cleaning up resources
   2020-07-21T14:23:56,791 INFO [main] 
org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager
 - CoordinatorPollingBasicAuthenticatorCacheManager is stopping.
   2020-07-21T14:23:56,791 INFO [main] 
org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager
 - CoordinatorPollingBasicAuthenticatorCacheManager is stopped.
   2020-07-21T14:23:56,791 INFO [main] 
org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager
 - CoordinatorPollingBasicAuthorizerCacheManager is stopping.
   2020-07-21T14:23:56,791 INFO [main] 
org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager
 - CoordinatorPollingBasicAuthorizerCacheManager is stopped.
   2020-07-21T14:23:56,794 INFO 
[LookupExtractorFactoryContainerProvider-MainThread] 
org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop 
exited. Lookup notices are not handled anymore.
   2020-07-21T14:23:56,829 INFO [Curator-Framework-0] 
org.apache.curator.framework.imps.CuratorFrameworkImpl - 
backgroundOperationsLoop exiting
   2020-07-21T14:23:56,834 INFO [main] org.apache.zookeeper.ZooKeeper - 
Session: 0x100000025e001c7 closed
   2020-07-21T14:23:56,834 INFO [main] 
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle 
[module] stage [INIT]
   Finished peon task
   ```
   ### Cluster Size
   - Master server
      - m5.large instance: 2 vCPU, 8 GiB Memory
   - Data servers
      - 2 i3.xlarge instances: 4 vCPU, 30.5 GiB Memory, 1 x 0.95TB SSD (each)
   - Query server
      - m5.large instace: 2 vCPU, 8GiB Memory
   
   ### Configurations in use
   
   ```
   druid.indexer.runner.javaOpts=-server -Xms4g -Xmx4g 
-XX:MaxDirectMemorySize=4g -Duser.timezone=UTC -Dfile.encoding=UTF-8 
-XX:+UseG1GC -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/home/ec2-user/apache-druid-0.18.1/var/heap_dumps/druid-heap.hprof
 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
   
   ```
   
   ### IngestionSpec
   ```json
   {
     "type": "index_parallel",
     "spec": {
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "s3",
           "prefixes": [
             "s3://atr-druid-data-sources/listing-stats/"
           ]
         },
         "inputFormat": {
           "type": "csv",
           "findColumnsFromHeader": true,
           "listDelimiter": "|"
         },
         "appendToExisting": false
       },
       "tuningConfig": {
         "type": "index_parallel",
         "partitionsSpec": {
           "type": "single_dim",
           "partitionDimension": "H3_HEX_RES_5",
           "targetRowsPerSegment": 5500000
         },
         "forceGuaranteedRollup": true,
         "logParseExceptions": true,
         "maxNumConcurrentSubTasks": 2,
       "dataSchema": {
         "dataSource": "listing-stats",
         "granularitySpec": {
           "type": "uniform",
           "queryGranularity": "DAY",
           "rollup": true,
           "intervals": [
               "2018-05-01T00:00:00.000Z/2020-06-01T00:00:00.000Z"
           ],
           "segmentGranularity": "DAY"
         },
         "timestampSpec": {
           "column": "CALENDAR_DATE",
           "format": "auto"
         },
         "dimensionsSpec": {
           "dimensions": [
             {
               "type": "string",
               "name": "ARRANGEMENT_TYPE"
             },
             {
               "type": "string",
               "name": "H3_HEX_RES_5"
             },
             {
               "type": "long",
               "name": "BATHROOMS"
             },
             {
               "type": "long",
               "name": "INSTANT_BOOK"
             },
             {
               "type": "long",
               "name": "IS_HOTEL"
             },
             {
               "type": "long",
               "name": "IS_MANAGED"
             },
             {
               "type": "string",
               "name": "UID"
             },
             {
               "type": "long",
               "name": "SLEEPS"
             },
             {
               "type": "long",
               "name": "ROOMS"
             },
             {
               "type": "long",
               "name": "NUMBER_OF_REVIEWS"
             },
             {
               "type": "long",
               "name": "RATING"
             }
           ]
         },
         "metricsSpec": [
           {
             "name": "sum_BLOCKED",
             "type": "longSum",
             "fieldName": "BLOCKED"
           },
           {
             "name": "sum_BOOKED",
             "type": "longSum",
             "fieldName": "BOOKED"
           },
           {
             "name": "sum_LEAD_TIME_PROP",
             "type": "longSum",
             "fieldName": "LEAD_TIME_PROP"
           },
           {
             "name": "sum_LISTED_NIGHT_RATE",
             "type": "longSum",
             "fieldName": "LISTED_NIGHT_RATE"
           },
           {
             "name": "sum_BOOKING_PROP",
             "type": "floatSum",
             "fieldName": "BOOKING_PROP"
           },
           {
             "name": "sum_BOOKED_NIGHT_RATE",
             "type": "longSum",
             "fieldName": "BOOKED_NIGHT_RATE"
           },
           {
             "name": "sum_ACTIVE",
             "type": "longSum",
             "fieldName": "ACTIVE"
           }
         ]
       }
     }
   }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to