tarekabouzeid opened a new issue #11384:
URL: https://github.com/apache/druid/issues/11384
### Affected Version
0.21.1
### Description
We were ingesting Parquet files from Minio using an S3 prefix that points to a
bucket. Druid estimated 998 tasks
("estimatedNumSucceededTasks[998]"). After 1000 files had been read, at task [125],
the following error appeared in the Minio logs:
"
API: ListObjectsV2
Time: 23:28:23:0
DeploymentID: 15313dfa-ae71-4b9d-a474-873786ed2b28
RequestID: 168B59DD68270F74
RemoteHost: xxx.xxx.xxx.xxx
Host: xxxx.xxxx.xxx.xxxx:9000
UserAgent: aws-sdk-java/1.11.199 Linux/3.10.0-1062.7.1.el7.x86_64
OpenJDK_64-Bit_Server_VM/25.292-b10 java/1.8.0_292
Error: remote listing canceled: file not found (*fmt.wrapError)
cmd/metacache-set.go:541:cmd.(*erasureObjects).listPath()
cmd/metacache-server-pool.go:174:cmd.(*erasureServerPools).listPath.func1()
"
But the main index_parallel task did not detect that error. It marked
the whole ingestion pipeline as a success even though it had not loaded the entire
dataset from Minio.
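
One possible reading, not confirmed anywhere in this report: an S3 ListObjectsV2 response returns at most 1000 keys, so the listing may have failed while fetching the second page, and that failure may have been treated as the end of the listing. The sketch below is not Druid's code. `list_all_keys`, `make_fake_store` and `ListingError` are hypothetical names, and the in-memory store only simulates a paginated listing. It illustrates the behaviour we expected: a failed page aborts the listing instead of silently shortening it.

```python
from typing import Callable, List, Optional, Tuple

# A page is (keys, next_token); next_token is None on the last page.
Page = Tuple[List[str], Optional[str]]


class ListingError(Exception):
    """Raised when a page of the listing cannot be fetched (hypothetical)."""


def list_all_keys(fetch_page: Callable[[Optional[str]], Page]) -> List[str]:
    """Collect every key; any page failure propagates to the caller."""
    keys: List[str] = []
    token: Optional[str] = None
    while True:
        # No try/except here on purpose: swallowing the error would
        # return a partial listing that looks complete.
        page_keys, token = fetch_page(token)
        keys.extend(page_keys)
        if token is None:
            return keys


def make_fake_store(total: int, page_size: int = 1000,
                    fail_after_pages: Optional[int] = None):
    """Hypothetical in-memory stand-in for a paginated listing API."""
    all_keys = [f"part-{i:05d}.parquet" for i in range(total)]
    pages_served = 0

    def fetch_page(token: Optional[str]) -> Page:
        nonlocal pages_served
        if fail_after_pages is not None and pages_served >= fail_after_pages:
            raise ListingError("remote listing canceled")
        start = int(token) if token is not None else 0
        end = min(start + page_size, total)
        pages_served += 1
        next_token = str(end) if end < total else None
        return all_keys[start:end], next_token

    return fetch_page
```

With `make_fake_store(7979)` the listing returns all 7979 keys. With `fail_after_pages=1` it raises `ListingError` after the first 1000 keys, so the caller can fail the task rather than publish a partial dataset.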
**Log snippet from the ingestion task** :
"
2021-06-23T23:29:45,090 DEBUG [qtp1954745715-173]
org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST
//myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report
HTTP/1.1
2021-06-23T23:29:48,042 INFO [task-monitor-0]
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [122/998]
tasks succeeded
2021-06-23T23:30:17,535 DEBUG [qtp1954745715-126]
org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST
//myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report
HTTP/1.1
2021-06-23T23:30:21,035 INFO [task-monitor-0]
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [123/998]
tasks succeeded
2021-06-23T23:31:22,055 DEBUG [qtp1954745715-130]
org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST
//myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report
HTTP/1.1
2021-06-23T23:31:23,035 INFO [task-monitor-0]
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [124/998]
tasks succeeded
2021-06-23T23:31:34,953 DEBUG [qtp1954745715-144]
org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST
//myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report
HTTP/1.1
2021-06-23T23:31:37,035 INFO [task-monitor-0]
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [125/998]
tasks succeeded
2021-06-23T23:31:37,036 INFO [task-runner-0-priority-0]
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner -
Cleaning up resources
2021-06-23T23:31:37,036 INFO [task-runner-0-priority-0]
org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Stopped
taskMonitor
2021-06-23T23:31:37,787 INFO [task-runner-0-priority-0]
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask
- Published [372] segments
2021-06-23T23:31:37,792 INFO [task-runner-0-priority-0]
org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed
with status: {
"id" :
"index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
"status" : "SUCCESS",
"duration" : 6303374,
"errorMsg" : null,
"location" : {
"host" : null,
"port" : -1,
"tlsPort" : -1
}
}
2021-06-23T23:31:37,802 INFO [main]
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle
[module] stage [ANNOUNCEMENTS]
2021-06-23T23:31:37,805 INFO [main]
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle
[module] stage [SERVER]
2021-06-23T23:31:37,815 INFO [main]
org.eclipse.jetty.server.AbstractConnector - Stopped
ServerConnector@7cf66cf9{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
2021-06-23T23:31:37,815 INFO [main] org.eclipse.jetty.server.session - node0
Stopped scavenging
2021-06-23T23:31:37,818 INFO [main]
org.eclipse.jetty.server.handler.ContextHandler - Stopped
o.e.j.s.ServletContextHandler@4dffa400{/,null,STOPPED}
2021-06-23T23:31:37,830 INFO [main]
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle
[module] stage [NORMAL]
2021-06-23T23:31:37,831 INFO [main]
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting
graceful shutdown of
task[index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z].
2021-06-23T23:31:37,832 INFO [main]
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner -
Cleaning up resources
2021-06-23T23:31:37,872 INFO
[LookupExtractorFactoryContainerProvider-MainThread]
org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop
exited. Lookup notices are not handled anymore.
2021-06-23T23:31:37,872 INFO [main]
org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager
- CoordinatorPollingBasicAuthorizerCacheManager is stopping.
2021-06-23T23:31:37,873 INFO [main]
org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager
- CoordinatorPollingBasicAuthorizerCacheManager is stopped.
2021-06-23T23:31:37,873 INFO [main]
org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager
- CoordinatorPollingBasicAuthenticatorCacheManager is stopping.
2021-06-23T23:31:37,873 INFO [main]
org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager
- CoordinatorPollingBasicAuthenticatorCacheManager is stopped.
2021-06-23T23:31:37,875 INFO [Curator-Framework-0]
org.apache.curator.framework.imps.CuratorFrameworkImpl -
backgroundOperationsLoop exiting
2021-06-23T23:31:37,879 INFO [main] org.apache.zookeeper.ZooKeeper -
Session: 0x300bcb2cd840004 closed
2021-06-23T23:31:37,879 INFO [main-EventThread]
org.apache.zookeeper.ClientCnxn - EventThread shut down for session:
0x300bcb2cd840004
2021-06-23T23:31:37,912 INFO [main]
org.apache.druid.java.util.common.lifecycle.Lifecycle$CloseableHandler -
Closing object[org.asynchttpclient.DefaultAsyncHttpClient@66fff42f]
2021-06-23T23:31:37,913 INFO [main]
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle
[module] stage [INIT]
"
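
A check like the following would have flagged this run from the log alone. It is a minimal sketch: the regular expressions are written against the wording of the log lines above, and `check_task_log` and `LogCheck` are hypothetical names. Because the second number in the progress line is only an estimate, a mismatch is a hint to investigate rather than proof of data loss.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches TaskMonitor progress lines such as "[125/998] tasks succeeded".
# \s+ tolerates the hard line wrap between "]" and "tasks" seen in the log.
PROGRESS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+tasks succeeded")
# Matches the final status field, e.g. "status" : "SUCCESS"
STATUS_RE = re.compile(r'"status"\s*:\s*"([A-Z]+)"')


@dataclass
class LogCheck:
    succeeded: Optional[int]
    estimated: Optional[int]
    status: Optional[str]

    @property
    def suspicious(self) -> bool:
        """True when SUCCESS is reported with fewer sub-tasks than estimated."""
        if self.status != "SUCCESS":
            return False
        if self.succeeded is None or self.estimated is None:
            return False
        return self.succeeded < self.estimated


def check_task_log(log_text: str) -> LogCheck:
    """Extract the last progress line and the final status from a task log."""
    progress = PROGRESS_RE.findall(log_text)
    status = STATUS_RE.search(log_text)
    if progress:
        succeeded, estimated = int(progress[-1][0]), int(progress[-1][1])
    else:
        succeeded, estimated = None, None
    return LogCheck(succeeded, estimated, status.group(1) if status else None)
```

Run against the snippet above, the last progress line is [125/998] and the status is SUCCESS, so `suspicious` is true.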
**Total Number of files in the bucket** : 7979 files
**Each file size** : 123 MB
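
A quick sanity check of the figures above, as a sketch. The only inputs are numbers from this report. The assumption that every sub-task received the same number of files is an inference and is not stated anywhere in the logs.

```python
import math

TOTAL_FILES = 7979      # files in the bucket (from this report)
ESTIMATED_TASKS = 998   # estimatedNumSucceededTasks (from this report)
COMPLETED_TASKS = 125   # last TaskMonitor progress line
FILES_READ = 1000       # files read when the error occurred


def files_per_task(files_read: int, completed_tasks: int) -> int:
    """Files per sub-task, assuming an even split (an inference)."""
    return files_read // completed_tasks


def expected_tasks(total_files: int, per_task: int) -> int:
    """Sub-tasks needed to cover all files at per_task files each."""
    return math.ceil(total_files / per_task)


per_task = files_per_task(FILES_READ, COMPLETED_TASKS)
missing_files = TOTAL_FILES - FILES_READ
print(per_task, expected_tasks(TOTAL_FILES, per_task), missing_files)
# prints: 8 998 6979
```

At 8 files per sub-task, 7979 files need 998 sub-tasks, which matches the estimate. That suggests, though we have not verified it, that the estimate was based on the full set of files, while the run stopped with 6979 files unread.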
**Below is the spec used**:
"
{
"type": "index_parallel",
"id":
"index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
"groupId":
"index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
"resource": {
"availabilityGroup":
"index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
"requiredCapacity": 1
},
"spec": {
"dataSchema": {
"dataSource": "sparkpublic_trial_1",
"timestampSpec": {
"column": "timestamp_col",
"format": "posix",
"missingValue": "2020-01-01T00:00:00.000Z"
},
"dimensionsSpec": {
"dimensions": [
{
"type": "long",
"name": "c1",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c2",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c3",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c4",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c5",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c6",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c7",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c8",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "long",
"name": "c9",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
},
{
"type": "string",
"name": "c10",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c11",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c12",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c13",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c14",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c15",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c16",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "c17",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "city",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "country",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "long",
"name": "id",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": false
}
],
"dimensionExclusions": [
"timestamp_col"
]
},
"metricsSpec": [],
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": {
"type": "all"
},
"rollup": false,
"intervals": null
},
"transformSpec": {
"filter": null,
"transforms": []
}
},
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "s3",
"uris": null,
"prefixes": [
"s3://sparkpublic"
],
"objects": null,
"properties": null
},
"inputFormat": {
"type": "parquet",
"flattenSpec": {
"useFieldDiscovery": true,
"fields": []
},
"binaryAsString": false
},
"appendToExisting": false
},
"tuningConfig": {
"type": "index_parallel",
"maxRowsPerSegment": 5000000,
"appendableIndexSpec": {
"type": "onheap"
},
"maxRowsInMemory": 1000000,
"maxBytesInMemory": 0,
"maxTotalRows": null,
"numShards": null,
"splitHintSpec": null,
"partitionsSpec": {
"type": "dynamic",
"maxRowsPerSegment": 5000000,
"maxTotalRows": null
},
"indexSpec": {
"bitmap": {
"type": "roaring",
"compressRunOnSerialization": true
},
"dimensionCompression": "lz4",
"metricCompression": "lz4",
"longEncoding": "longs",
"segmentLoader": null
},
"indexSpecForIntermediatePersists": {
"bitmap": {
"type": "roaring",
"compressRunOnSerialization": true
},
"dimensionCompression": "lz4",
"metricCompression": "lz4",
"longEncoding": "longs",
"segmentLoader": null
},
"maxPendingPersists": 0,
"forceGuaranteedRollup": false,
"reportParseExceptions": false,
"pushTimeout": 0,
"segmentWriteOutMediumFactory": null,
"maxNumConcurrentSubTasks": 5,
"maxRetry": 3,
"taskStatusCheckPeriodMs": 1000,
"chatHandlerTimeout": "PT10S",
"chatHandlerNumRetries": 5,
"maxNumSegmentsToMerge": 100,
"totalNumMergeTasks": 10,
"logParseExceptions": false,
"maxParseExceptions": 2147483647,
"maxSavedParseExceptions": 0,
"maxColumnsToMerge": -1,
"buildV9Directly": true,
"partitionDimensions": []
}
},
"context": {
"forceTimeChunkLock": true
},
"dataSource": "sparkpublic_trial_1"
}
"