stepanovD opened a new issue #10384:
URL: https://github.com/apache/druid/issues/10384
Hello! I am trying to ingest a dataset from Hadoop (about 200 GB). After the map tasks completed, the job failed with an OutOfMemoryError.
My tuning config:
```
"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "mapreduce.job.classloader": "true",
    "mapreduce.job.classloader.system.classes": "-com.google.,org.apache.druid.,org.apache.hadoop.",
    "mapreduce.map.memory.mb": "2048",
    "mapreduce.map.java.opts": "-server -Xmx2048m",
    "mapreduce.reduce.memory.mb": "4096",
    "mapreduce.reduce.java.opts": "-server -Xmx4G",
    "mapreduce.job.cache.limit.max-resources-mb": "1024",
    "mapreduce.job.cache.limit.max-single-resource-mb": "1024",
    "mapreduce.input.fileinputformat.split.minsize": "125829120",
    "mapreduce.input.fileinputformat.split.maxsize": "1073741824",
    "mapreduce.job.counters.max": "10"
  }
}
```
Tail of log:
> 2020-09-11T13:05:09,078 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1877739967_0002_m_001471_0
> 2020-09-11T13:05:09,078 INFO [Thread-4104] org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
> 2020-09-11T13:05:09,086 INFO [Thread-4104] org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks
> 2020-09-11T13:05:09,086 INFO [pool-38-thread-1] org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1877739967_0002_r_000000_0
> 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
> 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
> 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
> 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5df9e290
> 2020-09-11T13:05:09,092 INFO [pool-38-thread-1] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=751252288, maxSingleShuffleLimit=187813072, mergeThreshold=495826528, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2020-09-11T13:05:09,092 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local1877739967_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
>
...
> 2020-09-11T13:05:23,786 INFO [localfetcher#5] org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#5 about to shuffle output of map attempt_local1877739967_0002_m_000953_0 decomp: 1477886 len: 1477890 to MEMORY
> 2020-09-11T13:05:23,800 INFO [localfetcher#5] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 1477886 bytes from map-output for attempt_local1877739967_0002_m_000953_0
> 2020-09-11T13:05:23,800 INFO [localfetcher#5] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile -> map-output of size: 1477886, inMemoryMapOutputs.size() -> 141, commitMemory -> 206933260, usedMemory -> 704723960
> Terminating due to java.lang.OutOfMemoryError: Java heap space
What is the right way to fix the heap configuration?
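One observation that may help (an assumption based on the log, not a confirmed diagnosis): the log shows `LocalJobRunner`, which means the MapReduce job ran in Hadoop local mode inside the indexing task's own JVM. In local mode there are no separate map/reduce container JVMs, so `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` have no effect. The shuffle buffer of `memoryLimit=751252288` (~716 MB) is roughly 70% of a ~1 GB heap, matching Hadoop's default `mapreduce.reduce.shuffle.input.buffer.percent` of 0.70 applied to the task JVM's actual heap rather than the configured 4 GB. A hedged sketch of one possible mitigation, keeping the shuffle within the smaller heap (the values below are illustrative, not tuned recommendations):

```
"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "mapreduce.reduce.shuffle.input.buffer.percent": "0.4",
    "mapreduce.reduce.shuffle.memory.limit.percent": "0.15"
  }
}
```

Alternatively, if local mode is the intended setup, increasing the heap of the JVM that actually runs the job (the indexing task's peon JVM, via the middle manager's `druid.indexer.runner.javaOpts`) should raise the derived shuffle limit; and if the job was meant to run on a remote Hadoop cluster, where the `-Xmx` settings above would take effect, the fall-back to `LocalJobRunner` itself may be the thing to fix.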