stepanovD opened a new issue #10384:
URL: https://github.com/apache/druid/issues/10384


   Hello! I am trying to ingest a dataset from Hadoop (about 200 GB). After the map tasks completed, the job failed with an OutOfMemoryError.
   My tuning config:
   
   ```
   "tuningConfig": {
     "type": "hadoop",
     "jobProperties": {
       "mapreduce.job.classloader": "true",
       "mapreduce.job.classloader.system.classes": "-com.google.,org.apache.druid.,org.apache.hadoop.",
       "mapreduce.map.memory.mb": "2048",
       "mapreduce.map.java.opts": "-server -Xmx2048m",
       "mapreduce.reduce.memory.mb": "4096",
       "mapreduce.reduce.java.opts": "-server -Xmx4G",
       "mapreduce.job.cache.limit.max-resources-mb": "1024",
       "mapreduce.job.cache.limit.max-single-resource-mb": "1024",
       "mapreduce.input.fileinputformat.split.minsize": "125829120",
       "mapreduce.input.fileinputformat.split.maxsize": "1073741824",
       "mapreduce.job.counters.max": "10"
     }
   }
   ```
   
   
   Tail of log:
   
   > 2020-09-11T13:05:09,078 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1877739967_0002_m_001471_0
   > 2020-09-11T13:05:09,078 INFO [Thread-4104] org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
   > 2020-09-11T13:05:09,086 INFO [Thread-4104] org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks
   > 2020-09-11T13:05:09,086 INFO [pool-38-thread-1] org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1877739967_0002_r_000000_0
   > 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
   > 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
   > 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapred.Task -  Using ResourceCalculatorProcessTree : [ ]
   > 2020-09-11T13:05:09,091 INFO [pool-38-thread-1] org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5df9e290
   > 2020-09-11T13:05:09,092 INFO [pool-38-thread-1] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=751252288, maxSingleShuffleLimit=187813072, mergeThreshold=495826528, ioSortFactor=10, memToMemMergeOutputsThreshold=10
   > 2020-09-11T13:05:09,092 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local1877739967_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
   > 
   ...
   > 2020-09-11T13:05:23,786 INFO [localfetcher#5] org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#5 about to shuffle output of map attempt_local1877739967_0002_m_000953_0 decomp: 1477886 len: 1477890 to MEMORY
   > 2020-09-11T13:05:23,800 INFO [localfetcher#5] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 1477886 bytes from map-output for attempt_local1877739967_0002_m_000953_0
   > 2020-09-11T13:05:23,800 INFO [localfetcher#5] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile -> map-output of size: 1477886, inMemoryMapOutputs.size() -> 141, commitMemory -> 206933260, usedMemory ->704723960
   > Terminating due to java.lang.OutOfMemoryError: Java heap space
   
   How should I configure the heap correctly to fix this?
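
As a rough sanity check on the numbers in the `MergerManager` log line, the heap that the reduce task actually saw can be estimated from `memoryLimit`. The sketch below assumes `mapreduce.reduce.shuffle.input.buffer.percent` is at its default of 0.70 (it is not set in the config above) and that `memoryLimit` is derived from the JVM's maximum heap; both are assumptions, not something the log states.

```python
# Values copied from the MergerManager log line in this report.
memory_limit = 751252288
max_single_shuffle_limit = 187813072

# Assumption: mapreduce.reduce.shuffle.input.buffer.percent is left at its
# default of 0.70, so memoryLimit is roughly 70% of the JVM's max heap.
ASSUMED_INPUT_BUFFER_PERCENT = 0.70

# Heap size implied by the logged shuffle memory limit.
implied_heap_bytes = memory_limit / ASSUMED_INPUT_BUFFER_PERCENT
implied_heap_mib = implied_heap_bytes / (1024 * 1024)

# Fraction of the shuffle buffer that a single map output may occupy.
single_shuffle_fraction = max_single_shuffle_limit / memory_limit

print(f"implied max heap: {implied_heap_mib:.1f} MiB")
print(f"single shuffle fraction: {single_shuffle_fraction:.2f}")
```

Under those assumptions the implied heap is about 1 GiB, well below the `-Xmx4G` in `mapreduce.reduce.java.opts`. Together with the `LocalJobRunner` entries in the log, this suggests the reduce task may be running inside the submitting JVM rather than in a separately launched task JVM, in which case the per-task `java.opts` would likely not take effect and the heap of the submitting process would be the one to check.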

