Ivan Veselovsky commented on IGNITE-4037:

My study of Hadoop has led to the following solution plan:
1) We may continue to use HadoopSkipList as now, or another sorted collection, 
but we should limit it in size. That is, we may store in memory not more than M.
2) The data are to be partitioned into R parts, as is done now.
3) If the map output exceeds 0.8 * M (the coefficient should be configurable, 
similar to the io.sort.spill.percent Hadoop property), we freeze the in-memory 
collection and start spilling it to disk. Meanwhile, writing continues in 
another collection. This way the amount of in-memory data should not exceed the 
configured limit, and on disk we will have a number of sorted files for each 
partition.
4) The spilled files need to be sent to the reduce side and merged. Likely they 
need to be fully merged on the reduce side before being sent on.
5) Similarly, output obtained from different mappers is to be merged on the reduce 
side. This functionality can be borrowed from the Hadoop implementation, with the 
difference that Ignite has a "push" map -> reduce sending implementation, while 
Hadoop rather uses a "pull" implementation.
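The spill logic of steps 1-3 could look roughly like the sketch below. All names here are illustrative, not actual Ignite APIs: a sorted in-memory buffer is filled until it reaches spillPercent * memLimit entries, then it is frozen and written to a sorted spill file while writers continue in a fresh buffer.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the proposed spill behavior (illustrative names,
// not Ignite code): bounded in-memory sorted buffer plus sorted spill files.
public class SpillingMapOutput {
    private final int memLimit;          // M: max in-memory entries
    private final double spillPercent;   // like io.sort.spill.percent (e.g. 0.8)
    private SortedMap<String, String> buffer = new TreeMap<>();
    private final List<Path> spillFiles = new ArrayList<>();

    public SpillingMapOutput(int memLimit, double spillPercent) {
        this.memLimit = memLimit;
        this.spillPercent = spillPercent;
    }

    public void write(String key, String val) throws IOException {
        buffer.put(key, val);
        if (buffer.size() >= memLimit * spillPercent) {
            SortedMap<String, String> frozen = buffer; // freeze current buffer
            buffer = new TreeMap<>();                  // writers continue here
            spill(frozen);
        }
    }

    // Write one frozen buffer to disk as a sorted "key\tvalue" file.
    private void spill(SortedMap<String, String> frozen) throws IOException {
        Path file = Files.createTempFile("spill", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (Map.Entry<String, String> e : frozen.entrySet())
                w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        spillFiles.add(file);
    }

    public int inMemorySize() { return buffer.size(); }
    public List<Path> spillFiles() { return spillFiles; }
}
```

In the real implementation the spill would of course run asynchronously so that writers are not blocked on disk I/O; the sketch only shows the freeze-and-swap idea.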
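The merge in steps 4-5 is a classic k-way merge of pre-sorted files, analogous to what Hadoop's merger does. A minimal sketch (again, illustrative names, not Ignite or Hadoop code) using a priority queue of file cursors:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Illustrative k-way merge of sorted spill files on the reduce side.
public class SpillMerger {
    // One cursor per sorted input file; head holds the current line.
    private static class Cursor {
        final BufferedReader reader;
        String head;
        Cursor(BufferedReader r) throws IOException { reader = r; head = r.readLine(); }
        void advance() throws IOException { head = reader.readLine(); }
    }

    // Merge pre-sorted files into one globally sorted list of lines.
    public static List<String> merge(List<Path> files) throws IOException {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head));
        for (Path f : files) {
            Cursor c = new Cursor(Files.newBufferedReader(f));
            if (c.head != null) heap.add(c);
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();          // cursor with smallest current key
            result.add(c.head);
            c.advance();
            if (c.head != null) heap.add(c); else c.reader.close();
        }
        return result;
    }
}
```

With files containing ["apple", "cherry"] and ["banana", "date"], the result is the fully sorted sequence ["apple", "banana", "cherry", "date"].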

> High memory consumption when executing TeraSort Hadoop example
> --------------------------------------------------------------
>                 Key: IGNITE-4037
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4037
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.7
> When executing TeraSort Hadoop example, we observe high memory consumption 
> that frequently leads to cluster malfunction.
> The problem can be reproduced in a unit test, even with 1 node and an input 
> data set as small as 100 MB.
> Dump analysis shows that memory is taken up in several queues: 
> org.apache.ignite.internal.processors.hadoop.taskexecutor.HadoopExecutorService#queue
> and the task queue of 
> org.apache.ignite.internal.processors.hadoop.jobtracker.HadoopJobTracker#evtProcSvc.
> Since objects stored in these queues hold byte arrays of significant size, 
> memory is consumed very fast.
> It looks like the real cause of the problem is that some tasks are blocked.
