Ivan Veselovsky commented on IGNITE-4037:

Additions to the suggested solution plan. Differences from the Hadoop shuffle:
1) pushing merged map results instead of using the Hadoop pull mechanism (TBD);
2) using ad-hoc temp files (possibly accompanied by memory-mapped buffers) 
instead of files created with FileSystem;
3) storing map outputs in a sorted memory buffer instead of the "store -> sort -> 
spill" logic used in Hadoop.
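
A minimal sketch of item 3, to make the idea concrete. It is not the Ignite implementation: the class name SortedMapOutputBuffer, its methods, and the choice of a TreeMap with unsigned byte-wise key ordering are all assumptions made for illustration. Records are placed in key order as they arrive, so no separate sort pass is needed before the buffer is drained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: map outputs are kept in key order as they are added. */
class SortedMapOutputBuffer {
    /** Unsigned lexicographic comparison of raw keys (an assumed ordering). */
    static final Comparator<byte[]> RAW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.length, b.length);
    };

    /** Key -> all values emitted for that key, ordered by key. */
    private final TreeMap<byte[], List<byte[]>> sorted = new TreeMap<>(RAW_ORDER);

    /** Total size of the key and value bytes added so far. */
    private long bytes;

    /** Adds one map output record; the tree keeps it in sorted position. */
    void add(byte[] key, byte[] val) {
        sorted.computeIfAbsent(key, k -> new ArrayList<>()).add(val);
        bytes += key.length + val.length;
    }

    /** Could be used to decide when to move the buffer to a temp file. */
    long sizeInBytes() {
        return bytes;
    }

    /** Returns keys in sorted order, one entry per stored value. */
    List<byte[]> keysInOrder() {
        List<byte[]> out = new ArrayList<>();
        for (Map.Entry<byte[], List<byte[]>> e : sorted.entrySet())
            for (int i = 0; i < e.getValue().size(); i++)
                out.add(e.getKey());
        return out;
    }
}
```

With this layout the cost of ordering is paid on insert, and draining the buffer is a plain in-order traversal.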

> High memory consumption when executing TeraSort Hadoop example
> --------------------------------------------------------------
>                 Key: IGNITE-4037
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4037
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.7
> When executing the TeraSort Hadoop example, we observe high memory consumption 
> that frequently leads to cluster malfunction.
> The problem can be reproduced in a unit test, even with 1 node and with an 
> input data set as small as 100 MB.
> Dump analysis shows that memory is taken up by various queues: 
> org.apache.ignite.internal.processors.hadoop.taskexecutor.HadoopExecutorService#queue
> and the task queue of 
> org.apache.ignite.internal.processors.hadoop.jobtracker.HadoopJobTracker#evtProcSvc.
> Since objects stored in these queues hold byte arrays of significant size, 
> memory is consumed very fast.
> It looks like the real cause of the problem is that some tasks are blocked.
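
The growth described above can be illustrated with a small standalone sketch. It does not use the Ignite classes named in the description; QueueGrowthDemo and retainedBytes are hypothetical names, and the scenario (a producer enqueueing byte-array payloads while nothing drains the queue) is an assumed simplification of "some tasks are blocked".

```java
import java.util.concurrent.BlockingQueue;

/** Hypothetical demo: how much payload a queue retains when nothing consumes it. */
class QueueGrowthDemo {
    /**
     * Offers `count` payloads of `payloadSize` bytes to the queue with no consumer
     * running, then returns the number of payload bytes the queue still holds.
     */
    static long retainedBytes(BlockingQueue<byte[]> queue, int count, int payloadSize) {
        for (int i = 0; i < count; i++)
            queue.offer(new byte[payloadSize]); // offer() returns false if a bounded queue is full

        long total = 0;
        for (byte[] payload : queue)
            total += payload.length;
        return total;
    }
}
```

Called with a java.util.concurrent.LinkedBlockingQueue created without a capacity, the retained size grows with every payload offered; called with an ArrayBlockingQueue of capacity 8, it stops at 8 payloads. This only shows the mechanism and is not a proposed fix.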

This message was sent by Atlassian JIRA
