[ https://issues.apache.org/jira/browse/IGNITE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576038#comment-15576038 ]
Ivan Veselovsky commented on IGNITE-4037:
-----------------------------------------

My study of Hadoop has led to the following solution plan:

1) We may continue to use HadoopSkipList as now, or another sorted collection; we just need to limit its size. That is, we may store no more than M bytes of data in memory.

2) The data are to be partitioned into R parts, as is done now.

3) If the map output exceeds 0.8 * M (the coefficient should be configurable, similar to the io.sort.spill.percent Hadoop property), we freeze the in-memory collection and start spilling it to disk. Meanwhile, writing continues into another collection. This way the amount of in-memory data should not exceed the configured limit, and on disk we will have a number of sorted files for each partition.

4) The spilled files need to be sent to the reduce side and merged. Likely they need to be fully merged, then sent.

5) Similarly, output obtained from different mappers is to be merged on the reduce side. This functionality can be borrowed from the Hadoop implementation, with the difference that Ignite has a "push" map -> reduce sending implementation, while Hadoop uses a "pull" implementation.

> High memory consumption when executing TeraSort Hadoop example
> --------------------------------------------------------------
>
>                 Key: IGNITE-4037
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4037
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.7
>
>
> When executing the TeraSort Hadoop example, we observe high memory consumption that frequently leads to cluster malfunction.
> The problem can be reproduced in a unit test, even with 1 node and with an input data set no larger than 100Mb.
> Dump analysis shows that memory is taken in various queues:
> org.apache.ignite.internal.processors.hadoop.taskexecutor.HadoopExecutorService#queue
> and the task queue of
> org.apache.ignite.internal.processors.hadoop.jobtracker.HadoopJobTracker#evtProcSvc.
> Since objects stored in these queues hold byte arrays of significant size, memory is consumed very fast.
> It looks like the real cause of the problem is that some tasks are blocked.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
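The spill trigger in step 3 of the plan could be sketched as follows. This is a minimal, hypothetical illustration, not Ignite code: the class and method names (SpillingCollector, collect, spill) are invented, sizes are estimated crudely from string lengths, duplicate keys are collapsed by the map, and the "spill to disk" is simulated by moving the frozen collection into an in-memory list of sorted runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of the spill-threshold idea from step 3; names are illustrative.
public class SpillingCollector {
    private final long memLimit;        // M: max bytes kept in memory
    private final double spillPercent;  // analogous to io.sort.spill.percent (e.g. 0.8)

    private ConcurrentSkipListMap<String, String> current = new ConcurrentSkipListMap<>();
    private long usedBytes = 0;

    // Simulated on-disk sorted runs; a real implementation would write files per partition.
    private final List<List<String>> spills = new ArrayList<>();

    public SpillingCollector(long memLimit, double spillPercent) {
        this.memLimit = memLimit;
        this.spillPercent = spillPercent;
    }

    public void collect(String key, String value) {
        current.put(key, value);
        usedBytes += key.length() + value.length(); // rough size estimate
        if (usedBytes >= spillPercent * memLimit)
            spill();
    }

    // Freeze the in-memory collection and "spill" it; writing continues in a fresh map,
    // so the in-memory footprint never exceeds the configured limit.
    private void spill() {
        ConcurrentSkipListMap<String, String> frozen = current;
        current = new ConcurrentSkipListMap<>();
        usedBytes = 0;

        List<String> run = new ArrayList<>();
        frozen.forEach((k, v) -> run.add(k + "=" + v)); // already sorted by key
        spills.add(run);
    }

    public int spillCount() {
        return spills.size();
    }
}
```

Because each frozen skip list is already sorted, every spill produces a sorted run, which is what makes the later merge steps cheap.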
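The merging in steps 4 and 5 is a classic k-way merge of sorted runs, which Hadoop implements with a priority queue over file cursors. A simplified in-memory sketch (again hypothetical, with runs as lists of strings instead of spill files, and the class name SpillMerger invented for illustration):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of merging sorted spill runs (steps 4-5); not Ignite's actual code.
public class SpillMerger {
    // Merge k sorted lists into one sorted list using a min-heap of {runIndex, position} cursors.
    public static List<String> merge(List<List<String>> runs) {
        Comparator<int[]> byCurrentElement =
            Comparator.comparing(c -> runs.get(c[0]).get(c[1]));
        PriorityQueue<int[]> heap = new PriorityQueue<>(byCurrentElement);

        for (int r = 0; r < runs.size(); r++)
            if (!runs.get(r).isEmpty())
                heap.add(new int[] {r, 0});

        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<String> run = runs.get(cursor[0]);
            out.add(run.get(cursor[1]));
            // Advance the cursor within its run, if anything remains.
            if (cursor[1] + 1 < run.size())
                heap.add(new int[] {cursor[0], cursor[1] + 1});
        }
        return out;
    }
}
```

Each element is pushed and popped once, so merging k runs of n total elements costs O(n log k); the same routine serves both step 4 (runs from one mapper) and step 5 (runs from different mappers).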