[ 
https://issues.apache.org/jira/browse/IGNITE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589204#comment-15589204
 ] 

Ivan Veselovsky edited comment on IGNITE-4097 at 10/20/16 12:55 PM:
--------------------------------------------------------------------

Reasons to get rid of offheap collections in favor of on-heap collections:
1) To compare objects with RawComparator we need two byte[] arrays, but 
fetching them from offheap memory is expensive. With an on-heap collection 
the byte[] arrays are always at hand.
2) SkipList operations are O(N) in the worst case (O(log N) only on average), 
so it may make sense to use a collection with guaranteed O(log N) worst-case 
complexity. 

A simple Map-like collection has a simpler interface, e.g. just put(K,V). 
Currently {code}HadoopMultimap{code} has a complex part of its interface 
devoted to reading serialized data (inner interfaces Adder, Key, Value). 
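
To make the points above concrete, here is a minimal sketch of what an on-heap 
sorted collection with a put(K,V)-style interface could look like. It is an 
illustration under assumptions, not the Ignite implementation: the class name 
OnHeapSortedMultimap and the interface RawBytesComparator are hypothetical. 
RawBytesComparator only mirrors the shape of Hadoop's 
RawComparator.compare(byte[], int, int, byte[], int, int) so that the example 
compiles without Hadoop on the classpath. java.util.TreeMap is used because its 
documentation guarantees log(n) cost for put and get.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OnHeapSortedMultimap {
    /** Hypothetical stand-in with the same shape as Hadoop's raw comparison method. */
    public interface RawBytesComparator {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    /** Unsigned lexicographic order over serialized bytes (an assumed key order). */
    public static final RawBytesComparator LEXICOGRAPHIC = (b1, s1, l1, b2, s2, l2) -> {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int d = (b1[s1 + i] & 0xff) - (b2[s2 + i] & 0xff);
            if (d != 0)
                return d;
        }
        return l1 - l2; // the shorter array sorts first when it is a prefix
    };

    /** Red-black tree: keys are on-heap byte[] arrays, compared without deserialization. */
    private final TreeMap<byte[], List<byte[]>> map;

    public OnHeapSortedMultimap(RawBytesComparator cmp) {
        Comparator<byte[]> c = (a, b) -> cmp.compare(a, 0, a.length, b, 0, b.length);
        map = new TreeMap<>(c);
    }

    /** The whole write-side interface: one call per serialized key/value pair. */
    public void put(byte[] key, byte[] val) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(val);
    }

    /** Entries in key order, e.g. for writing a sorted spill file. */
    public Iterable<Map.Entry<byte[], List<byte[]>>> sortedEntries() {
        return map.entrySet();
    }

    public static void main(String[] args) {
        OnHeapSortedMultimap m = new OnHeapSortedMultimap(LEXICOGRAPHIC);
        m.put("b".getBytes(StandardCharsets.UTF_8), "1".getBytes(StandardCharsets.UTF_8));
        m.put("a".getBytes(StandardCharsets.UTF_8), "2".getBytes(StandardCharsets.UTF_8));
        m.put("b".getBytes(StandardCharsets.UTF_8), "3".getBytes(StandardCharsets.UTF_8));

        // Prints "a -> 1 value(s)" and then "b -> 2 value(s)".
        for (Map.Entry<byte[], List<byte[]>> e : m.sortedEntries())
            System.out.println(new String(e.getKey(), StandardCharsets.UTF_8)
                + " -> " + e.getValue().size() + " value(s)");
    }
}
```

The trade-off in this sketch is that all keys and values stay on the Java heap, 
so the comparator never has to copy data out of offheap memory, at the price of 
extra GC pressure until the collection is spilled.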
 


was (Author: iveselovskiy):
To compare objects with RawComparator we need two byte[] arrays, but it is 
expensive to fetch them from offheap memory. So the question arises whether we 
should re-implement the sorting collection as an on-heap solution to use 
RawComparator effectively. 

> Spilled map-reduce: map side.
> -----------------------------
>
>                 Key: IGNITE-4097
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4097
>             Project: Ignite
>          Issue Type: Sub-task
>          Components: hadoop
>    Affects Versions: 1.6
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.9
>
>
> Implement spilled output on Map side of map-reduce.
> In general, the algorithm should follow the one used in Hadoop. The 
> differences on the Map side are that 
> 1) we use a sorting collection (Hadoop sorts a range of map outputs explicitly);
> 2) we store the map output in local files rather than through FileSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
