[
https://issues.apache.org/jira/browse/IGNITE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589204#comment-15589204
]
Ivan Veselovsky edited comment on IGNITE-4097 at 10/20/16 12:55 PM:
--------------------------------------------------------------------
Reasons to get rid of off-heap collections in favor of on-heap collections:
1) To compare objects with RawComparator we need two byte[] arrays, but
fetching them from off-heap memory is expensive. With an on-heap collection
the byte[] arrays are always at hand.
2) The SkipList data structure is O(N) in the worst case, so it may make sense
to use a collection with guaranteed O(log N) worst-case efficiency.
3) A simple Map-like collection has a simpler interface, e.g. just put(K,V).
Currently {code}HadoopMultimap{code} has a complex part of its interface
related to reading serialized data (inner interfaces Adder, Key, Value).
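
To illustrate the points above, here is a minimal sketch of an on-heap sorted
multimap with a plain put(K,V) interface. It is not the Ignite implementation:
the names OnHeapSortedMultimap, compareRaw and copyFromOffHeap are hypothetical,
and Arrays.compareUnsigned (Java 9+) only stands in for a job-supplied
RawComparator. TreeMap is a red-black tree, so insertion is O(log N) in the
worst case, and the keys are already byte[] arrays on the heap.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical on-heap sorted multimap over serialized keys and values. */
final class OnHeapSortedMultimap {
    /** Red-black tree ordered by raw key bytes: worst-case O(log N) put/get. */
    private final TreeMap<byte[], List<byte[]>> map =
        new TreeMap<>(OnHeapSortedMultimap::compareRaw);

    /** Stand-in for a raw comparator: unsigned lexicographic byte order. */
    static int compareRaw(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /** Map-like entry point: appends a serialized value under a serialized key. */
    void put(byte[] key, byte[] val) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(val);
    }

    /** Keys in comparator order, each with its values in insertion order. */
    Map<byte[], List<byte[]>> sorted() {
        return map;
    }

    /**
     * Shows the extra step that off-heap storage would need: the bytes must be
     * copied into a byte[] before a byte[]-based comparator can use them.
     */
    static byte[] copyFromOffHeap(ByteBuffer direct) {
        byte[] copy = new byte[direct.remaining()];
        direct.duplicate().get(copy); // duplicate() leaves the caller's position untouched
        return copy;
    }
}
```

With off-heap storage, every comparison of two keys would pay for two copies
like the one in copyFromOffHeap; with the on-heap map the comparator receives
the stored arrays directly.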
was (Author: iveselovskiy):
To compare objects with RawComparator we need two byte[] arrays, but fetching
them from off-heap memory is expensive. So the question arises whether we
should re-implement the sorting collection as an on-heap solution to use
RawComparator effectively.
> Spilled map-reduce: map side.
> -----------------------------
>
> Key: IGNITE-4097
> URL: https://issues.apache.org/jira/browse/IGNITE-4097
> Project: Ignite
> Issue Type: Sub-task
> Components: hadoop
> Affects Versions: 1.6
> Reporter: Ivan Veselovsky
> Assignee: Ivan Veselovsky
> Fix For: 1.9
>
>
> Implement spilled output on the Map side of map-reduce.
> In general, the algorithm should follow the one used in Hadoop. The
> differences on the Map side are that
> 1) we use a sorting collection (Hadoop sorts a range of map outputs
> explicitly);
> 2) we store the map output in local files rather than through FileSystem.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)