Agreed!
On 15 Jan 2016, at 10:30, Jianfeng Jia wrote:
Sounds a positive support :-).
One thing this Vector will sure better than plain array is to support
editing. (Update, Insert, Delete, even Append is better)
It will have some overhead than the plain array as for the random
access. How much overhead it introduced will be an interesting
experiment to do after we have some initial implementation.
On Jan 15, 2016, at 9:00 AM, Till Westmann <[email protected]> wrote:
I don’t have relevant experience on the subject. But I think that
it sounds good to avoid arbitrarily long chunks of memory. Especially
- as Jianfeng wrote - it would be good to be able to a) account for
this memory and b) to manage it.
An interesting question for me would be what the overhead of such a
Vector is compared to a simple Java array and as a result where it
should be used to replace arrays. (The comparison in [3] only
compares different Scala collections, but doesn’t look at plain
arrays.)
Cheers,
Till
On 14 Jan 2016, at 22:05, Chen Li wrote:
Before we ask Xi to work on this project, it will be good to know if
other people have seen similar problems and agree with this plan.
@Till: can you share some tips?
Chen
On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia
<[email protected]> wrote:
Hi Devs,
First of all, Xi Zhang is a Master student at UCI wants to work
with us for a while. Welcome Xi!
We are thinking of making a Frame-based, memory-bound
SerializableVector at first. We expect this vector can solve some
occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
Though we did a good job on organizing the record-located memory,
the OOM exception can still happen while operating the auxiliary
data structure. For example in the sort run generator, instead of
moving record around we are creating an reference “pointer" array
to the original record. However, if the record is small and the
size of that int array will be large, then the OOM exception will
occur, which is the case of issue [1].
One way to solve this problem is to put auxiliary data structures
into the memory-bounded frame as well. In general, it will be much
easier to ask for multiple small memory blocks than one big chunk
of memory. I guess that was the same reason why we have
“SerializableHashTable” for HashJoin. It will be nice to have a
more general structure that can be used by all the operators.
The Frame based Vector idea is inspired by the Scala Vector[2]
which looks like a List, but internally it is implemented as a
32-ary tree. The performance of it is very stable for variety size
of object[3]. It will have all the benefits of ArrayList and the
LinkedList. In addition, we can take the memory usage of the
auxiliary structure into the calculation. We will work on the
detailed design doc later if we are agree on this direction.
Any thoughts or suggestions? Thank you!
[1]
https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
<https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity>
[2] https://bitbucket.org/astrieanna/bitmapped-vector-trie
<https://bitbucket.org/astrieanna/bitmapped-vector-trie>
[3]
http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
<http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine