Hi, We plan to implement an append-only array at first. The main reason is this is how the auxiliary data structure be used so far. Then the implementation is straightforward.
The tree-structured Vector in Scala can save a lot in updating case mainly because of their immutable requirement. It can saving unnecessary data copies comparing to other immutable list when updating. We only allow in-place update. The tree design may be overkill for us. Xi made on detailed design doc is here: https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing Any thoughts or comments? > On Jan 18, 2016, at 9:52 AM, Chen Li <[email protected]> wrote: > > @Xi and Jianfeng: after we come up with the design, let's share it with the > group for an approval before the implementation. > > Chen > > On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <[email protected]> wrote: > >> The accounting is just as critical as the chunking - we should do both >> (together). >> >> >> On 1/15/16 9:00 AM, Till Westmann wrote: >> >>> I don’t have relevant experience on the subject. But I think that it >>> sounds good to avoid arbitrarily long chunks of memory. Especially - as >>> Jianfeng wrote - it would be good to be able to a) account for this memory >>> and b) to manage it. >>> An interesting question for me would be what the overhead of such a >>> Vector is compared to a simple Java array and as a result where it should >>> be used to replace arrays. (The comparison in [3] only compares different >>> Scala collections, but doesn’t look at plain arrays.) >>> >>> Cheers, >>> Till >>> >>> On 14 Jan 2016, at 22:05, Chen Li wrote: >>> >>> Before we ask Xi to work on this project, it will be good to know if >>>> other people have seen similar problems and agree with this plan. >>>> @Till: can you share some tips? >>>> >>>> Chen >>>> >>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected]> >>>> wrote: >>>> >>>>> Hi Devs, >>>>> >>>>> First of all, Xi Zhang is a Master student at UCI wants to work with us >>>>> for a while. Welcome Xi! >>>>> >>>>> We are thinking of making a Frame-based, memory-bound >>>>> SerializableVector at first. We expect this vector can solve some >>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks. >>>>> Though we did a good job on organizing the record-located memory, the >>>>> OOM exception can still happen while operating the auxiliary data >>>>> structure. For example in the sort run generator, instead of moving record >>>>> around we are creating an reference “pointer" array to the original >>>>> record. >>>>> However, if the record is small and the size of that int array will be >>>>> large, then the OOM exception will occur, which is the case of issue [1]. >>>>> >>>>> One way to solve this problem is to put auxiliary data structures into >>>>> the memory-bounded frame as well. In general, it will be much easier to >>>>> ask >>>>> for multiple small memory blocks than one big chunk of memory. I guess >>>>> that >>>>> was the same reason why we have “SerializableHashTable” for HashJoin. It >>>>> will be nice to have a more general structure that can be used by all the >>>>> operators. >>>>> >>>>> The Frame based Vector idea is inspired by the Scala Vector[2] which >>>>> looks like a List, but internally it is implemented as a 32-ary tree. The >>>>> performance of it is very stable for variety size of object[3]. It will >>>>> have all the benefits of ArrayList and the LinkedList. In addition, we can >>>>> take the memory usage of the auxiliary structure into the calculation. We >>>>> will work on the detailed design doc later if we are agree on this >>>>> direction. >>>>> >>>>> Any thoughts or suggestions? Thank you! >>>>> >>>>> >>>>> [1] >>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity >>>>> < >>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity> >>>>> >>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie < >>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie> >>>>> [3] >>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ >>>>> < >>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/> >>>>> >>>>> >>>>> Best, >>>>> >>>>> Jianfeng Jia >>>>> PhD Candidate of Computer Science >>>>> University of California, Irvine >>>>> >>>>> >> Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
