Have you guys looked at the Drill ValueVectors? This structure is being spun out as Apache Arrow with multiple interfaces and language bindings.
On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <[email protected]> wrote: > Hi, > > We plan to implement an append-only array at first. The main reason is > this is how the auxiliary data structure be used so far. Then the > implementation is straightforward. > > The tree-structured Vector in Scala can save a lot in updating case mainly > because of their immutable requirement. It can saving unnecessary data > copies comparing to other immutable list when updating. We only allow > in-place update. The tree design may be overkill for us. > > Xi made on detailed design doc is here: > https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing > Any thoughts or comments? > > > > On Jan 18, 2016, at 9:52 AM, Chen Li <[email protected]> wrote: > > > > @Xi and Jianfeng: after we come up with the design, let's share it with > the > > group for an approval before the implementation. > > > > Chen > > > > On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <[email protected]> wrote: > > > >> The accounting is just as critical as the chunking - we should do both > >> (together). > >> > >> > >> On 1/15/16 9:00 AM, Till Westmann wrote: > >> > >>> I don’t have relevant experience on the subject. But I think that it > >>> sounds good to avoid arbitrarily long chunks of memory. Especially - as > >>> Jianfeng wrote - it would be good to be able to a) account for this > memory > >>> and b) to manage it. > >>> An interesting question for me would be what the overhead of such a > >>> Vector is compared to a simple Java array and as a result where it > should > >>> be used to replace arrays. (The comparison in [3] only compares > different > >>> Scala collections, but doesn’t look at plain arrays.) > >>> > >>> Cheers, > >>> Till > >>> > >>> On 14 Jan 2016, at 22:05, Chen Li wrote: > >>> > >>> Before we ask Xi to work on this project, it will be good to know if > >>>> other people have seen similar problems and agree with this plan. > >>>> @Till: can you share some tips? > >>>> > >>>> Chen > >>>> > >>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected] > > > >>>> wrote: > >>>> > >>>>> Hi Devs, > >>>>> > >>>>> First of all, Xi Zhang is a Master student at UCI wants to work with > us > >>>>> for a while. Welcome Xi! > >>>>> > >>>>> We are thinking of making a Frame-based, memory-bound > >>>>> SerializableVector at first. We expect this vector can solve some > >>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks. > >>>>> Though we did a good job on organizing the record-located memory, the > >>>>> OOM exception can still happen while operating the auxiliary data > >>>>> structure. For example in the sort run generator, instead of moving > record > >>>>> around we are creating an reference “pointer" array to the original > record. > >>>>> However, if the record is small and the size of that int array will > be > >>>>> large, then the OOM exception will occur, which is the case of issue > [1]. > >>>>> > >>>>> One way to solve this problem is to put auxiliary data structures > into > >>>>> the memory-bounded frame as well. In general, it will be much easier > to ask > >>>>> for multiple small memory blocks than one big chunk of memory. I > guess that > >>>>> was the same reason why we have “SerializableHashTable” for > HashJoin. It > >>>>> will be nice to have a more general structure that can be used by > all the > >>>>> operators. > >>>>> > >>>>> The Frame based Vector idea is inspired by the Scala Vector[2] which > >>>>> looks like a List, but internally it is implemented as a 32-ary > tree. The > >>>>> performance of it is very stable for variety size of object[3]. It > will > >>>>> have all the benefits of ArrayList and the LinkedList. In addition, > we can > >>>>> take the memory usage of the auxiliary structure into the > calculation. We > >>>>> will work on the detailed design doc later if we are agree on this > >>>>> direction. > >>>>> > >>>>> Any thoughts or suggestions? Thank you! > >>>>> > >>>>> > >>>>> [1] > >>>>> > https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity > >>>>> < > >>>>> > https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity > > > >>>>> > >>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie < > >>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie> > >>>>> [3] > >>>>> > http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ > >>>>> < > >>>>> > http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/> > >>>>> > >>>>> > >>>>> Best, > >>>>> > >>>>> Jianfeng Jia > >>>>> PhD Candidate of Computer Science > >>>>> University of California, Irvine > >>>>> > >>>>> > >> > > > > Best, > > Jianfeng Jia > PhD Candidate of Computer Science > University of California, Irvine > >
