Jianfeng and Xi: eventually please make sure to put the design doc to the wiki site.
Chen On Mon, Feb 1, 2016 at 9:56 AM, Jianfeng Jia <[email protected]> wrote: > Hi, > > We plan to implement an append-only array at first. The main reason is this > is how the auxiliary data structure be used so far. Then the implementation > is straightforward. > > The tree-structured Vector in Scala can save a lot in updating case mainly > because of their immutable requirement. It can saving unnecessary data copies > comparing to other immutable list when updating. We only allow in-place > update. The tree design may be overkill for us. > > Xi made on detailed design doc is here: > https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing > Any thoughts or comments? > > >> On Jan 18, 2016, at 9:52 AM, Chen Li <[email protected]> wrote: >> >> @Xi and Jianfeng: after we come up with the design, let's share it with the >> group for an approval before the implementation. >> >> Chen >> >> On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <[email protected]> wrote: >> >>> The accounting is just as critical as the chunking - we should do both >>> (together). >>> >>> >>> On 1/15/16 9:00 AM, Till Westmann wrote: >>> >>>> I don’t have relevant experience on the subject. But I think that it >>>> sounds good to avoid arbitrarily long chunks of memory. Especially - as >>>> Jianfeng wrote - it would be good to be able to a) account for this memory >>>> and b) to manage it. >>>> An interesting question for me would be what the overhead of such a >>>> Vector is compared to a simple Java array and as a result where it should >>>> be used to replace arrays. (The comparison in [3] only compares different >>>> Scala collections, but doesn’t look at plain arrays.) >>>> >>>> Cheers, >>>> Till >>>> >>>> On 14 Jan 2016, at 22:05, Chen Li wrote: >>>> >>>> Before we ask Xi to work on this project, it will be good to know if >>>>> other people have seen similar problems and agree with this plan. >>>>> @Till: can you share some tips? >>>>> >>>>> Chen >>>>> >>>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Devs, >>>>>> >>>>>> First of all, Xi Zhang is a Master student at UCI wants to work with us >>>>>> for a while. Welcome Xi! >>>>>> >>>>>> We are thinking of making a Frame-based, memory-bound >>>>>> SerializableVector at first. We expect this vector can solve some >>>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks. >>>>>> Though we did a good job on organizing the record-located memory, the >>>>>> OOM exception can still happen while operating the auxiliary data >>>>>> structure. For example in the sort run generator, instead of moving >>>>>> record >>>>>> around we are creating an reference “pointer" array to the original >>>>>> record. >>>>>> However, if the record is small and the size of that int array will be >>>>>> large, then the OOM exception will occur, which is the case of issue [1]. >>>>>> >>>>>> One way to solve this problem is to put auxiliary data structures into >>>>>> the memory-bounded frame as well. In general, it will be much easier to >>>>>> ask >>>>>> for multiple small memory blocks than one big chunk of memory. I guess >>>>>> that >>>>>> was the same reason why we have “SerializableHashTable” for HashJoin. It >>>>>> will be nice to have a more general structure that can be used by all the >>>>>> operators. >>>>>> >>>>>> The Frame based Vector idea is inspired by the Scala Vector[2] which >>>>>> looks like a List, but internally it is implemented as a 32-ary tree. The >>>>>> performance of it is very stable for variety size of object[3]. It will >>>>>> have all the benefits of ArrayList and the LinkedList. In addition, we >>>>>> can >>>>>> take the memory usage of the auxiliary structure into the calculation. We >>>>>> will work on the detailed design doc later if we are agree on this >>>>>> direction. >>>>>> >>>>>> Any thoughts or suggestions? Thank you! >>>>>> >>>>>> >>>>>> [1] >>>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity >>>>>> < >>>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity> >>>>>> >>>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie < >>>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie> >>>>>> [3] >>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ >>>>>> < >>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/> >>>>>> >>>>>> >>>>>> Best, >>>>>> >>>>>> Jianfeng Jia >>>>>> PhD Candidate of Computer Science >>>>>> University of California, Irvine >>>>>> >>>>>> >>> > > > > Best, > > Jianfeng Jia > PhD Candidate of Computer Science > University of California, Irvine >
