@Xi and Jianfeng: after we come up with the design, let's share it with the group for an approval before the implementation.
Chen On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <[email protected]> wrote: > The accounting is just as critical as the chunking - we should do both > (together). > > > On 1/15/16 9:00 AM, Till Westmann wrote: > >> I don’t have relevant experience on the subject. But I think that it >> sounds good to avoid arbitrarily long chunks of memory. Especially - as >> Jianfeng wrote - it would be good to be able to a) account for this memory >> and b) to manage it. >> An interesting question for me would be what the overhead of such a >> Vector is compared to a simple Java array and as a result where it should >> be used to replace arrays. (The comparison in [3] only compares different >> Scala collections, but doesn’t look at plain arrays.) >> >> Cheers, >> Till >> >> On 14 Jan 2016, at 22:05, Chen Li wrote: >> >> Before we ask Xi to work on this project, it will be good to know if >>> other people have seen similar problems and agree with this plan. >>> @Till: can you share some tips? >>> >>> Chen >>> >>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected]> >>> wrote: >>> >>>> Hi Devs, >>>> >>>> First of all, Xi Zhang is a Master student at UCI wants to work with us >>>> for a while. Welcome Xi! >>>> >>>> We are thinking of making a Frame-based, memory-bound >>>> SerializableVector at first. We expect this vector can solve some >>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks. >>>> Though we did a good job on organizing the record-located memory, the >>>> OOM exception can still happen while operating the auxiliary data >>>> structure. For example in the sort run generator, instead of moving record >>>> around we are creating an reference “pointer" array to the original record. >>>> However, if the record is small and the size of that int array will be >>>> large, then the OOM exception will occur, which is the case of issue [1]. >>>> >>>> One way to solve this problem is to put auxiliary data structures into >>>> the memory-bounded frame as well. In general, it will be much easier to ask >>>> for multiple small memory blocks than one big chunk of memory. I guess that >>>> was the same reason why we have “SerializableHashTable” for HashJoin. It >>>> will be nice to have a more general structure that can be used by all the >>>> operators. >>>> >>>> The Frame based Vector idea is inspired by the Scala Vector[2] which >>>> looks like a List, but internally it is implemented as a 32-ary tree. The >>>> performance of it is very stable for variety size of object[3]. It will >>>> have all the benefits of ArrayList and the LinkedList. In addition, we can >>>> take the memory usage of the auxiliary structure into the calculation. We >>>> will work on the detailed design doc later if we are agree on this >>>> direction. >>>> >>>> Any thoughts or suggestions? Thank you! >>>> >>>> >>>> [1] >>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity >>>> < >>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity> >>>> >>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie < >>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie> >>>> [3] >>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/ >>>> < >>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/> >>>> >>>> >>>> Best, >>>> >>>> Jianfeng Jia >>>> PhD Candidate of Computer Science >>>> University of California, Irvine >>>> >>>> >
