Before we ask Xi to work on this project, it will be good to know if other people have seen similar problems and agree with this plan. @Till: can you share some tips?
Chen

On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected]> wrote:

> Hi Devs,
>
> First of all, Xi Zhang is a Master's student at UCI who wants to work with
> us for a while. Welcome Xi!
>
> We are thinking of first making a frame-based, memory-bounded
> SerializableVector. We expect this vector to solve some of the occasional
> Java heap OutOfMemory exceptions in Hyracks.
> Although we do a good job of organizing the memory that holds the records,
> an OOM exception can still happen while operating on the auxiliary data
> structures. For example, in the sort run generator, instead of moving
> records around we create an array of reference "pointers" to the original
> records. However, if the records are small, that int array becomes large
> and the OOM exception occurs, which is the case in issue [1].
>
> One way to solve this problem is to put the auxiliary data structures into
> memory-bounded frames as well. In general, it is much easier to ask for
> multiple small memory blocks than for one big chunk of memory. I guess that
> is the same reason why we have "SerializableHashTable" for HashJoin. It
> would be nice to have a more general structure that can be used by all the
> operators.
>
> The frame-based vector idea is inspired by the Scala Vector [2], which
> looks like a List but is implemented internally as a 32-ary tree. Its
> performance is very stable across a variety of object sizes [3]. It should
> offer the benefits of both ArrayList and LinkedList. In addition, we can
> include the memory used by the auxiliary structure in the memory
> calculation. We will work on a detailed design doc later if we agree on
> this direction.
>
> Any thoughts or suggestions? Thank you!
>
> [1] https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie
> [3] http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
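
To make the proposal above concrete for discussion, here is a minimal sketch of an int vector whose storage is split into fixed-size frames under a frame budget. Everything in it is an assumption for illustration: the class name FrameBasedIntVector, its methods, and the use of plain heap ByteBuffers are hypothetical and are not existing Hyracks APIs. It also uses a flat list of frames rather than the 32-ary tree of the Scala Vector, so it only shows the "many small blocks plus a budget" part of the idea.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: an append-only int vector stored in fixed-size frames. */
public class FrameBasedIntVector {
    private final int frameSize;     // bytes per frame (assumed parameter)
    private final int maxFrames;     // memory budget, expressed in frames
    private final int intsPerFrame;  // how many ints fit in one frame
    private final List<ByteBuffer> frames = new ArrayList<>();
    private int size = 0;

    public FrameBasedIntVector(int frameSize, int maxFrames) {
        if (frameSize < Integer.BYTES || maxFrames < 1) {
            throw new IllegalArgumentException("frame too small or no budget");
        }
        this.frameSize = frameSize;
        this.maxFrames = maxFrames;
        this.intsPerFrame = frameSize / Integer.BYTES;
    }

    /**
     * Appends a value. Returns false when the frame budget is exhausted,
     * so the caller can react (e.g. flush a run) instead of hitting an OOM.
     */
    public boolean append(int value) {
        int frameIndex = size / intsPerFrame;
        if (frameIndex == frames.size()) {
            if (frames.size() == maxFrames) {
                return false; // budget reached: no new frame is allocated
            }
            frames.add(ByteBuffer.allocate(frameSize)); // one small block at a time
        }
        frames.get(frameIndex).putInt((size % intsPerFrame) * Integer.BYTES, value);
        size++;
        return true;
    }

    /** Random access: locate the frame, then the offset inside it. */
    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return frames.get(index / intsPerFrame).getInt((index % intsPerFrame) * Integer.BYTES);
    }

    public int size() {
        return size;
    }

    /** Bytes held by the frames, so the vector can be counted against a memory budget. */
    public int bytesInUse() {
        return frames.size() * frameSize;
    }

    public static void main(String[] args) {
        // 16-byte frames hold 4 ints each; a budget of 2 frames allows 8 ints.
        FrameBasedIntVector v = new FrameBasedIntVector(16, 2);
        int appended = 0;
        while (v.append(appended * 10)) {
            appended++;
        }
        System.out.println(appended + " " + v.get(5) + " " + v.bytesInUse()); // prints "8 50 32"
    }
}
```

In this sketch the sort run generator would append record offsets until append returns false and then flush the current run. The list of frame references still lives on the heap, but it grows by one entry per frame rather than one per record.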
