Re: Implement an SerializableVector in Hyracks

Chen Li Mon, 18 Jan 2016 09:53:12 -0800

@Xi and Jianfeng: after we come up with the design, let's share it with the
group for approval before starting the implementation.


Chen

On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <[email protected]> wrote:

> The accounting is just as critical as the chunking - we should do both
> (together).
>
>
> On 1/15/16 9:00 AM, Till Westmann wrote:
>
>> I don’t have relevant experience on the subject. But I think that it
>> sounds good to avoid arbitrarily long chunks of memory. Especially - as
>> Jianfeng wrote - it would be good to be able to a) account for this memory
>> and b) to manage it.
>> An interesting question for me would be what the overhead of such a
>> Vector is compared to a simple Java array and as a result where it should
>> be used to replace arrays. (The comparison in [3] only compares different
>> Scala collections, but doesn’t look at plain arrays.)
>>
>> Cheers,
>> Till
>>
>> On 14 Jan 2016, at 22:05, Chen Li wrote:
>>
>>> Before we ask Xi to work on this project, it will be good to know if
>>> other people have seen similar problems and agree with this plan.
>>> @Till: can you share some tips?
>>>
>>> Chen
>>>
>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected]>
>>> wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> First of all, Xi Zhang is a Master's student at UCI who wants to work with
>>>> us for a while. Welcome Xi!
>>>>
>>>> We are thinking of first building a Frame-based, memory-bounded
>>>> SerializableVector. We expect this vector to resolve some of the
>>>> occasional Java.Heap.OutOfMemory exceptions in Hyracks.
>>>> Though we did a good job of organizing the memory that holds the records,
>>>> an OOM exception can still happen while operating on the auxiliary data
>>>> structures. For example, in the sort run generator, instead of moving
>>>> records around we create a reference "pointer" array into the original
>>>> records. However, if the records are small, there are many of them and
>>>> that int array grows large, so the OOM exception occurs, which is the case
>>>> in issue [1].
>>>>
>>>> One way to solve this problem is to put auxiliary data structures into
>>>> the memory-bounded frame as well. In general, it will be much easier to ask
>>>> for multiple small memory blocks than one big chunk of memory. I guess that
>>>> was the same reason why we have “SerializableHashTable” for HashJoin. It
>>>> will be nice to have a more general structure that can be used by all the
>>>> operators.
>>>>
>>>> The Frame-based Vector idea is inspired by the Scala Vector [2], which
>>>> looks like a List but is implemented internally as a 32-ary tree. Its
>>>> performance is very stable across a variety of object sizes [3]. It would
>>>> have the benefits of both ArrayList and LinkedList. In addition, we can
>>>> include the memory usage of the auxiliary structure in the memory
>>>> accounting. We will work on a detailed design doc later if we agree on
>>>> this direction.
>>>>
>>>> Any thoughts or suggestions? Thank you!
>>>>
>>>>
>>>> [1]
>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie
>>>> [3]
>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>>>>
>>>>
>>>> Best,
>>>>
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>>
>>>>
>
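
To make the chunking-plus-accounting idea from the thread concrete, here is a
minimal sketch in Java. It is not the proposed design and does not use any
Hyracks API: the class name ChunkedIntVector, the chunkSize and budgetBytes
parameters, and the plain int[] chunks (standing in for frames) are all
assumptions made for illustration. It also uses a flat list of chunks rather
than the 32-ary tree of the Scala Vector. What it shows is that growing in
small fixed-size blocks lets the structure count every allocation against a
budget and refuse to grow, instead of requesting one large array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: an int vector stored in fixed-size chunks under a byte budget. */
public class ChunkedIntVector {
    private final int chunkSize;     // ints per chunk (a stand-in for one frame)
    private final long budgetBytes;  // maximum payload bytes this vector may allocate
    private final List<int[]> chunks = new ArrayList<>();
    private int size = 0;

    public ChunkedIntVector(int chunkSize, long budgetBytes) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.budgetBytes = budgetBytes;
    }

    /** Payload bytes currently allocated (ignores JVM object headers). */
    public long allocatedBytes() {
        return (long) chunks.size() * chunkSize * Integer.BYTES;
    }

    /** Appends a value; returns false rather than allocating past the budget. */
    public boolean append(int value) {
        if (size == (long) chunks.size() * chunkSize) {
            // All chunks are full: account for the next chunk before allocating it.
            long next = allocatedBytes() + (long) chunkSize * Integer.BYTES;
            if (next > budgetBytes) {
                return false; // the caller can flush or spill instead of hitting OOM
            }
            chunks.add(new int[chunkSize]);
        }
        chunks.get(size / chunkSize)[size % chunkSize] = value;
        size++;
        return true;
    }

    /** Random access: locate the chunk, then the slot inside it. */
    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chunks.get(index / chunkSize)[index % chunkSize];
    }

    public int size() {
        return size;
    }
}
```

In a sort run generator, a caller could treat a false return from append as
the signal to sort and flush the current run. Compared with a plain int[],
each access pays one extra indirection, which is the overhead Till asks about.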
