Hi, 

We plan to implement an append-only array at first. The main reason is this is 
how the auxiliary data structure be used so far. Then the implementation is 
straightforward. 

The tree-structured Vector in Scala can save a lot in updating case mainly 
because of their immutable requirement. It can saving unnecessary data copies 
comparing to other immutable list when updating. We only allow in-place update. 
The tree design may be overkill for us.

Xi made on detailed design doc is here: 
https://docs.google.com/document/d/1bs3JBCxmvuJZmBq_gKzUDt0FZdM9j3d3MPAmC29ZoSA/edit?usp=sharing
Any thoughts or comments? 


> On Jan 18, 2016, at 9:52 AM, Chen Li <[email protected]> wrote:
> 
> @Xi and Jianfeng: after we come up with the design, let's share it with the
> group for an approval before the implementation.
> 
> Chen
> 
> On Fri, Jan 15, 2016 at 11:48 AM, Mike Carey <[email protected]> wrote:
> 
>> The accounting is just as critical as the chunking - we should do both
>> (together).
>> 
>> 
>> On 1/15/16 9:00 AM, Till Westmann wrote:
>> 
>>> I don’t have relevant experience on the subject. But I think that it
>>> sounds good to avoid arbitrarily long chunks of memory. Especially - as
>>> Jianfeng wrote - it would be good to be able to a) account for this memory
>>> and b) to manage it.
>>> An interesting question for me would be what the overhead of such a
>>> Vector is compared to a simple Java array and as a result where it should
>>> be used to replace arrays. (The comparison in [3] only compares different
>>> Scala collections, but doesn’t look at plain arrays.)
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On 14 Jan 2016, at 22:05, Chen Li wrote:
>>> 
>>> Before we ask Xi to work on this project, it will be good to know if
>>>> other people have seen similar problems and agree with this plan.
>>>> @Till: can you share some tips?
>>>> 
>>>> Chen
>>>> 
>>>> On Wed, Jan 13, 2016 at 4:27 PM, Jianfeng Jia <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi Devs,
>>>>> 
>>>>> First of all, Xi Zhang is a Master student at UCI wants to work with us
>>>>> for a while. Welcome Xi!
>>>>> 
>>>>> We are thinking of making a Frame-based, memory-bound
>>>>> SerializableVector at first. We expect this vector can solve some
>>>>> occasionally Java.Heap.OutOfMemory exceptions in Hyracks.
>>>>> Though we did a good job on organizing the record-located memory, the
>>>>> OOM exception can still happen while operating the auxiliary data
>>>>> structure. For example in the sort run generator, instead of moving record
>>>>> around we are creating an reference “pointer" array to the original 
>>>>> record.
>>>>> However, if the record is small and the size of that int array will be
>>>>> large, then the OOM exception will occur, which is the case of issue [1].
>>>>> 
>>>>> One way to solve this problem is to put auxiliary data structures into
>>>>> the memory-bounded frame as well. In general, it will be much easier to 
>>>>> ask
>>>>> for multiple small memory blocks than one big chunk of memory. I guess 
>>>>> that
>>>>> was the same reason why we have “SerializableHashTable” for HashJoin. It
>>>>> will be nice to have a more general structure that can be used by all the
>>>>> operators.
>>>>> 
>>>>> The Frame based Vector idea is inspired by the Scala Vector[2] which
>>>>> looks like a List, but internally it is implemented as a 32-ary tree. The
>>>>> performance of it is very stable for variety size of object[3]. It will
>>>>> have all the benefits of ArrayList and the LinkedList. In addition, we can
>>>>> take the memory usage of the auxiliary structure into the calculation. We
>>>>> will work on the detailed design doc later if we are agree on this
>>>>> direction.
>>>>> 
>>>>> Any thoughts or suggestions? Thank you!
>>>>> 
>>>>> 
>>>>> [1]
>>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity
>>>>> <
>>>>> https://code.google.com/p/asterixdb/issues/detail?id=934&can=1&q=last%20straw&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity>
>>>>> 
>>>>> [2] https://bitbucket.org/astrieanna/bitmapped-vector-trie <
>>>>> https://bitbucket.org/astrieanna/bitmapped-vector-trie>
>>>>> [3]
>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/
>>>>> <
>>>>> http://danielasfregola.com/2015/06/15/which-immutable-scala-collection/>
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>> 
>>>>> 
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine

Reply via email to