Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Gary Wed, 30 Mar 2016 18:21:12 -0700

Thanks to both of you for the advice and guidance. We will certainly follow up
and conform to the rules you mentioned.


On 3/30/2016 5:31 PM, Patrick Hunt wrote:
> Remember that no decisions should be made at the meeting. It's fine to
> have discussions, but those need to be brought back to the community
> before decisions are made. Summarizing for the dev@ mailing list, also
> jiras, etc... are good ways to socialize the issues.
>
> Patrick
>
> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <[email protected]> wrote:
>> The communities of both podlings are bigger than the groups that show up
>> at Strata =)
>>
>> Would love to have a summary of the discussions on the dev@ list if
>> discussions do indeed happen at Strata.
>>
>> - Henry
>>
>> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <[email protected]>
>> wrote:
>>
>>> Hi, All
>>>
>>> I met with Jacques today at Strata. We think it would be great if the Arrow
>>> and Mnemonic communities could have a F2F meeting to talk about our
>>> integration.
>>> I am available on the following two days: Monday 4/11 in the afternoon, or
>>> Friday 4/15.
>>> We can meet at the Intel SC campus.
>>>
>>> Would you let me know if you are able to join us and which day you'd
>>> prefer?
>>>
>>> Thanks
>>> Yanping
>>>
>>>
>>> On Mar 29, 2016, at 4:38 PM, Gary <[email protected]> wrote:
>>>
>>> Yes, I agree with you, and it would be great if we could brainstorm here to
>>> collect more ideas about enabling non-volatile memory usage for Apache
>>> Arrow through Mnemonic.
>>>
>>> As for the questions, my ideas are:
>>>
>>>
>>> - Right now you are using unpooled persistent memory. Does that make sense
>>> or does chunking make more sense?
>>>
>>> Gary: I think it could make sense if developers know that their
>>> datasets are very big and they want Apache Arrow to keep most of them in
>>> memory for intensive computing, e.g. sorting.
>>> Developers can certainly spill their Mnemonic-managed datasets to disk,
>>> but that seems a bit inefficient in some scenarios, depending on the
>>> concrete application logic.
>>>
>>>
>>> - What do you think is the right way to transition back and forth between
>>> persistent and ephemeral memory? What do you think will be the first
>>> pattern to be adopted. For example, do you think we should try to use it as
>>> a tiered storage for sort spilling (before hitting the disk), or should we
>>> use it for caching?
>>> Gary: my 2 cents: the netty library does not yet seem to provide an elegant
>>> switch mechanism for Arrow to use. We could probably change the logic around
>>> "initialCapacity > directArena.chunkSize" to control which buffers are put
>>> off-heap and which are managed by Mnemonic. Another approach is to let the
>>> memory clustering mechanism of Mnemonic manage hybrid memory-like spaces in
>>> place of part of the logic of class PooledByteBufAllocatorL.
>>> Regarding sorting, I think it is a typical case of random access to
>>> the data, so we should avoid spilling as much as possible.
>>> My 2 cents: the performance ranking could be
>>> all in off-heap if possible > mnemonic used as cache > all in mnemonic
>>> using NVMe/disk > off-heap + spilling
>>> and the code simplicity ranking would be
>>> all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic
>>> used as cache > off-heap + spilling
>>>
>>> The reason the mode "mnemonic used as cache + spilling" is probably
>>> unnecessary is that Mnemonic could provide capacity nearly equivalent to disk.
>>>
>>> Thanks.
>>> Gary.
>>>
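To make the size-threshold idea above concrete, here is a minimal sketch in Java. It is not code from the patch and does not use the netty, Arrow, or Mnemonic APIs. `BufferSource`, `OffHeapSource`, `MappedFileSource`, and `Router` are hypothetical names, the 16 MiB chunk size is only an illustrative value, and a memory-mapped temporary file merely stands in for a Mnemonic-managed memory-like space. The only point is the routing decision: requests no larger than the chunk size stay in ordinary off-heap memory, and larger ones go to the alternative backend.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ThresholdRoutingSketch {

    /** Hypothetical abstraction: a place a buffer can be allocated from. */
    interface BufferSource {
        String name();
        ByteBuffer allocate(int capacity) throws IOException;
    }

    /** Ordinary off-heap memory via a direct ByteBuffer. */
    static final class OffHeapSource implements BufferSource {
        @Override public String name() { return "off-heap"; }
        @Override public ByteBuffer allocate(int capacity) {
            return ByteBuffer.allocateDirect(capacity);
        }
    }

    /**
     * Stand-in for a Mnemonic-managed space (assumption, not the Mnemonic API):
     * a temporary file mapped into memory.
     */
    static final class MappedFileSource implements BufferSource {
        @Override public String name() { return "mapped-file"; }
        @Override public ByteBuffer allocate(int capacity) throws IOException {
            Path file = Files.createTempFile("memory-like-", ".buf");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping remains valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
            }
        }
    }

    /** Routes each request by comparing its size against a chunk-size threshold. */
    static final class Router {
        private final int chunkSize;
        private final BufferSource small;
        private final BufferSource large;

        Router(int chunkSize, BufferSource small, BufferSource large) {
            this.chunkSize = chunkSize;
            this.small = small;
            this.large = large;
        }

        /** Mirrors the "initialCapacity > chunkSize" test from the discussion. */
        BufferSource choose(int initialCapacity) {
            return initialCapacity > chunkSize ? large : small;
        }

        ByteBuffer allocate(int initialCapacity) throws IOException {
            return choose(initialCapacity).allocate(initialCapacity);
        }
    }

    public static void main(String[] args) throws IOException {
        Router router = new Router(16 * 1024 * 1024,
                new OffHeapSource(), new MappedFileSource());
        int small = 4 * 1024;
        int large = 32 * 1024 * 1024;
        System.out.println(small + " bytes -> " + router.choose(small).name());
        System.out.println(large + " bytes -> " + router.choose(large).name());

        // Write to and read back from a buffer served by the stand-in backend.
        ByteBuffer buf = router.allocate(large);
        buf.putLong(0, 42L);
        System.out.println("read back: " + buf.getLong(0));
    }
}
```

Keeping the decision in a single `choose` method means the threshold, or the policy itself, can be changed without touching either backend. That is the kind of switch point the reply suggests is missing.
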
>>>
>>> -----Original Message-----
>>> From: Jacques Nadeau [mailto:[email protected]]
>>> Sent: Tuesday, March 29, 2016 8:05 AM
>>> To: [email protected]
>>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>>> for Apache Arrow
>>>
>>>
>>>
>>> This is super cool. A couple of questions:
>>>
>>>
>>>
>>> - Right now you are using unpooled persistent memory. Does that make sense
>>> or does chunking make more sense?
>>>
>>> - What do you think is the right way to transition back and forth between
>>> persistent and ephemeral memory? What do you think will be the first
>>> pattern to be adopted. For example, do you think we should try to use it as
>>> a tiered storage for sort spilling (before hitting the disk), or should we
>>> use it for caching?
>>>
>>>
>>>
>>> I think it will be much easier to think about this in the context of a
>>> primary or first use case. Do you have something in mind or should we
>>> brainstorm here?
>>>
>>>
>>>
>>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <[email protected]> wrote:
>>>
>>>
>>>
>>>> Hello,
>>>>    We have created a patch for Apache Arrow that leverages Apache
>>>> Incubator Mnemonic as an alternative infrastructure for underlying memory
>>>> resource allocation. You can find it in the forked repo below:
>>>> https://github.com/NonVolatileComputing/arrow
>>>
>>>>     In this way, Apache Arrow could gain some structural benefits from
>>>> the Mnemonic project:
>>>>     - Arrow is able to leverage the larger capacity of high-performance
>>>> hybrid storage devices, e.g. high-end SSD, NVMe
>>>>     - Mnemonic provides a potential opportunity for Arrow to
>>>> optimize/tune its allocation algorithms as a native Arrow-oriented
>>>> allocation service
>>>>     - The non-volatile features of Mnemonic make it possible for
>>>> Arrow to share its columnar in-memory data between different
>>>> applications or across the life cycle of a single application
>>>>     - Arrow could take advantage of upcoming Mnemonic features such as
>>>> memory clustering/DOG (distributed object graph) and massive native
>>>> computing
>>>>     - Mnemonic helps to reduce the pressure on main memory utilization
>>>> and its related system-wide overheads.
>>>>    This patch is designed to minimize the changes users must make to use
>>>> Arrow; please check out the test cases provided by the patch for
>>>> reference.
>>>>    Note that the allocator services need to be placed in a specified
>>>> location (indicated by pom.xml) for the Mnemonic-backed Arrow test
>>>> cases to run, because those services are required for external
>>>> memory-like device management.
>>>>    Please share your comments and review feedback for better
>>>> collaboration between Apache Arrow and Mnemonic. Thanks.
>>>> Best Regards.
>>>> Gary.
>>>
