[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250908#comment-16250908
 ] 

ASF GitHub Bot commented on ARROW-1795:
---------------------------------------

robertnishihara opened a new pull request #1317: ARROW-1795: [Plasma] Fixes to 
eviction policy.
URL: https://github.com/apache/arrow/pull/1317
 
 
   I need to double check this code, so don't merge anything yet.
   
   @luchy0120 pointed out some issues with the eviction policy in 
https://issues.apache.org/jira/browse/ARROW-1795.
   
   You can see this for example by starting a plasma store with
   
   ```
   plasma_store -m 800000000 -s /tmp/store1
   ```
   
   and then running
   
   ```python
   import pyarrow.plasma as plasma
   import numpy as np
   
   c = plasma.connect('/tmp/store1', '', 64)
   
   def create_and_release(size):
       obj_id = plasma.ObjectID(np.random.bytes(20))
       c.create(obj_id, size)
       c.seal(obj_id)
   
   create_and_release(5 * 10 ** 8)
   create_and_release(6 * 10 ** 8)
   ```
   
   The last line will fail with 
   
   ```
   PlasmaStoreFull: object does not fit in the plasma store
   ```
   
   It turns out this is not really a bug. It's counterintuitive and 
undesirable, but it is sort of an artifact of starting the object store with 
very little memory. In the above script, plasma first creates a memory-mapped 
file that fits 500MB but not 600MB. Then we try to allocate 600MB, but due to 
fragmentation we can neither fit it in the old memory-mapped file nor create a 
sufficiently large new memory-mapped file (even after evicting the 500MB 
object), because we never `munmap` anything. Even though we could `munmap` the 
500MB file in this case, we might not be able to in general if some objects are 
still alive in all of the files.
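
   The failure mode described above can be sketched with a toy model. This is 
not Plasma's real allocator: `ToyStore` and its methods are hypothetical names, 
and the model assumes that an object must fit entirely inside one mapped file, 
that a new file is sized exactly to the object, and that files are never 
unmapped.

```python
class ToyStore:
    """Toy model of a store that maps files but never unmaps them."""

    def __init__(self, capacity):
        self.capacity = capacity  # total bytes the store is allowed to map
        self.files = []           # each mapped file: {"size": int, "used": int}

    def mapped(self):
        # Bytes currently mapped, whether or not they hold live objects.
        return sum(f["size"] for f in self.files)

    def create(self, size):
        # First try to reuse free space inside an already-mapped file.
        for f in self.files:
            if f["size"] - f["used"] >= size:
                f["used"] += size
                return f
        # Otherwise map a new file, if the total stays within capacity.
        if self.mapped() + size <= self.capacity:
            f = {"size": size, "used": size}
            self.files.append(f)
            return f
        raise MemoryError("object does not fit in the toy store")

    def evict_all(self):
        # Eviction frees space inside the files but leaves them mapped.
        for f in self.files:
            f["used"] = 0


store = ToyStore(800_000_000)
store.create(5 * 10 ** 8)      # maps a 500MB file
store.evict_all()              # the file is now empty but still mapped
try:
    store.create(6 * 10 ** 8)  # fits in neither the old file nor a new one
    fits = True
except MemoryError:
    fits = False
```

   In this model the second `create` fails because the empty 500MB file still 
counts against the 800MB limit, so a new 600MB file cannot be mapped.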
   
   The above script will continue to throw the "object does not fit" error even 
after this PR.
   
   Anyway, looking into this I think I uncovered some other issues with our 
eviction policy which I'm trying to fix here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [Plasma C++] change evict policy
> --------------------------------
>
>                 Key: ARROW-1795
>                 URL: https://issues.apache.org/jira/browse/ARROW-1795
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Plasma (C++)
>            Reporter: Lu Qi 
>            Assignee: Lu Qi 
>            Priority: Minor
>              Labels: pull-request-available
>
> Case 1. Say we have 8G of total free memory and put in 5G of data, and then
> another 6G of data comes in. If we ask to evict 6G of space, it throws an
> exception saying that no object can be freed. This is because we didn't count
> the 3G of remaining free space. If we count this remaining 3G, we only need
> to ask for 3G, so we can evict the 5G of data and keep running.
> Case 2. Another situation: we have 10G of free memory and put in 1.5G of
> data, and then another 9G of data comes in. If we evict 10 * 20% = 2G of
> data, we crash. In this situation we need to evict
> 9 + 1.5 - 10 = 0.5G of data.
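
The arithmetic in the two cases above can be written as a small helper. This is
only a sketch of the calculation the reporter describes, not code from Plasma:
`bytes_to_evict` is a hypothetical name, and G is assumed to mean 10 ** 9 bytes.

```python
G = 10 ** 9  # assumption: "G" in the report means 10**9 bytes


def bytes_to_evict(capacity, used, incoming):
    """Smallest amount to evict so that `incoming` fits (hypothetical helper)."""
    # Free space is capacity - used; only the shortfall needs to be evicted.
    return max(0, used + incoming - capacity)


# Case 1: 8G total, 5G stored, 6G incoming
case1 = bytes_to_evict(8 * G, 5 * G, 6 * G)

# Case 2: 10G total, 1.5G stored, 9G incoming
case2 = bytes_to_evict(10 * G, 15 * G // 10, 9 * G)
```

Here `case1` is 3G and `case2` is 0.5G, matching the figures in the report.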



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
