[ https://issues.apache.org/jira/browse/ARROW-13644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400247#comment-17400247 ]

Weston Pace commented on ARROW-13644:
-------------------------------------

I guess I'm not sure I follow.  Let's assume the user only wants two files open 
(obviously the limit would never be set this low in practice).  A batch comes in 
that is partitioned across files X and Y.  Then a batch comes in that is 
partitioned across files X and Z.  I want to close file Y before I open file Z.  
With a semaphore I have something like...


{code:python}
import threading

# Two permits: the user-configured "max open files" limit.
sem = threading.Semaphore(2)

def queue_write(f, batch):
    # actual write
    sem.release()  # released once this batch's write completes

# 'files' yields (destination file, batch) pairs from incoming batches
for f, batch in files:
    sem.acquire()
    queue_write(f, batch)
{code}

It seems to me that the semaphore would just block indefinitely.  Releasing a 
permit when a write finishes doesn't actually free up an open-file slot, since 
the file stays open; the permit could only really be released when the file 
closes.  But file Y will not close itself, even when it finishes writing (files 
are held open in case more data comes for that file), so the acquire for file Z 
never succeeds.
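
With an LRU cache the ordering works out naturally.  A purely illustrative, 
synchronous sketch (not the actual implementation; the names here are made up):

{code:python}
from collections import OrderedDict

# Purely illustrative (not the actual implementation): the writer itself
# closes the least recently used file when the limit is hit, so forward
# progress never depends on someone else releasing a permit.
MAX_OPEN_FILES = 2
open_files = OrderedDict()            # path -> file handle, oldest first

def get_file(path):
    if path in open_files:
        open_files.move_to_end(path)  # mark as most recently used
    else:
        if len(open_files) >= MAX_OPEN_FILES:
            _, oldest = open_files.popitem(last=False)
            oldest.close()            # file Y is closed here...
        open_files[path] = open(path, "ab")  # ...before file Z is opened
    return open_files[path]
{code}

Each incoming (file, batch) pair would look up get_file(f) before queueing its 
write.  With the limit at two, the X/Y-then-X/Z sequence works: looking up Z 
evicts Y, the least recently used file, and closes it before Z is opened.  In 
the real writer that close step can't be synchronous, since it has to wait on 
Y's pending writes, which is why the eviction needs to be future-based (the 
point of this ticket).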

> [C++] Create LruCache that works with futures
> ---------------------------------------------
>
>                 Key: ARROW-13644
>                 URL: https://issues.apache.org/jira/browse/ARROW-13644
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>
> The dataset writer needs an LRU cache to keep track of open files so that it 
> can respect a "max open files" property (see ARROW-12321).  A synchronous 
> LruCache implementation already exists, but on eviction we need to wait until 
> all pending writes to the evicted file have completed before we close it and 
> open a new file.  This ticket is to create an AsyncLruCache that allows both 
> the creation and the eviction of items to be asynchronous.
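
For a rough picture of the intended shape, here is a minimal Python/asyncio 
sketch (the real implementation is C++ and future-based; the names below are 
assumptions, not Arrow's API).  The key point is that eviction is itself 
asynchronous: it completes only after the evicted entry's pending writes have 
finished, and only then is the new file opened.

{code:python}
import asyncio
from collections import OrderedDict

class AsyncLruCache:
    """Illustrative sketch only; names and shape are assumptions."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()       # key -> entry, in recency order

    async def get_or_open(self, key, open_fn):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        if len(self.entries) >= self.max_size:
            _, victim = self.entries.popitem(last=False)
            # Asynchronous eviction: wait for the victim's in-flight
            # writes to finish before closing it.
            await asyncio.gather(*victim.pending_writes)
            victim.close()
        entry = await open_fn(key)         # only now open the new file
        self.entries[key] = entry
        return entry
{code}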



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
