[
https://issues.apache.org/jira/browse/ARROW-13644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400247#comment-17400247
]
Weston Pace commented on ARROW-13644:
-------------------------------------
I guess I'm not sure I follow. Let's assume the user only wants two files open
(obviously it should never be this low). A batch comes in that is partitioned
on files X and Y. Then a batch comes in that is partitioned on files X and Z.
I want to close file Y before I open file Z. With a semaphore I have something
like...
{code:python}
def queue_write(f, batch):
    # actual write; release happens once the write for f completes
    release(1)

for f, batch in files:
    acquire(1)
    queue_write(f, batch)
{code}
It seems to me that the semaphore would just block indefinitely. File Y will
not close itself, even when it finishes writing (files are held open in case
more data comes for that file).
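By contrast, an LRU cache of open files hands the evicted entry back to the writer, so Y can be flushed and closed before Z is opened. A minimal synchronous sketch of that eviction contract (illustrative names only, not the actual Arrow API; the real AsyncLruCache would return a future that completes when the evicted file's pending writes finish):

{code:python}
from collections import OrderedDict

class LruOpenFiles:
    """Toy LRU cache of open files. Eviction returns the evicted
    entry so the caller can close it before inserting a new one."""

    def __init__(self, max_open):
        self.max_open = max_open
        self.entries = OrderedDict()  # key -> open-file handle

    def get_or_insert(self, key, factory):
        # Returns (evicted_or_None, entry). The caller must finish
        # and close the evicted entry before writing to the new one.
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return None, self.entries[key]
        evicted = None
        if len(self.entries) >= self.max_open:
            _, evicted = self.entries.popitem(last=False)  # drop LRU
        entry = factory()
        self.entries[key] = entry
        return evicted, entry

# Two-file scenario from above: X and Y open, then Z arrives.
cache = LruOpenFiles(max_open=2)
cache.get_or_insert("X", lambda: "file-X")
cache.get_or_insert("Y", lambda: "file-Y")
cache.get_or_insert("X", lambda: "file-X")      # X touched again
evicted, _ = cache.get_or_insert("Z", lambda: "file-Z")
# evicted is "file-Y": Y is handed back to be closed before Z opens
{code}

In the async version, `get_or_insert` would instead yield a future for the evicted file's outstanding writes, and the writer would await it before opening the new file, rather than blocking on a semaphore that nothing will ever release.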
> [C++] Create LruCache that works with futures
> ---------------------------------------------
>
> Key: ARROW-13644
> URL: https://issues.apache.org/jira/browse/ARROW-13644
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
>
> The dataset writer needs an LRU cache to keep track of open files so that it
> can respect a "max open files" property (see ARROW-12321). A synchronous
> LruCache implementation already exists but on eviction from the cache we need
> to wait until all pending writes have completed before we evict the item and
> open a new file. This ticket is to create an AsyncLruCache which will allow
> the creation of items and the eviction of items to be asynchronous.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)