[ https://issues.apache.org/jira/browse/ARROW-13644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400247#comment-17400247 ]
Weston Pace commented on ARROW-13644:
-------------------------------------

I guess I'm not sure I follow. Let's assume the user only wants two files open (obviously it should never be this low). A batch comes in that is partitioned on files X and Y. Then a batch comes in that is partitioned on files X and Z. I want to close file Y before I open file Z. With a semaphore I have something like...

{code:python}
def queue_write(f, batch):
    # actual write
    release(1)

for f, batch in files:
    acquire(1)
    queue_write(f, batch)
{code}

It seems to me that the semaphore would just block indefinitely. File Y will not close itself, even when it finishes writing (files are held open in case more data comes for that file).

> [C++] Create LruCache that works with futures
> ---------------------------------------------
>
>                 Key: ARROW-13644
>                 URL: https://issues.apache.org/jira/browse/ARROW-13644
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>
> The dataset writer needs an LRU cache to keep track of open files so that it
> can respect a "max open files" property (see ARROW-12321). A synchronous
> LruCache implementation already exists, but on eviction from the cache we need
> to wait until all pending writes have completed before we evict the item and
> open a new file. This ticket is to create an AsyncLruCache which will allow
> the creation of items and the eviction of items to be asynchronous.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
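To make the "evict-then-open" behavior concrete, here is a minimal asyncio sketch of the idea the ticket describes: an LRU cache whose eviction first awaits the evicted entry's pending writes before a new entry is created. All names here (`AsyncLruCache`, `get_or_insert`, `track`) are hypothetical illustrations, not Arrow's actual C++ API, which is future-based rather than coroutine-based.

{code:python}
import asyncio
from collections import OrderedDict


class AsyncLruCache:
    """Hypothetical sketch, not Arrow's implementation: an LRU cache
    whose eviction awaits the victim's outstanding work first."""

    def __init__(self, max_items):
        self.max_items = max_items
        # key -> (value, list of pending write tasks for that value)
        self._items = OrderedDict()

    async def get_or_insert(self, key, make_value):
        if key in self._items:
            # Cache hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key][0]
        if len(self._items) >= self.max_items:
            # Evict the least recently used entry, but wait for all of
            # its pending writes to finish before reusing the slot.
            victim_key, (victim_value, pending) = self._items.popitem(last=False)
            await asyncio.gather(*pending)
            # (a real dataset writer would also close victim_value here)
        value = make_value()
        self._items[key] = (value, [])
        return value

    def track(self, key, task):
        # Record an outstanding write so eviction can wait for it.
        self._items[key][1].append(task)
{code}

In the scenario from the comment (max two files, batch for X and Y, then batch for X and Z), the second lookup of X moves it to the most-recently-used position, so requesting Z evicts Y, awaits Y's queued writes, and only then opens Z — which is exactly the ordering a plain counting semaphore cannot express.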