I don't agree. I believe you can track the iterators/iterables that are created and freed by using weak references and reference queues (or other methods). Having a few people work 10x as hard to provide a good implementation is much better than having hundreds or thousands of users suffer through a more complicated API.
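
To make the weak reference plus reference queue idea concrete, here is a rough sketch. It assumes the runner hands users a wrapper iterator and keeps the store-side handle (for example the underlying RocksDB iterator) as a separate AutoCloseable. The names IteratorTracker, Tracked, register and drain are hypothetical and are not part of the Beam or Samza APIs.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: closes a store-side resource once the user-facing iterator is unreachable. */
final class IteratorTracker {

  /** Weak reference to the user-facing iterator that also carries the resource to close. */
  static final class Tracked extends WeakReference<Iterator<?>> {
    private final AutoCloseable resource;

    Tracked(Iterator<?> userFacing, AutoCloseable resource, ReferenceQueue<Iterator<?>> queue) {
      super(userFacing, queue);
      // The resource must not hold a strong reference back to userFacing,
      // otherwise the iterator can never become unreachable.
      this.resource = resource;
    }
  }

  private final ReferenceQueue<Iterator<?>> queue = new ReferenceQueue<>();

  // Strong references to the Tracked objects themselves, so that they stay alive
  // until the garbage collector has had a chance to enqueue them.
  private final Set<Tracked> live = ConcurrentHashMap.newKeySet();

  /** Starts tracking an iterator together with the resource that backs it. */
  Tracked register(Iterator<?> userFacing, AutoCloseable resource) {
    drain(); // opportunistically clean up whatever has already been enqueued
    Tracked ref = new Tracked(userFacing, resource, queue);
    live.add(ref);
    return ref;
  }

  /** Closes the resources of all enqueued references and returns how many were closed. */
  int drain() {
    int closed = 0;
    Reference<? extends Iterator<?>> ref;
    while ((ref = queue.poll()) != null) {
      Tracked tracked = (Tracked) ref;
      if (live.remove(tracked)) {
        try {
          tracked.resource.close();
        } catch (Exception e) {
          // A real implementation would log this; the sketch ignores it.
        }
        closed++;
      }
    }
    return closed;
  }
}
```

A runner could call drain() whenever it creates a new iterator (as register does here), at bundle boundaries, or from a background thread. When the references get enqueued is up to the garbage collector, so this would likely be combined with the inactivity timeout I mentioned earlier.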

On Thu, May 10, 2018 at 3:44 PM Xinyu Liu <xinyuliu...@gmail.com> wrote:

> Load/evict blocks will help reduce the cache memory footprint, but we
> still won't be able to release the underlying resources. We can
> definitely add heuristics to help release the resources as you mentioned, but
> there is no accurate way to track all the iterators/iterables created and
> free them up once not needed. I think while the API is aimed at a nice user
> experience, we should have the option to let users optimize their
> performance if they choose to. Do you agree?
>
> Thanks,
> Xinyu
>
> On Thu, May 10, 2018 at 3:25 PM, Lukasz Cwik <lc...@google.com> wrote:
>
>> Users won't reliably close/release the resources and forcing them to will
>> make the user experience worse.
>> It will make a lot more sense to use a file format which allows random
>> access and use a cache to load/evict blocks of the state from memory.
>> If that is not possible, use an iterable which frees the resource after a
>> certain amount of inactivity or uses weak references.
>>
>> On Thu, May 10, 2018 at 3:07 PM Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>
>>> Hi, folks,
>>>
>>> I'm in the middle of implementing the MapState and SetState in our Samza
>>> runner. We noticed that the state returns the Java Iterable for reading
>>> entries, keys, etc. For state backed by file-based kv store like rocksDb,
>>> we need to be able to let users explicitly close iterator/iterable to
>>> release the resources. Otherwise we have to load the iterable into memory so
>>> we can safely close the underlying rocksDb iterator, similar to Flink's
>>> implementation. But this won't work for states that don't fit into
>>> memory. I chatted with Kenn and he also agrees we need this capability to
>>> avoid bulk read/write. This seems to be a general use case and I'm
>>> wondering if we can add support for it? I am happy to contribute to this
>>> if needed. Any feedback is highly appreciated.
>>>
>>> Thanks,
>>> Xinyu
>>>
>>
>