I don’t think a lot of thought went into how adapters use native objects.  For 
any given adapter we could start a discussion about how to manage the lifecycle 
(e.g. a pool or cache or factory).
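
To make the cache option concrete, here is a rough sketch of a reference-counted
client cache keyed by endpoint. Nothing in it is existing Calcite API:
ClientCache, acquire() and release() are hypothetical names, and it assumes the
native client implements AutoCloseable. Take it as a starting point for
discussion, not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical reference-counted cache of closeable clients, keyed by endpoint. */
final class ClientCache<C extends AutoCloseable> {
  /** A shared client plus the number of holders currently using it. */
  private static final class Entry<C> {
    final C client;
    int refs;
    Entry(C client) { this.client = client; }
  }

  private final Map<String, Entry<C>> entries = new HashMap<>();
  private final Function<String, C> factory;

  /** The factory builds a new native client for an endpoint (an assumption). */
  ClientCache(Function<String, C> factory) { this.factory = factory; }

  /** Returns the shared client for the endpoint, creating it on first use. */
  synchronized C acquire(String endpoint) {
    Entry<C> e = entries.get(endpoint);
    if (e == null) {
      e = new Entry<>(factory.apply(endpoint));
      entries.put(endpoint, e);
    }
    e.refs++;
    return e.client;
  }

  /** Drops one reference; closes the client once the last reference is gone. */
  synchronized void release(String endpoint) {
    Entry<C> e = entries.get(endpoint);
    if (e == null) {
      return; // nothing was acquired for this endpoint
    }
    if (--e.refs == 0) {
      entries.remove(endpoint);
      try {
        e.client.close();
      } catch (Exception ex) {
        throw new IllegalStateException("failed to close client for " + endpoint, ex);
      }
    }
  }

  /** Number of distinct endpoints that currently have a live client. */
  synchronized int size() { return entries.size(); }
}
```

A schema factory would call acquire() from create(). Something still has to
call release() when the application is done with the schema, and Calcite
currently gives the application no way to signal that.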

> On Apr 10, 2022, at 10:33 PM, James Turton wrote:
> 
> Hi Calcite devs!
> 
> There are resource leaks affecting some Calcite adapters, including ES and 
> Cassandra and probably some others, whereby Calcite internally creates 
> client objects from external libraries in order to access the system in 
> question and never closes those clients. The reason that there is no trivial 
> fix available is that it is only the application that knows when said clients 
> may be closed, and Calcite offers the application no way to signal this to it.
> 
> The case I have real world information about, ES, is a serious problem 
> because the resource leak is unbounded.  Here Calcite creates an ES 
> "RestClient" for every call to create() in the schema factory and the 
> RestClient leaks at least a file descriptor if it is not closed.  Operating 
> systems enforce a per-process file descriptor quota.  If your application, 
> Drill in our case, makes one too many calls to the ES schema factory's 
> create() method, then the JVM hosting it is summarily executed by the OS.
> 
> In the case of Cassandra, the situation looks less bad to me in that client 
> objects are reused by Calcite.  This means that if the application only ever 
> wants to talk to a finite number of distinct Cassandra endpoints then the 
> resource leak is bounded and, most likely, quite small.  In 
> https://github.com/apache/calcite/pull/2698, I've been revising a patch to 
> the ES schema factory to introduce the same sort of reuse to constrain, but 
> not cure, the resource leak there.  The patch is unquestionably a nasty "band 
> aid" and that prompted some discussion with its reviewers and a request that 
> I email this list.
> 
> I think Calcite might have to make a design decision, perhaps one that means 
> that either
> 
> * it abstains from connection management entirely in adapters, which
>   might break some of its public APIs since then applications must
>   start to pass connections in (or might they be smuggled inside the
>   operand Map?) or
> * it starts to use connections in a single-use way, freeing them
>   immediately and taking a performance hit or
> * it makes Schema objects, or some others, closeable by the
>   application and propagates these events to the adapter code
>   responsible for managing connections.
> 
> Thanks
> 
> James
