[ 
https://issues.apache.org/jira/browse/CALCITE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956610#comment-16956610
 ] 

Mehdi Salarkia commented on CALCITE-3325:
-----------------------------------------

{quote}I think this is the source of my confusion. Server-side, we generally 
have a pool of threads which work incoming client requests. This is bounded to 
ensure that the server won't be overrun if there's an influx of clients. If, 
within a single thread, we're not getting the same instance of an 
UnsynchronizedBuffer, then this code doesn't work right.
{quote}
I thought this code is only used by the client but now I can see it is used in 
`AvaticaProtobufHandler` and `AvaticaJsonHandler`. But since only one instance 
of them is created on the server-side you won't have the problem I mentioned 
above in the server code and technically that works exactly the same if you 
define it as static(Referring to 
`org.apache.calcite.avatica.server.AvaticaProtobufHandler#threadLocalBuffer` 
which creates only 1 instance per thread).


{quote}The same _should_ apply client side, but there's more ability for you to 
shoot yourself in the foot (if you run a single operation on a thread and throw 
that thread away). Because you're creating a new Connection every time, you're 
throwing away any state we had for that thread, and re-creating everything. Why 
aren't you keeping that Connection around?
{quote}
We have a connection pool in the actual production code (Hikari) which 
maintains the connections and decide when to kill the connections. To be clear 
this issue only showed itself on the client side with the connection pool size 
of ~200-300 after running the service for multiple hours under heavy load. My 
code harness is an exasperated version of the issue to demonstrate the bug but 
certainly is not the exact code that we use in production.


{quote}I appreciate you sharing the client code. I'll have to try to find some 
time this week to make that functional (I don't have an avatica test harness 
ready to go right now), and can look at that. If you have the cycles, I'd be 
curious what your test shows if you changed it to do..
{quote}
I did run your code snippet and it just creates 40 threadLocal entries (as 
expected) but by the time you don't actually close and open new connections the 
symptoms are not showing as you would expect, because it's not filling up the 
ThreadBuffer and forcing a GC.

For now we have fixed the client side code by overriding the 
`threadLocalBuffer` to be static  and didn't see the issue or any data 
corruptions with a mass load of ~ 4.5 Billion records.  

Thanks again for looking into this and your quick response.

> Thread Local Buffer Variable (threadLocalBuffer) In ProtobufTranslationImpl 
> Is Defined As Non Static Causing Memory Leak
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-3325
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3325
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Mehdi Salarkia
>            Priority: Major
>         Attachments: Non-Static.snapshot, Non-static.png, Screen Shot 
> 2019-09-05 at 5.05.20 PM.png, Screen Shot 2019-09-05 at 5.18.19 PM.png, 
> Static.png, Static.snapshot
>
>
> As we were load testing our system on Apache Phoenix via the thin client 
> which uses Avatica we ran into Garbage collection problems. After some 
> investigation we could see there are a lot of unreferenced object due to this 
> variable: 
> {code:java}
> private final ThreadLocal<UnsynchronizedBuffer> threadLocalBuffer =
>     new ThreadLocal<UnsynchronizedBuffer>() {
>       @Override protected UnsynchronizedBuffer initialValue() {
>         return new UnsynchronizedBuffer();
>       }
>     };
> {code}
> Which seems to be a reusable buffer for serializing/deserializing the data.
> From my understating there is a copy of this variable created per each 
> instance of ProtobufTranslationImpl. However the proper use of ThreadLocal it 
> seems to be one instance per thread and the current implementation seems to 
> be missing the `static` keyword when defining the thread local variable:
>  See [https://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html]
> {code:java}
> ThreadLocal instances are typically private static fields in classes that 
> wish to associate state with a thread (e.g., a user ID or Transaction ID).
> {code}
> See the attached snapshot from a memory dump we took.    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to