[ 
https://issues.apache.org/jira/browse/CALCITE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955836#comment-16955836
 ] 

Mehdi Salarkia commented on CALCITE-3325:
-----------------------------------------

Sorry for the late response as I had been very busy lately.

Here is exactly a code example you can use to see the issue. It would require 
to profile your application (I used Yourkit) or whatever to see the excessive 
number of `org.apache.calcite.avatica.util.UnsynchronizedBuffer` instances.

 
{code:java}
public class AvaticaGC {
    private static String URL = "http://localhost:8765;serialization=PROTOBUF";;

    public static void main(String[] args) throws Exception {
        int threadCount = 40;

        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {

            Runnable runner = new Runnable() {
                @Override
                public void run() {
                    while (true) {
                        try (Connection con = 
DriverManager.getConnection("jdbc:phoenix:thin:url=" + URL); PreparedStatement 
psmt = con.prepareStatement("SELECT 1")) {
                            ResultSet rs = psmt.executeQuery();
                            rs.next();
                            Thread.sleep(100);
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                }
            };
            executor.execute(runner);
        }
        executor.shutdown();

    }
}
{code}
I saw a GC pause of ~1-2 sec after running this for a few minute (See 
attachments).
 

 
{quote}Finally, let me be super clear: marking the buffer as static is _very 
wrong_. As the name indicates, there is no synchronization of this buffer. If 
you are reusing this buffer, it's going to result in data corruption issues.
{quote}
I'm still having trouble to understand this. What I can see is :
1) A new instance of UnsynchronizedBuffer is created per connection creation.

2) UnsynchronizedBuffer is stored in thread local (and the previous instance is 
left on `ThreadLocalMap` :( ).

3) The UnsynchronizedBuffer is used during serialize/deserialize.

4) Finally  `UnsynchronizedBuffer.reset()` is invoked to make sure  the 
UnsynchronizedBuffer is ready for next invocation (which never happens because 
on next invocation a new instance will be created :( ). 
{code:java}
private final ThreadLocal<UnsynchronizedBuffer> threadLocalBuffer =
    new ThreadLocal<UnsynchronizedBuffer>() {
      @Override protected UnsynchronizedBuffer initialValue() {
        return new UnsynchronizedBuffer();
      }
    };

....

@Override public byte[] serializeResponse(Response response) throws IOException 
{
  // Avoid BAOS for its synchronized write methods, we don't need that 
concurrency control
  UnsynchronizedBuffer out = threadLocalBuffer.get();
  try {
    Message responseMsg = response.serialize();
    // Serialization of the response may be large
    if (LOG.isTraceEnabled()) {
      LOG.trace("Serializing response '{}'", 
TextFormat.shortDebugString(responseMsg));
    }
    serializeMessage(out, responseMsg);
    return out.toArray();
  } finally {
    out.reset();
  }
}

@Override public byte[] serializeRequest(Request request) throws IOException {
  // Avoid BAOS for its synchronized write methods, we don't need that 
concurrency control
  UnsynchronizedBuffer out = threadLocalBuffer.get();
  try {
    Message requestMsg = request.serialize();
    // Serialization of the request may be large
    if (LOG.isTraceEnabled()) {
      LOG.trace("Serializing request '{}'", 
TextFormat.shortDebugString(requestMsg));
    }
    serializeMessage(out, requestMsg);
    return out.toArray();
  } finally {
    out.reset();
  }
}{code}
With this code `UnsynchronizedBuffer` is stored at at threadLocalMap and *only* 
the same thread can have access to it, in another word the only chance that an 
instance of `UnsynchronizedBuffer` being re-used is scoped to the same thread 
(no concurrent invocation possible as a thread can serve one request at a 
time). This will run sequentially which is safe as the previous thread have 
already called the *reset* method to clear the buffer.

If you don't want to re-use `UnsynchronizedBuffer` and your logic is have an 
instance of `UnsynchronizedBuffer` per connection why should you even involve 
thread local?

I understand the motive of [this 
design|https://issues.apache.org/jira/browse/CALCITE-1094 ] was to prevent 
thread synchronization and lock contention but this only works until you hit a 
snag with memory and see a bad and long GC.

 

> Thread Local Buffer Variable (threadLocalBuffer) In ProtobufTranslationImpl 
> Is Defined As Non Static Causing Memory Leak
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-3325
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3325
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Mehdi Salarkia
>            Priority: Major
>         Attachments: Non-Static.snapshot, Non-static.png, Screen Shot 
> 2019-09-05 at 5.05.20 PM.png, Screen Shot 2019-09-05 at 5.18.19 PM.png, 
> Static.png, Static.snapshot
>
>
> As we were load testing our system on Apache Phoenix via the thin client 
> which uses Avatica we ran into Garbage collection problems. After some 
> investigation we could see there are a lot of unreferenced object due to this 
> variable: 
> {code:java}
> private final ThreadLocal<UnsynchronizedBuffer> threadLocalBuffer =
>     new ThreadLocal<UnsynchronizedBuffer>() {
>       @Override protected UnsynchronizedBuffer initialValue() {
>         return new UnsynchronizedBuffer();
>       }
>     };
> {code}
> Which seems to be a reusable buffer for serializing/deserializing the data.
> From my understating there is a copy of this variable created per each 
> instance of ProtobufTranslationImpl. However the proper use of ThreadLocal it 
> seems to be one instance per thread and the current implementation seems to 
> be missing the `static` keyword when defining the thread local variable:
>  See [https://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html]
> {code:java}
> ThreadLocal instances are typically private static fields in classes that 
> wish to associate state with a thread (e.g., a user ID or Transaction ID).
> {code}
> See the attached snapshot from a memory dump we took.    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to