[ 
https://issues.apache.org/jira/browse/NIFI-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667412#comment-16667412
 ] 

ASF GitHub Bot commented on NIFI-5757:
--------------------------------------

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/3111
  
    @arkadius thanks for the contribution. I don't think this is the right 
approach, however. As-is, the code ensures that we don't hold any more than a 
set number of entries in the cache. The proposed change removes this 
limitation, which introduces a memory leak that can quickly cause a system to 
run out of memory if it is using an unbounded number of schemas (as can 
occur when a system is dynamically generating schemas).
    
    The synchronization here is limited to an extremely small scope. Can you 
explain how you arrived at the determination that the existing code is slow? 
What kind of performance are you seeing? What performance do you expect? What 
kind of difference do you see after applying this change?
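
For context on the trade-off being discussed, here is a minimal sketch 
(illustrative class and method names only, not NiFi's actual code, and 
assuming Avro is on the classpath) of a bounded cache that must synchronize 
every lookup versus an unbounded {{ConcurrentHashMap}} that avoids the lock 
but never evicts:

{noformat}
// Illustrative sketch only, assuming Avro is on the classpath; this is not
// the actual NiFi implementation.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.avro.Schema;

class SchemaCacheSketch {
    private static final int CACHE_SIZE = 20;

    // Bounded: evicts the eldest entry once CACHE_SIZE is exceeded, but every
    // lookup must hold the same lock, so callers block one another.
    private final Map<String, Schema> bounded =
            new LinkedHashMap<String, Schema>(CACHE_SIZE, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(final Map.Entry<String, Schema> eldest) {
                    return size() > CACHE_SIZE;
                }
            };

    // Unbounded: lookups of already-cached schemas do not contend on a single
    // lock, but nothing ever evicts entries.
    private final ConcurrentMap<String, Schema> unbounded = new ConcurrentHashMap<>();

    Schema getBounded(final String schemaText) {
        synchronized (bounded) {
            Schema schema = bounded.get(schemaText);
            if (schema == null) {
                schema = new Schema.Parser().parse(schemaText);
                bounded.put(schemaText, schema);
            }
            return schema;
        }
    }

    Schema getUnbounded(final String schemaText) {
        return unbounded.computeIfAbsent(schemaText, text -> new Schema.Parser().parse(text));
    }
}
{noformat}

The first variant caps memory but serializes all callers on one monitor; the 
second removes the contention but, as noted above, grows without bound when 
schemas are generated dynamically.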


> AvroRecordSetWriter synchronize every access to compiledAvroSchemaCache
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5757
>                 URL: https://issues.apache.org/jira/browse/NIFI-5757
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.7.1
>            Reporter: Arek Burdach
>            Priority: Major
>
> Avro record serialization is quite an expensive operation.
> I very often see this stack trace in thread dumps:
> {noformat}
> Thread 48583: (state = BLOCKED)
>  - org.apache.nifi.avro.AvroRecordSetWriter.compileAvroSchema(java.lang.String) @bci=9, line=124 (Compiled frame)
>  - org.apache.nifi.avro.AvroRecordSetWriter.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=96, line=92 (Compiled frame)
>  - sun.reflect.GeneratedMethodAccessor183.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Compiled frame)
>  - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
>  - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
>  - org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=309, line=89 (Compiled frame)
>  - com.sun.proxy.$Proxy100.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=24 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(org.apache.nifi.flowfile.FlowFile, org.apache.nifi.serialization.record.RecordSet, org.apache.nifi.serialization.RecordSetWriterFactory, org.apache.nifi.serialization.record.RecordSchema, java.lang.String, java.lang.String) @bci=71, line=169 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_1_0$1.process(java.io.InputStream) @bci=94, line=412 (Compiled frame)
> {noformat}
> This happens because {{AvroRecordSetWriter}} synchronizes every access to 
> the cache of compiled schemas.
> I've prepared a PR that fixes this issue by using {{ConcurrentHashMap}} 
> instead: https://github.com/apache/nifi/pull/3111
> It is not a perfect fix because it removes the cache size limitation, which 
> BTW was hardcoded to {{20}}. Services can be reused by many flows, so such a 
> hard limit is not a good choice.
> What do you think about such an improvement?
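
As an illustration of a possible middle ground (not the change in the linked 
PR), a concurrent cache library such as Caffeine can keep the size bound 
while still avoiding a single lock on every lookup; Caffeine would be an 
additional dependency and is only assumed here:

{noformat}
// Sketch only, assuming the Caffeine library (com.github.ben-manes.caffeine)
// and Avro are available; not code from the PR.
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import org.apache.avro.Schema;

class BoundedConcurrentSchemaCache {
    // keeps a size bound (here the same 20 entries) and evicts older entries
    private final Cache<String, Schema> cache = Caffeine.newBuilder()
            .maximumSize(20)
            .build();

    Schema getOrCompile(final String schemaText) {
        // the compile function runs at most once per key; cache hits do not
        // block other callers on a single lock
        return cache.get(schemaText, text -> new Schema.Parser().parse(text));
    }
}
{noformat}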



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
