[
https://issues.apache.org/jira/browse/NIFI-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669036#comment-16669036
]
ASF GitHub Bot commented on NIFI-5757:
--------------------------------------
Github user arkadius commented on the issue:
https://github.com/apache/nifi/pull/3111
Other places where the same pattern is used:
1. `org.apache.nifi.record.path.util.RecordPathCache` - record path cache,
synchronized access
2. `org.apache.nifi.avro.AvroReader` - avro schema cache (this part is
almost a copy-paste of `AvroRecordSetWriter`), synchronized access
3. `org.apache.nifi.processors.attributes.UpdateAttribute` - canonical
value lookup cache, synchronized access
4. `org.apache.nifi.atlas.hook.NotificationSender` - `guidToQualifiedName`
and `typedQualifiedNameToRef` caches, but no synchronization on class level
5. `org.apache.nifi.processors.jolt.record.JoltTransformRecord` -
transformations cache, synchronized access
6. `org.apache.nifi.schema.access.WriteAvroSchemaAttributeStrategy` - avro
schema cache, synchronized access
7. `org.apache.nifi.processors.standard.PutDatabaseRecord` - table schema
cache, synchronized access
8. `org.apache.nifi.processors.standard.JoltTransformJSON` - transformations
cache, synchronized access
9. `org.apache.nifi.processors.standard.ConvertJSONToSQL` - table schema
cache, synchronized access
10. `org.apache.nifi.confluent.schemaregistry.client.CachingSchemaRegistryClient` -
avro schema cache, synchronized access
For me, 9 out of 10 of these cases are the same scenario, and it would be
better to fix them as well before someone else hits the same problem,
especially where the cache holds Avro schemas. If someone chooses Avro, there
is a very high probability that they expect high throughput. WDYT?
> AvroRecordSetWriter synchronize every access to compiledAvroSchemaCache
> -----------------------------------------------------------------------
>
> Key: NIFI-5757
> URL: https://issues.apache.org/jira/browse/NIFI-5757
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.7.1
> Reporter: Arek Burdach
> Priority: Major
>
> Avro record serialization is a quite expensive operation.
> I very often see this stack trace in thread dumps:
> {noformat}
> Thread 48583: (state = BLOCKED)
>  - org.apache.nifi.avro.AvroRecordSetWriter.compileAvroSchema(java.lang.String) @bci=9, line=124 (Compiled frame)
>  - org.apache.nifi.avro.AvroRecordSetWriter.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=96, line=92 (Compiled frame)
>  - sun.reflect.GeneratedMethodAccessor183.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Compiled frame)
>  - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
>  - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
>  - org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=309, line=89 (Compiled frame)
>  - com.sun.proxy.$Proxy100.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=24 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(org.apache.nifi.flowfile.FlowFile, org.apache.nifi.serialization.record.RecordSet, org.apache.nifi.serialization.RecordSetWriterFactory, org.apache.nifi.serialization.record.RecordSchema, java.lang.String, java.lang.String) @bci=71, line=169 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_1_0$1.process(java.io.InputStream) @bci=94, line=412 (Compiled frame)
> {noformat}
> This happens because {{AvroRecordSetWriter}} synchronizes every access to
> its cache of compiled schemas.
> I've prepared a PR that fixes this issue by using {{ConcurrentHashMap}}
> instead: https://github.com/apache/nifi/pull/3111
> It is not a perfect fix because it removes the cache size limitation, which
> BTW was hardcoded to {{20}}. Services can be reused by many flows, so such a
> hard limit is not a good choice anyway.
> What do you think about such an improvement?
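One way to keep a size bound without synchronizing every access is sketched below (hypothetical code under stated assumptions, not the PR's actual implementation): a `ConcurrentHashMap` with a crude "clear when over the cap" eviction. Entries are recomputed on demand, so correctness survives the eviction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a concurrent cache with an approximate size bound.
// Cruder than a real LRU, but it avoids taking a lock on every read the way
// a synchronized bounded map does.
public class BoundedConcurrentCache<K, V> {
    private final int maxSize;
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public BoundedConcurrentCache(int maxSize) {
        this.maxSize = maxSize;
    }

    public V get(K key, Function<K, V> compute) {
        V value = map.computeIfAbsent(key, compute);
        // Approximate bound: if concurrent inserts push us over the cap,
        // drop everything; missing entries are simply recomputed later.
        if (map.size() > maxSize) {
            map.clear();
        }
        return value;
    }

    public int size() {
        return map.size();
    }
}
```

A production version would more likely use a proper eviction policy (LRU or a caching library) rather than clearing the whole map, but even this sketch shows that a size bound and lock-free cache hits are not mutually exclusive.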
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)