[
https://issues.apache.org/jira/browse/NIFI-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669036#comment-16669036
]
ASF GitHub Bot commented on NIFI-5757:
--------------------------------------
Github user arkadius commented on the issue:
https://github.com/apache/nifi/pull/3111
Other places where the same pattern is used:
1. `org.apache.nifi.record.path.util.RecordPathCache` - record path cache,
synchronized access
2. `org.apache.nifi.avro.AvroReader` - avro schema cache (this part is
almost a copy-paste of `AvroRecordSetWriter`), synchronized access
3. `org.apache.nifi.processors.attributes.UpdateAttribute` - canonical
value lookup cache, synchronized access
4. `org.apache.nifi.atlas.hook.NotificationSender` - `guidToQualifiedName`
and `typedQualifiedNameToRef` caches, but no synchronization on class level
5. `org.apache.nifi.processors.jolt.record.JoltTransformRecord` -
transformations cache, synchronized access
6. `org.apache.nifi.schema.access.WriteAvroSchemaAttributeStrategy` - avro
schema cache, synchronized access
7. `org.apache.nifi.processors.standard.PutDatabaseRecord` - table schema
cache, synchronized access
8. `org.apache.nifi.processors.standard.JoltTransformJSON` - transformations
cache, synchronized access
9. `org.apache.nifi.processors.standard.ConvertJSONToSQL` - table schema
cache, synchronized access
10. `org.apache.nifi.confluent.schemaregistry.client.CachingSchemaRegistryClient` -
avro schema cache, synchronized access
For me, 9 out of 10 of these cases are the same scenario, and it would be
better to fix them as well before someone else hits the same problem,
especially where the cache holds Avro schemas. If someone chooses Avro, there
is a very high probability that they expect high throughput. WDYT?
> AvroRecordSetWriter synchronize every access to compiledAvroSchemaCache
> -----------------------------------------------------------------------
>
> Key: NIFI-5757
> URL: https://issues.apache.org/jira/browse/NIFI-5757
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.7.1
> Reporter: Arek Burdach
> Priority: Major
>
> Avro record serialization is a quite expensive operation.
> I very often see this stack trace in thread dumps:
> {noformat}
> Thread 48583: (state = BLOCKED)
>  - org.apache.nifi.avro.AvroRecordSetWriter.compileAvroSchema(java.lang.String) @bci=9, line=124 (Compiled frame)
>  - org.apache.nifi.avro.AvroRecordSetWriter.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=96, line=92 (Compiled frame)
>  - sun.reflect.GeneratedMethodAccessor183.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Compiled frame)
>  - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
>  - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
>  - org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=309, line=89 (Compiled frame)
>  - com.sun.proxy.$Proxy100.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=24 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(org.apache.nifi.flowfile.FlowFile, org.apache.nifi.serialization.record.RecordSet, org.apache.nifi.serialization.RecordSetWriterFactory, org.apache.nifi.serialization.record.RecordSchema, java.lang.String, java.lang.String) @bci=71, line=169 (Compiled frame)
>  - org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_1_0$1.process(java.io.InputStream) @bci=94, line=412 (Compiled frame)
> {noformat}
> This happens because {{AvroRecordSetWriter}} synchronizes every access to
> its cache of compiled schemas.
> I've prepared a PR that fixes this issue by using {{ConcurrentHashMap}}
> instead: https://github.com/apache/nifi/pull/3111
> It is not a perfect fix because it removes the cache size limitation, which
> BTW was hardcoded to {{20}}. Services can be reused by many flows, so such a
> hard limit is not a good choice anyway.
> What do you think about such an improvement?
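One way to keep a size bound without synchronizing every access is sketched below (hypothetical code under stated assumptions, not the PR's actual implementation): a `ConcurrentHashMap` with a crude "clear when over the cap" eviction. Entries are recomputed on demand, so correctness survives the eviction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a concurrent cache with an approximate size bound.
// Cruder than a real LRU, but it avoids taking a lock on every read the way
// a synchronized bounded map does.
public class BoundedConcurrentCache<K, V> {
    private final int maxSize;
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public BoundedConcurrentCache(int maxSize) {
        this.maxSize = maxSize;
    }

    public V get(K key, Function<K, V> compute) {
        V value = map.computeIfAbsent(key, compute);
        // Approximate bound: if concurrent inserts push us over the cap,
        // drop everything; missing entries are simply recomputed later.
        if (map.size() > maxSize) {
            map.clear();
        }
        return value;
    }

    public int size() {
        return map.size();
    }
}
```

A production version would more likely use a proper eviction policy (LRU or a caching library) rather than clearing the whole map, but even this sketch shows that a size bound and lock-free cache hits are not mutually exclusive.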
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)