[
https://issues.apache.org/jira/browse/AVRO-3184?focusedWorklogId=646720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646720
]
ASF GitHub Bot logged work on AVRO-3184:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Sep/21 20:59
Start Date: 05/Sep/21 20:59
Worklog Time Spent: 10m
Work Description: belugabehr commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702476115
##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -69,6 +69,16 @@
private static final GenericData INSTANCE = new GenericData();
+ private static final Map<Class<?>, String> PRIMATIVE_DATUM_TYPES = new IdentityHashMap<>();
Review comment:
`IdentityHashMap` is faster because it compares keys with the
reference-equality operator `==` instead of calling `equals`. When coming up
with this solution, I did some research, and from everything I can find, this
should be perfectly acceptable as long as the classes come from the same class
loader (as is the case here, since the map is populated by the class itself
and not through an outside reference).
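To make the idea concrete, here is a minimal sketch of an identity-keyed cache of type names. It is not the code from the pull request: the class `DatumTypeCacheSketch`, the method `cachedTypeName`, and the hard-coded strings (stand-ins for values such as `Type.LONG.getName()`) are assumptions made for illustration only.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch; the class and method names are illustrative, not Avro's.
final class DatumTypeCacheSketch {

  // IdentityHashMap compares keys with ==, so a lookup never calls equals().
  private static final Map<Class<?>, String> PRIMITIVE_DATUM_TYPES = new IdentityHashMap<>();

  static {
    // Assumed stand-ins for Type.INT.getName(), Type.LONG.getName(), and so on.
    PRIMITIVE_DATUM_TYPES.put(Integer.class, "int");
    PRIMITIVE_DATUM_TYPES.put(Long.class, "long");
    PRIMITIVE_DATUM_TYPES.put(Float.class, "float");
    PRIMITIVE_DATUM_TYPES.put(Double.class, "double");
    PRIMITIVE_DATUM_TYPES.put(Boolean.class, "boolean");
  }

  // Returns the cached type name, or null when the datum's class is not cached
  // and the caller would have to fall back to the slower chain of checks.
  static String cachedTypeName(Object datum) {
    if (datum == null) {
      return "null";
    }
    return PRIMITIVE_DATUM_TYPES.get(datum.getClass());
  }

  public static void main(String[] args) {
    System.out.println(cachedTypeName(42L));    // cache hit for Long
    System.out.println(cachedTypeName(1.5f));   // cache hit for Float
    System.out.println(cachedTypeName("text")); // cache miss: String is not cached
  }
}
```

Since `Class` does not override `equals` or `hashCode`, a plain `HashMap` would return the same results here. The two maps differ only in how the lookup is carried out.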
Issue Time Tracking
-------------------
Worklog Id: (was: 646720)
Time Spent: 50m (was: 40m)
> Cache Datum Type Strings in Resolve Union
> -----------------------------------------
>
> Key: AVRO-3184
> URL: https://issues.apache.org/jira/browse/AVRO-3184
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Attachments: AVRO-3184.JPG, AVRO-master.JPG
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> {code:java|title=GenericData.java}
> protected String getSchemaName(Object datum) {
>   if (datum == null || datum == JsonProperties.NULL_VALUE)
>     return Type.NULL.getName();
>   if (isRecord(datum))
>     return getRecordSchema(datum).getFullName();
>   if (isEnum(datum))
>     return getEnumSchema(datum).getFullName();
>   if (isArray(datum))
>     return Type.ARRAY.getName();
>   if (isMap(datum))
>     return Type.MAP.getName();
>   if (isFixed(datum))
>     return getFixedSchema(datum).getFullName();
>   if (isString(datum))
>     return Type.STRING.getName();
>   if (isBytes(datum))
>     return Type.BYTES.getName();
>   if (isInteger(datum))
>     return Type.INT.getName();
>   if (isLong(datum))
>     return Type.LONG.getName();
>   if (isFloat(datum))
>     return Type.FLOAT.getName();
>   if (isDouble(datum))
>     return Type.DOUBLE.getName();
>   if (isBoolean(datum))
>     return Type.BOOLEAN.getName();
>   throw new AvroRuntimeException(String.format("Unknown datum type %s: %s",
>       datum.getClass().getName(), datum));
> }
> {code}
> This is a lot of effort for the simple native types (Long, Float, Double,
> etc.), because they are the last things checked. Add a cache for these
> simple cases.
> I came across this while examining the performance of Apache ORC, which
> includes an Avro benchmark for comparison. The attached charts show the
> results with the change implemented.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)