iemejia commented on code in PR #3725:
URL: https://github.com/apache/avro/pull/3725#discussion_r3170249726
##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -384,6 +433,73 @@ protected void addToMap(Object map, Object key, Object value) {
((Map) map).put(key, value);
}
+ /**
+ * Returns the minimum number of bytes required to encode a single value of the
+ * given schema in Avro binary format. Used to validate that the decoder has
+ * enough data remaining before allocating collection backing arrays.
+ * <p>
+ * Returns 0 for types whose binary encoding is empty ({@code null}, zero-length
+ * {@code fixed}, records with only zero-byte fields). Returns a positive value
+ * for all other types.
+ */
+ static int minBytesPerElement(Schema schema) {
+ return minBytesPerElement(schema, Collections.newSetFromMap(new IdentityHashMap<>()));
+ }
+
+ private static int minBytesPerElement(Schema schema, Set<Schema> visited) {
+ switch (schema.getType()) {
+ case NULL:
+ return 0;
+ case FIXED:
+ return schema.getFixedSize();
+ case FLOAT:
+ return 4;
+ case DOUBLE:
+ return 8;
+ case RECORD:
+ if (!visited.add(schema)) {
+ return 0; // break recursion for self-referencing schemas
+ }
+ long sum = 0;
+ for (Schema.Field f : schema.getFields()) {
Review Comment:
I added a JMH benchmark to measure the impact of this on wide and deeply
nested structures. The results are promising: the extra cost appears to be
negligible.
https://gist.github.com/iemejia/bae3302ec0f3d2abf92e99911ccba606
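For context on what the hunk computes, here is a minimal, self-contained sketch of how a minimum-encoded-size bound can guard a collection allocation. It is not the PR's code. `Node`, `Type`, and `fitsInRemaining` are hypothetical stand-ins (the real method takes an `org.apache.avro.Schema`), and the one-byte minimum for the types the hunk cuts off (boolean, int, long, enum, bytes, string, array, map, union) is assumed from the Avro binary encoding rules, not taken from the hunk.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class MinBytesSketch {

  // Hypothetical, simplified stand-in for org.apache.avro.Schema (not the real API).
  enum Type { NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING, FIXED, ENUM, ARRAY, MAP, UNION, RECORD }

  static final class Node {
    final Type type;
    final int fixedSize;            // only meaningful for FIXED
    List<Node> fields = List.of();  // record fields; assignable so a record can reference itself

    Node(Type type, int fixedSize) { this.type = type; this.fixedSize = fixedSize; }
    Node(Type type) { this(type, 0); }
  }

  static int minBytesPerElement(Node schema) {
    return minBytesPerElement(schema, Collections.newSetFromMap(new IdentityHashMap<>()));
  }

  private static int minBytesPerElement(Node schema, Set<Node> visited) {
    switch (schema.type) {
      case NULL:
        return 0;
      case FIXED:
        return schema.fixedSize;
      case FLOAT:
        return 4;
      case DOUBLE:
        return 8;
      case RECORD: {
        // A record seen before contributes 0. This stops self-reference recursion and
        // can only under-count, so the result stays a safe lower bound.
        if (!visited.add(schema)) {
          return 0;
        }
        long sum = 0;
        for (Node f : schema.fields) {
          sum += minBytesPerElement(f, visited);
        }
        return (int) Math.min(sum, Integer.MAX_VALUE);
      }
      default:
        // Assumed minimums: BOOLEAN is one byte; INT, LONG and ENUM are varints (>= 1 byte);
        // BYTES/STRING start with a varint length; ARRAY/MAP end with a zero block count;
        // UNION starts with a varint branch index.
        return 1;
    }
  }

  // Hypothetical guard: can `count` elements of at least `minBytes` each fit in `remaining`?
  // `count` is assumed to be an already-decoded, non-negative item count.
  static boolean fitsInRemaining(long count, int minBytes, long remaining) {
    if (minBytes == 0) {
      return true; // zero-byte elements: this check cannot bound the count
    }
    // Division instead of count * minBytes, so the comparison cannot overflow.
    return count <= remaining / minBytes;
  }

  public static void main(String[] args) {
    Node rec = new Node(Type.RECORD);
    rec.fields = List.of(new Node(Type.INT), new Node(Type.DOUBLE), new Node(Type.FIXED, 16));
    int min = minBytesPerElement(rec);
    System.out.println(min);
    System.out.println(fitsInRemaining(1_000_000_000L, min, 100));
  }
}
```

Here the record's bound is 1 + 8 + 16 = 25 bytes, so a block claiming a billion such items against 100 remaining bytes is rejected before any backing array is allocated. A zero-byte element type cannot be bounded this way, so some other limit would be needed for that case.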
--
This is an automated message from the Apache Git Service.