iemejia commented on code in PR #3725:
URL: https://github.com/apache/avro/pull/3725#discussion_r3170249726


##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -384,6 +433,73 @@ protected void addToMap(Object map, Object key, Object value) {
     ((Map) map).put(key, value);
   }
 
+  /**
+   * Returns the minimum number of bytes required to encode a single value of the
+   * given schema in Avro binary format. Used to validate that the decoder has
+   * enough data remaining before allocating collection backing arrays.
+   * <p>
+   * Returns 0 for types whose binary encoding is empty ({@code null}, zero-length
+   * {@code fixed}, records with only zero-byte fields). Returns a positive value
+   * for all other types.
+   */
+  static int minBytesPerElement(Schema schema) {
+    return minBytesPerElement(schema, Collections.newSetFromMap(new IdentityHashMap<>()));
+  }
+
+  private static int minBytesPerElement(Schema schema, Set<Schema> visited) {
+    switch (schema.getType()) {
+    case NULL:
+      return 0;
+    case FIXED:
+      return schema.getFixedSize();
+    case FLOAT:
+      return 4;
+    case DOUBLE:
+      return 8;
+    case RECORD:
+      if (!visited.add(schema)) {
+        return 0; // break recursion for self-referencing schemas
+      }
+      long sum = 0;
+      for (Schema.Field f : schema.getFields()) {

Review Comment:
   I added a JMH benchmark to measure the impact of this on wide and deeply
   nested structures. The results are promising: the extra cost appears to be
   negligible.
   https://gist.github.com/iemejia/bae3302ec0f3d2abf92e99911ccba606
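
   For context, the recursion guard in the quoted hunk can be sketched in
   isolation. This is only an illustration under stated assumptions, not the
   PR's code: `MinBytesSketch`, `Node`, and `Kind` are hypothetical stand-ins
   for Avro's `Schema`, and only the cases visible in the hunk (null, fixed,
   float, double, record) are modelled. Capping the sum at `Integer.MAX_VALUE`
   is also an assumption, since the hunk is cut off before the return.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class MinBytesSketch {
  // Hypothetical, reduced set of schema kinds (only those shown in the hunk).
  enum Kind { NULL, FIXED, FLOAT, DOUBLE, RECORD }

  // Hypothetical stand-in for a schema node; not Avro's Schema class.
  static final class Node {
    final Kind kind;
    final int fixedSize;                 // only meaningful for FIXED
    final List<Node> fields = new ArrayList<>(); // only meaningful for RECORD

    Node(Kind kind, int fixedSize) {
      this.kind = kind;
      this.fixedSize = fixedSize;
    }
  }

  static int minBytes(Node node) {
    // Identity-based set: two structurally equal nodes are still distinct,
    // and a self-referencing record is detected without calling equals().
    return minBytes(node, Collections.newSetFromMap(new IdentityHashMap<>()));
  }

  private static int minBytes(Node node, Set<Node> visited) {
    switch (node.kind) {
    case NULL:
      return 0;
    case FIXED:
      return node.fixedSize;
    case FLOAT:
      return 4;
    case DOUBLE:
      return 8;
    case RECORD:
      if (!visited.add(node)) {
        return 0; // already seen: contribute nothing, which keeps a lower bound
      }
      long sum = 0;
      for (Node f : node.fields) {
        sum += minBytes(f, visited);
      }
      // Assumption: cap the sum so the int return value cannot overflow.
      return (int) Math.min(sum, Integer.MAX_VALUE);
    default:
      throw new AssertionError(node.kind);
    }
  }

  public static void main(String[] args) {
    Node rec = new Node(Kind.RECORD, 0);
    rec.fields.add(new Node(Kind.FLOAT, 0));
    rec.fields.add(new Node(Kind.DOUBLE, 0));
    rec.fields.add(new Node(Kind.FIXED, 16));
    rec.fields.add(rec); // self-reference, handled by the visited set
    System.out.println(minBytes(rec)); // prints 28 (4 + 8 + 16 + 0)
  }
}
```

   Returning 0 for an already-visited record means the result stays a lower
   bound rather than an exact size, which is all a pre-allocation check needs.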



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.