prashantwason commented on a change in pull request #2128:
URL: https://github.com/apache/hudi/pull/2128#discussion_r499827933



##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/generator/GenericRecordFullPayloadGenerator.java
##########
@@ -333,23 +312,37 @@ private int getSize(Schema elementSchema) {
    * @param elementSchema
    * @return Number of entries to add
    */
-  private int numEntriesToAdd(Schema elementSchema) {
-    // Find the size of the primitive data type in bytes
-    int primitiveDataTypeSize = getSize(elementSchema);
-    int numEntriesToAdd = numberOfBytesToAdd / primitiveDataTypeSize;
-    // If more than 10 entries are being added for this same complex field and there are still more complex fields to
-    // be visited in the schema, reduce the number of entries to add by a factor of 10 to allow for other complex
-    // fields to pack some entries
-    if (numEntriesToAdd % 10 > 0 && this.numberOfComplexFields > 1) {
-      numEntriesToAdd = numEntriesToAdd / 10;
-      numberOfBytesToAdd -= numEntriesToAdd * primitiveDataTypeSize;
-      this.shouldAddMore = true;
-    } else {
-      this.numberOfBytesToAdd = 0;
-      this.shouldAddMore = false;
+  private void determineExtraEntriesRequired(int numberOfComplexFields, int numberOfBytesToAdd) {
+    for (Schema.Field f : baseSchema.getFields()) {
+      Schema elementSchema = f.schema();
+      // Find the size of the primitive data type in bytes
+      int primitiveDataTypeSize = 0;
+      if (elementSchema.getType() == Type.ARRAY && isPrimitive(elementSchema.getElementType())) {
+        primitiveDataTypeSize = getSize(elementSchema.getElementType());
+      } else if (elementSchema.getType() == Type.MAP && isPrimitive(elementSchema.getValueType())) {
+        primitiveDataTypeSize = getSize(elementSchema.getValueType());
+      } else {
+        continue;
+      }
+
+      int numEntriesToAdd = numberOfBytesToAdd / primitiveDataTypeSize;
+      // If more than 10 entries are being added for this same complex field and there are still more complex fields to
+      // be visited in the schema, reduce the number of entries to add by a factor of 10 to allow for other complex
+      // fields to pack some entries
+      if (numEntriesToAdd > 10 && numberOfComplexFields > 1) {

Review comment:
       The idea here is to distribute the extra entries (in some way) across 
all complex fields.
   
   - If there is only one complex field, then all extra bytes are added to 
that single field.
   
   int numEntriesToAdd = numberOfBytesToAdd / primitiveDataTypeSize;
   ...
   extraEntriesMap.put(f.name(), numEntriesToAdd);
   
   - If there is more than one complex field, then we add at least 10 entries 
to the first one, and so on.
   
   The ideal solution is to divide the extra bytes equally across all complex 
fields, but I have not covered that yet (it is slightly complicated because 
each field may have a different element size, etc.).
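
   For reference, the equal-division idea could look roughly like the sketch 
below. This is not code from this PR: the class `ExtraEntriesSplitter`, the 
method `distribute`, and the field names in `main` are hypothetical, and the 
sketch takes a plain map of field name to primitive element size instead of 
walking the Avro schema. Each field gets an equal share of the bytes still 
unassigned, so bytes lost to rounding on one field roll over to the next.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of GenericRecordFullPayloadGenerator.
final class ExtraEntriesSplitter {

  /**
   * Splits extraBytes as evenly as possible across the given complex fields.
   *
   * @param elementSizes field name -> size in bytes of one primitive element
   *                     (assumed input; the real code derives this from the schema)
   * @param extraBytes   total number of extra bytes to spread across the fields
   * @return field name -> number of extra entries to add to that field
   */
  public static Map<String, Integer> distribute(Map<String, Integer> elementSizes, int extraBytes) {
    Map<String, Integer> extraEntries = new LinkedHashMap<>();
    if (elementSizes.isEmpty() || extraBytes <= 0) {
      return extraEntries;
    }
    int remainingBytes = extraBytes;
    int remainingFields = elementSizes.size();
    for (Map.Entry<String, Integer> e : elementSizes.entrySet()) {
      int size = e.getValue();
      if (size <= 0) {
        throw new IllegalArgumentException("Element size must be positive for field " + e.getKey());
      }
      // Equal share of whatever is still unassigned; integer division rounds down.
      int share = remainingBytes / remainingFields;
      int entries = share / size;
      extraEntries.put(e.getKey(), entries);
      // Only subtract the bytes actually used, so the remainder carries forward.
      remainingBytes -= entries * size;
      remainingFields--;
    }
    return extraEntries;
  }

  public static void main(String[] args) {
    Map<String, Integer> sizes = new LinkedHashMap<>();
    sizes.put("intArray", 4);     // hypothetical array<int> field
    sizes.put("longMap", 8);      // hypothetical map<string, long> field
    sizes.put("doubleArray", 8);  // hypothetical array<double> field
    System.out.println(distribute(sizes, 100));
  }
}
```

   With element sizes of 4, 8 and 8 bytes and 100 extra bytes, this assigns 8, 
4 and 4 entries respectively (96 bytes used). A single complex field still 
receives all of the extra bytes, which matches the first case above.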
    
    



