clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537272950

   Following the suggestion from @prashantwason, I ran the following test:
   ```java
       WriteStatus status = new WriteStatus(true, 1.0);
       String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[0];
       dataGen = new HoodieTestDataGenerator(new String[] {partitionPath});
       String newCommitTime = "001";
       List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, 1000);
       Throwable t = new Exception("some error in writing");
       for (int i = 0; i < 1000 ; i++) {
         HoodieRecord data1 = records.get(i);
         status.markSuccess(data1, Option.empty());
         data1.deflate();
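         // Note: i++ is a post-increment, so data2 is the same record as data1,
         // and the loop then skips the next index.
         HoodieRecord data2 = records.get(i++);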
         status.markFailure(data2, t, Option.empty());
         data2.deflate();
       }
       System.out.println("status memory: " + ObjectSizeCalculator.getObjectSize(status));
   ```
   
   
   The memory footprint was essentially unchanged before the optimization (status memory: 113048) and after it (status memory: 117032). The main reasons are that `hoodie write handles deflate the HoodieRecord after writing` and `the mock class which may have bloating`. (Sorry, I did not take these two factors into account in my previous test.)
   @prashantwason @danny0405 @vinothchandar 
   
   I still wonder whether `writeStatus.markFailure` needs some optimization for the case where an exception occurs before `record.deflate()` is called:
   
   ```java
         writeStatus.markSuccess(hoodieRecord, recordMetadata);
         // deflate record payload after recording success. This will help users
         // access payload as a part of marking record successful.
         hoodieRecord.deflate();
         return finalRecordOpt;
       } catch (Exception e) {
         LOG.error("Error writing record  " + hoodieRecord, e);
         writeStatus.markFailure(hoodieRecord, e, recordMetadata);
       }
   ```
   Or, in some places, no `deflate` is performed at all when `writeStatus.markFailure` is called:
   ```java
       if (indexedRecord.isPresent()) {
         // Skip the ignored record.
         try {
           if (!indexedRecord.get().shouldIgnore(writeSchema, recordProperties)) {
             recordList.add(indexedRecord.get());
           }
         } catch (IOException e) {
           writeStatus.markFailure(record, e, record.getMetadata());
           LOG.error("Error writing record  " + indexedRecord.get(), e);
         }
       }
   ```
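
To make the question concrete, here is a minimal, self-contained sketch of one possible shape: release the payload in a `finally` block so that a record retained by `markFailure` no longer holds its payload. `SketchRecord`, `SketchStatus`, and `write` are hypothetical stand-ins invented for illustration; they are not Hudi classes, and whether deflating failed records is acceptable for error handling in Hudi is an open assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class DeflateOnFailureSketch {

  /** Hypothetical record: a key plus a payload that can be released. */
  public static class SketchRecord {
    public final String key;
    public byte[] payload;

    public SketchRecord(String key, byte[] payload) {
      this.key = key;
      this.payload = payload;
    }

    /** Drops the payload reference, keeping only the key. */
    public void deflate() {
      payload = null;
    }

    public boolean isDeflated() {
      return payload == null;
    }
  }

  /** Hypothetical status object that retains failed records. */
  public static class SketchStatus {
    public final List<SketchRecord> failedRecords = new ArrayList<>();

    public void markFailure(SketchRecord record, Throwable t) {
      failedRecords.add(record);
    }
  }

  /** Simulated write; 'fail' forces the exception path (assumption for the demo). */
  public static void write(SketchRecord record, SketchStatus status, boolean fail) {
    try {
      if (fail) {
        throw new IllegalStateException("some error in writing");
      }
      // A real success path would mark success here.
    } catch (Exception e) {
      status.markFailure(record, e);
    } finally {
      // Runs on both paths, after the status has already seen the full record.
      record.deflate();
    }
  }

  public static void main(String[] args) {
    SketchStatus status = new SketchStatus();
    write(new SketchRecord("key-1", new byte[1024]), status, true);
    System.out.println("failed records retained: " + status.failedRecords.size());
    System.out.println("payload released: " + status.failedRecords.get(0).isDeflated());
  }
}
```

Deflating in `finally` rather than before `markFailure` keeps the payload available while the failure is being recorded, mirroring the ordering used on the success path.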
   
   That said, the optimization may not yield a large benefit:
   ```java
     public void markFailure(HoodieRecord record, Throwable t, Option<Map<String, String>> optionalRecordMetadata) {
       if (failedRecords.isEmpty() || (random.nextDouble() <= failureFraction)) {
         // Guaranteed to have at-least one error
         failedRecords.add(record);
         errors.put(record.getKey(), t);
       }
       totalRecords++;
       totalErrorRecords++;
     }
   ```
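
The sampling condition above is why the gain is probably bounded: only the first failure plus roughly a `failureFraction` share of later failures is retained. Below is a rough sketch of that behavior using a hypothetical `FailureSamplingSketch` class that copies only the condition from `markFailure`. The class, the seeded `Random`, and the use of plain string keys are assumptions for illustration, not Hudi's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for WriteStatus; mirrors only the sampling condition. */
public class FailureSamplingSketch {
  private final List<String> failedRecords = new ArrayList<>();
  private final double failureFraction;
  private final Random random;
  private long totalErrorRecords = 0;

  public FailureSamplingSketch(double failureFraction, long seed) {
    this.failureFraction = failureFraction;
    this.random = new Random(seed); // seeded so that runs are repeatable
  }

  public void markFailure(String recordKey) {
    // The first failure is always kept; later ones are kept with probability ~failureFraction.
    if (failedRecords.isEmpty() || random.nextDouble() <= failureFraction) {
      failedRecords.add(recordKey);
    }
    totalErrorRecords++;
  }

  public int retained() {
    return failedRecords.size();
  }

  public long totalErrors() {
    return totalErrorRecords;
  }

  public static void main(String[] args) {
    FailureSamplingSketch status = new FailureSamplingSketch(0.1, 42L);
    for (int i = 0; i < 1000; i++) {
      status.markFailure("key-" + i);
    }
    System.out.println("retained " + status.retained() + " of " + status.totalErrors() + " failures");
  }
}
```

With a small `failureFraction`, the number of retained records stays far below the number of failures, so shrinking each retained record saves memory only in proportion to that sample.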
   
   I hope you can leave some comments when you have time. @prashantwason @danny0405 @vinothchandar

