garyli1019 edited a comment on issue #1890:
URL: https://github.com/apache/hudi/issues/1890#issuecomment-667465216


   This issue happened to me again, and I was able to narrow down the cause.
   When a log file grows larger than 
`HoodieStorageConfig.LOGFILE_SIZE_MAX_BYTES` (1GB by default), it gets split 
into two files, and the total size of the two log files can exceed 2GB. The 
issue happens when loading these two splits. 
   ~~My guess was that the serializer had been reset after loading the first file.~~ 
Created a ticket to track this: https://issues.apache.org/jira/browse/HUDI-1141
   EDIT: Looks like this could be an integer overflow issue. 
https://github.com/apache/hudi/blame/master/hudi-common/src/main/java/org/apache/hudi/common/util/collection/DiskBasedMap.java#L354
   `Integer.MAX_VALUE` is ~2GB, and the file size fields in the relevant 
classes are all `Integer`.
   In my test, some log groups were larger than 2GB: the smaller log file 
groups were fine, but the larger ones were failing. 
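   For reference, a minimal Java sketch of the suspected failure mode (the split sizes here are made up; only the arithmetic matters):

   ```java
   public class LogSizeOverflowDemo {
       public static void main(String[] args) {
           // Hypothetical sizes of two log-file splits, ~1.2 GB each, so the
           // log group totals ~2.4 GB, which is past Integer.MAX_VALUE.
           int splitA = 1_200_000_000;
           int splitB = 1_200_000_000;

           // Accumulating the sizes in an int wraps around and goes negative,
           // which is what a 2GB+ log group would produce in int-typed
           // size fields.
           int overflowed = splitA + splitB;
           System.out.println(overflowed); // -1894967296

           // Widening to long before the addition keeps the true total.
           long correct = (long) splitA + splitB;
           System.out.println(correct); // 2400000000
       }
   }
   ```

   This matches the symptom: groups under 2GB load fine, and only the ones crossing the `int` boundary fail.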
   @bvaradar What do you think? Should we fix this, or should we just avoid 
producing such large logs? 
   Log files this large are unusual; I only hit this because I was 
stress-testing the merging.

