vburenin opened a new pull request #2440:
URL: https://github.com/apache/hudi/pull/2440
## What is the purpose of the pull request
Fixes an inefficient implementation of the magic sequence search that could
take days on files of only a few megabytes.
Instead of a 6-byte buffer, the search now uses a much larger buffer, which
speeds up the process by roughly 170k times in some cases. The
inefficiency is most noticeable when GCS or S3 storage is being used.
## Brief change log
Rewrote the scanForNextAvailableBlockOffset function to use a much larger buffer.
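
For illustration only, the general idea of a large-buffer scan can be sketched as below. This is not the code from this pull request: the class `MagicScanSketch`, the methods `findMagic` and `findMagicIn`, and the buffer size parameter are hypothetical names chosen for the example. The sketch reads the stream in big chunks and carries the last `magic.length - 1` bytes over to the next chunk, so a sequence that straddles a chunk boundary is still found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the actual Hudi implementation.
class MagicScanSketch {

    /**
     * Returns the absolute offset of the first occurrence of magic in the
     * stream (counted from the stream's current position), or -1 if absent.
     */
    static long findMagic(InputStream in, byte[] magic, int bufferSize) throws IOException {
        if (magic.length == 0 || bufferSize < magic.length) {
            throw new IllegalArgumentException("bufferSize must be >= magic.length > 0");
        }
        byte[] buf = new byte[bufferSize];
        int carried = 0;    // bytes kept from the previous chunk
        long bufStart = 0;  // absolute stream offset of buf[0]
        while (true) {
            // One large read instead of many tiny ones.
            int read = in.read(buf, carried, buf.length - carried);
            if (read < 0) {
                return -1; // end of stream, nothing found
            }
            int filled = carried + read;
            // Check every position where a full magic sequence fits.
            for (int i = 0; i + magic.length <= filled; i++) {
                if (matchesAt(buf, i, magic)) {
                    return bufStart + i;
                }
            }
            // Keep the tail so a match split across two reads is not missed.
            int keep = Math.min(magic.length - 1, filled);
            System.arraycopy(buf, filled - keep, buf, 0, keep);
            bufStart += filled - keep;
            carried = keep;
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] magic) {
        for (int j = 0; j < magic.length; j++) {
            if (buf[pos + j] != magic[j]) {
                return false;
            }
        }
        return true;
    }

    /** Convenience wrapper for in-memory data, used for quick checks. */
    static long findMagicIn(byte[] data, byte[] magic, int bufferSize) {
        try {
            return findMagic(new ByteArrayInputStream(data), magic, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The benefit on object stores such as GCS or S3 comes from issuing far fewer read calls: each call fetches a large chunk, and the search over that chunk happens in memory.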
## Verify this pull request
This pull request is already covered by existing tests.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]