prashantwason opened a new pull request #2197:
URL: https://github.com/apache/hudi/pull/2197
## What is the purpose of the pull request
Please see HUDI-1351 for a description of the issues that are being fixed here.
## Brief change log
1. Added the --clean-input and --clean-output parameters to clean the input
and output directories before starting the job.
2. Added the --delete-old-input parameter to delete older batches of data
that have already been ingested. This helps keep the number of redundant
files low.
3. Added the --input-parallelism parameter to restrict the parallelism when
generating input data. This helps keep the number of generated input files
low.
4. Added a start_offset option to DAG nodes. Without the ability to specify
start offsets, data is generated into existing partitions. With a start
offset, the DAG can control which partition the data is written to.
5. Fixed record generation so that the correct number of partitions is
created.
- In the existing implementation, the partition is chosen as a random
long. This does not guarantee that exactly the requested number of
partitions is created.
6. Changed the variable blacklistedFields to be a Set, as that is faster
than a List for membership checks.
7. Fixed integer division inside Math.ceil. If two integers are divided, the
result is truncated to an integer before Math.ceil is applied, unless one of
the integers is cast to double.
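
For reviewers, items 5 to 7 can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the class name, method names, field names, and the round-robin assignment are all hypothetical, chosen only to show the kind of behavior described above. In particular, how start_offset is combined with the partition index is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Hudi code base.
final class ChangeLogSketch {

    // Item 7 (buggy form): int / int truncates first, so Math.ceil
    // receives an already-truncated whole number and has nothing to round up.
    static int batchesTruncated(int totalRecords, int recordsPerBatch) {
        return (int) Math.ceil(totalRecords / recordsPerBatch);
    }

    // Item 7 (fixed form): casting one operand to double makes the
    // division floating-point, so Math.ceil can round up.
    static int batchesCeil(int totalRecords, int recordsPerBatch) {
        return (int) Math.ceil((double) totalRecords / recordsPerBatch);
    }

    // Items 4 and 5 (assumed approach): assign partitions round-robin by
    // record index instead of drawing a random long, shifted by a start offset.
    static int partitionFor(long recordIndex, int numPartitions, int startOffset) {
        return startOffset + (int) (recordIndex % numPartitions);
    }

    // Collects the distinct partition indexes touched by numRecords records.
    static Set<Integer> partitionsUsed(int numRecords, int numPartitions, int startOffset) {
        Set<Integer> used = new TreeSet<>();
        for (long i = 0; i < numRecords; i++) {
            used.add(partitionFor(i, numPartitions, startOffset));
        }
        return used;
    }

    // Item 6: a HashSet gives expected constant-time contains(), while
    // List.contains() scans linearly. The field names are placeholders.
    static final Set<String> BLACKLISTED_FIELDS =
        new HashSet<>(Arrays.asList("field_a", "field_b"));

    static boolean isBlacklisted(String fieldName) {
        return BLACKLISTED_FIELDS.contains(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(batchesTruncated(10, 4)); // 2
        System.out.println(batchesCeil(10, 4));      // 3
        System.out.println(partitionsUsed(10, 3, 5));
        System.out.println(isBlacklisted("field_a"));
    }
}
```

With 10 records and 3 requested partitions, the round-robin form touches exactly 3 distinct partitions as long as there are at least as many records as partitions, which a random draw cannot guarantee.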
## Verify this pull request
This pull request is already covered by existing tests, such as *(please
describe tests)*.
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.