GitHub user jackylk opened a pull request:
https://github.com/apache/carbondata/pull/3066
[CARBONDATA-3244] Add benchmark for Change Data Capture scenario
CDC (change data capture) is a common scenario for analyzing slowly changing
tables in a data warehouse.
It is useful to add a benchmark test comparing two update methods:
1. hive_solution, which uses INSERT OVERWRITE. This is a popular method in
Hive warehouses.
2. carbon_solution, which uses CarbonData's update syntax to update the
history table directly.
This test simulates updates to a history table using a CDC table.
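The difference between the two methods can be illustrated with a minimal in-memory sketch in Python. This is not the code added by this pull request: modeling a table as a dict keyed by record id, the function names `overwrite_merge` and `in_place_merge`, and the sample rows are all hypothetical. In this toy model, the overwrite style rewrites every row of the history table, while the update style writes only the rows present in the CDC batch.

```python
# Hypothetical toy model: a table is a dict mapping record id -> value.

def overwrite_merge(history, cdc):
    """Mimic the INSERT OVERWRITE style: rebuild the whole table.

    Returns the rebuilt table and the number of rows written.
    """
    merged = {}
    for key, value in history.items():
        # Every history row is rewritten, whether it changed or not.
        merged[key] = cdc.get(key, value)
    for key, value in cdc.items():
        if key not in history:
            # Rows that exist only in the CDC batch are new inserts.
            merged[key] = value
    return merged, len(merged)


def in_place_merge(history, cdc):
    """Mimic the update style: write only the rows in the CDC batch.

    Mutates `history` and returns the number of rows written.
    """
    written = 0
    for key, value in cdc.items():
        history[key] = value
        written += 1
    return written


history = {i: f"v{i}" for i in range(1000)}
cdc = {i: f"new{i}" for i in range(990, 1010)}  # 10 updates, 10 inserts

rewritten, rows_overwrite = overwrite_merge(history, cdc)
rows_update = in_place_merge(history, cdc)

# Both strategies must end with the same table contents.
assert rewritten == history
print(rows_overwrite, rows_update)
```

Both strategies produce the same final table; only the number of rows written differs. The sketch says nothing about how either engine stores data internally.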
When run on an 8-core laptop, the benchmark shows:
1. test one
History table with 1M records; update 10K records and insert 10K records
every day, simulated for 3 days.
hive_solution: total processing time 13,516 ms
carbon_solution: total processing time 7,521 ms
2. test two
History table with 10M records; update 10K records and insert 10K records
every day, simulated for 3 days.
hive_solution: total processing time 104,250 ms
carbon_solution: total processing time 17,384 ms
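To make the comparison easier to read, the relative speedup can be derived from the timings reported above. The following short Python sketch does only that arithmetic; the dictionary layout and labels are illustrative, and the numbers are copied from the two test runs.

```python
# Timings in milliseconds, copied from the benchmark results above.
results = {
    "test one (1M records)": {"hive_solution": 13516, "carbon_solution": 7521},
    "test two (10M records)": {"hive_solution": 104250, "carbon_solution": 17384},
}

speedups = {}
for name, timing in results.items():
    # Ratio of total processing times: values above 1 favor carbon_solution.
    speedups[name] = timing["hive_solution"] / timing["carbon_solution"]
    print(f"{name}: carbon_solution is {speedups[name]:.2f}x faster")
```

This gives roughly 1.80x for the 1M-record table and roughly 6.00x for the 10M-record table, so in these two runs the gap widens as the history table grows while the daily change volume stays fixed.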
- [X] Any interfaces changed?
No
- [X] Any backward compatibility impacted?
No
- [X] Document update required?
No
- [X] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance
test report.
- Any additional information to help reviewers in testing this
change.
Only an example is added.
- [X] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata cdc
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/3066.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3066
----
commit ebb5ef79ac85a6c736496fe19f719bfed74902c1
Author: Jacky Li <jacky.likun@...>
Date: 2019-01-10T16:44:58Z
add benchmark for Change Data Capture scenario
----
---