GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/3066

    [CARBONDATA-3244] Add benchmark for Change Data Capture scenario

    CDC (change data capture) is a common scenario for analyzing slowly changing tables in a data warehouse.
    It would be useful to add a benchmark comparing two update methods:
    1. hive_solution, which uses INSERT OVERWRITE. This is a popular method in Hive warehouses.
    2. carbon_solution, which uses CarbonData's update syntax to update the history table directly.
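
The difference between the two methods can be illustrated with a toy model. This is only a sketch under stated assumptions: a table is modeled as a Python dict keyed by record id, and `hive_style_overwrite` and `carbon_style_update` are hypothetical helper names, not Hive or CarbonData APIs. It ignores how either engine stores data and only counts how many rows each approach writes.

```python
# Toy model: a "table" is a dict mapping record id -> value.
# Both functions are hypothetical illustrations, not Hive or CarbonData APIs.

def hive_style_overwrite(history, cdc):
    """Rebuild the whole table, in the spirit of INSERT OVERWRITE."""
    rebuilt = {}
    for key, value in history.items():
        # Every existing row is rewritten; changed rows take the CDC value.
        rebuilt[key] = cdc.get(key, value)
    for key, value in cdc.items():
        # Keys that were not in the history table are new inserts.
        rebuilt.setdefault(key, value)
    return rebuilt, len(rebuilt)  # rows written = full table size


def carbon_style_update(history, cdc):
    """Apply the change set directly to the history table."""
    for key, value in cdc.items():
        history[key] = value  # update if present, insert otherwise
    return history, len(cdc)  # rows written = change set size


history = {i: "v0" for i in range(1000)}
# One simulated day: update 10 existing ids (990-999) and insert 10 new ones.
cdc = {i: "v1" for i in range(990, 1010)}

overwritten, rows_rewritten = hive_style_overwrite(dict(history), cdc)
updated, rows_touched = carbon_style_update(dict(history), cdc)
```

Both calls end with the same table contents, but in this toy model the overwrite path writes every row while the direct update writes only the rows in the change set.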
    
    This test simulates updates to the history table using a CDC table.
    When run on an 8-core laptop, the benchmark shows:
    1. Test one
    History table with 1M records; 10K records updated and 10K records inserted each day; 3 days simulated.
    hive_solution: total processing time 13,516 ms
    carbon_solution: total processing time 7,521 ms
    
    2. Test two
    History table with 10M records; 10K records updated and 10K records inserted each day; 3 days simulated.
    hive_solution: total processing time 104,250 ms
    carbon_solution: total processing time 17,384 ms
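
For a quick comparison, the ratio between the two methods can be computed from the timings reported above. This is a minimal sketch; the dictionary layout and the `speedups` name are illustrative choices, and the only inputs are the four numbers from this description.

```python
# Timings in milliseconds, copied from the benchmark results above.
results_ms = {
    "test one (1M records)": {"hive_solution": 13_516, "carbon_solution": 7_521},
    "test two (10M records)": {"hive_solution": 104_250, "carbon_solution": 17_384},
}

# Ratio of hive_solution time to carbon_solution time for each test.
speedups = {
    name: timings["hive_solution"] / timings["carbon_solution"]
    for name, timings in results_ms.items()
}

for name, ratio in speedups.items():
    print(f"{name}: carbon_solution is about {ratio:.1f}x faster")
```

On these numbers the ratio is about 1.8x for the 1M-record table and about 6.0x for the 10M-record table.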
    
     - [X] Any interfaces changed?
     No
     - [X] Any backward compatibility impacted?
     No
     - [X] Document update required?
     No
     - [X] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
     Only an example is added
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata cdc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3066.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3066
    
----
commit ebb5ef79ac85a6c736496fe19f719bfed74902c1
Author: Jacky Li <jacky.likun@...>
Date:   2019-01-10T16:44:58Z

    add benchmark for Change Data Capture scenario

----

