GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2816
[CARBONDATA-300] Support read batch row in CSDK
1. support read batch row in SDK
2. support read batch row in CSDK
3. add SDKReaderBenchmark in SDK and testNextBatchRowPerformance in CSDK
4. improve CSDK read performance
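The difference between reading one row per call and reading a batch per call can be sketched as follows. This is only an illustration under stated assumptions: InMemoryReader, sampleRows, readOneByOne and readInBatches are hypothetical names, the reader is an in-memory stand-in rather than the CarbonData SDK reader, and the batch size of 4 is arbitrary. It only shows that a batch call returns an array whose elements are rows, so the caller makes fewer reader calls for the same number of rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BatchReadSketch {

    /** Hypothetical in-memory stand-in for a row reader; NOT the CarbonData SDK class. */
    static final class InMemoryReader {
        private final Iterator<Object[]> source;
        private final int batchSize;

        InMemoryReader(List<Object[]> rows, int batchSize) {
            this.source = rows.iterator();
            this.batchSize = batchSize;
        }

        boolean hasNext() {
            return source.hasNext();
        }

        // One call returns exactly one row.
        Object[] readNextRow() {
            return source.next();
        }

        // One call returns up to batchSize rows; each element is itself an Object[] row.
        Object[] readNextBatchRow() {
            List<Object[]> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            return batch.toArray();
        }
    }

    // Builds n small made-up rows for the illustration.
    static List<Object[]> sampleRows(int n) {
        List<Object[]> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(new Object[] {"from_" + i, "to_" + i, (long) i});
        }
        return rows;
    }

    // Returns {rowsRead, readerCalls} when reading one row per call.
    static long[] readOneByOne(InMemoryReader reader) {
        long rowsRead = 0;
        long calls = 0;
        while (reader.hasNext()) {
            reader.readNextRow();
            rowsRead++;
            calls++;
        }
        return new long[] {rowsRead, calls};
    }

    // Returns {rowsRead, readerCalls} when reading one batch per call.
    static long[] readInBatches(InMemoryReader reader) {
        long rowsRead = 0;
        long calls = 0;
        while (reader.hasNext()) {
            Object[] batch = reader.readNextBatchRow();
            rowsRead += batch.length;
            calls++;
        }
        return new long[] {rowsRead, calls};
    }

    public static void main(String[] args) {
        long[] single = readOneByOne(new InMemoryReader(sampleRows(10), 4));
        long[] batched = readInBatches(new InMemoryReader(sampleRows(10), 4));
        System.out.println("one by one: rows=" + single[0] + ", calls=" + single[1]);
        System.out.println("batched:    rows=" + batched[0] + ", calls=" + batched[1]);
    }
}
```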
This PR is based on https://github.com/apache/carbondata/pull/2792 and
cherry-picks its commits. After PR 2792 is merged, those commits will be
removed from this PR.
For SDK batch read:
readNextBatchRow:
total lines is 200100000, build time is 10.434133262 s, total read time
is 167.567157044 s, average speed is 1194148.0868321797 records/s.
readNextCarbonRow:
total lines is 200100000, build time is 15.775965656 s, total read time
is 183.312544655 s, average speed is 1091578.322567037 records/s.
By average speed, readNextBatchRow is about 9.4% faster than
readNextCarbonRow (reading rows one by one).
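The 9.4% figure can be rechecked from the totals quoted above. The following is a small sketch, not part of the PR: the class and method names are made up for illustration, and the only inputs are the line count and the two total read times reported in this description.

```java
final class ThroughputCheck {

    // Average speed in records per second.
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    // How much faster (in percent) the first speed is than the second.
    static double speedupPercent(double fasterSpeed, double slowerSpeed) {
        return (fasterSpeed / slowerSpeed - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long totalLines = 200_100_000L;        // "total lines" from the report
        double batchReadSeconds = 167.567157044;   // readNextBatchRow total read time
        double singleReadSeconds = 183.312544655;  // readNextCarbonRow total read time

        double batchSpeed = recordsPerSecond(totalLines, batchReadSeconds);
        double singleSpeed = recordsPerSecond(totalLines, singleReadSeconds);

        System.out.printf("readNextBatchRow:  %.1f records/s%n", batchSpeed);
        System.out.printf("readNextCarbonRow: %.1f records/s%n", singleSpeed);
        System.out.printf("speedup: %.1f%%%n", speedupPercent(batchSpeed, singleSpeed));
    }
}
```

Note that this compares total read time only; the build times (10.43 s and 15.78 s) are reported separately and are not included in the ratio.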
For CSDK:
Test next Row Performance:
build time is: 2.749129 s
100000: time is 0.147732 s, speed is 676901.416078 records/s
[email protected] [email protected] from_to <5164240.1075855667637.JavaMail.evans@thyme> 1538015558000000 971703720000000
200000: time is 0.320773 s, speed is 311746.936307 records/s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
300000: time is 0.138412 s, speed is 722480.709765 records/s
[email protected] [email protected] from_to <5977904.1075858636257.JavaMail.evans@thyme> 1538015558000000 1004057196000000
400000: time is 0.381501 s, speed is 262122.510819 records/s
[email protected] [email protected] from_to <23732985.1075855665438.JavaMail.evans@thyme> 1538015558000000 976725540000000
500000: time is 0.124684 s, speed is 802027.525585 records/s
[email protected] [email protected] from_to <31706076.1075858632278.JavaMail.evans@thyme> 1538015558000000 1003441879000000
600000: time is 1.260054 s, speed is 79361.678150 records/s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
700000: time is 0.120333 s, speed is 831027.232762 records/s
from_email11347ryan.o'[email protected] [email protected] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
800000: time is 0.424332 s, speed is 235664.526833 records/s
from_email11540ryan.o'[email protected] [email protected] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
900000: time is 0.127125 s, speed is 786627.335300 records/s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1000000: time is 0.135605 s, speed is 737435.935253 records/s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1100000: time is 0.653121 s, speed is 153110.985560 records/s
[email protected] [email protected] from_to <12338129.1075855667248.JavaMail.evans@thyme> 1538015558000000 972994320000000
readNextBatchRow
log4j:WARN No appenders could be found for logger (org.apache.carbondata.core.util.CarbonProperties).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
build time is 10.434133262
100000: time is 0.015234857 s, speed is 6563894.88920047 records/s, hasNext time is 1.872E-5s
[email protected] [email protected] from_to <5164240.1075855667637.JavaMail.evans@thyme> 1538015558000000 971703720000000
200000: time is 1.381838498 s, speed is 72367.35707156423 records/s, hasNext time is 1.373877655s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
300000: time is 0.071597049 s, speed is 1396705.6100314974 records/s, hasNext time is 0.068254875s
[email protected] [email protected] from_to <5977904.1075858636257.JavaMail.evans@thyme> 1538015558000000 1004057196000000
400000: time is 0.071777167 s, speed is 1393200.7096351406 records/s, hasNext time is 0.069637177s
[email protected] [email protected] from_to <23732985.1075855665438.JavaMail.evans@thyme> 1538015558000000 976725540000000
500000: time is 0.227270961 s, speed is 440003.4195305752 records/s, hasNext time is 0.225358746s
[email protected] [email protected] from_to <31706076.1075858632278.JavaMail.evans@thyme> 1538015558000000 1003441879000000
600000: time is 0.069326305 s, speed is 1442453.9141383634 records/s, hasNext time is 0.06744768s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
700000: time is 0.07079448 s, speed is 1412539.508730059 records/s, hasNext time is 0.068803357s
from_email11347ryan.o'[email protected] [email protected] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
800000: time is 0.147471892 s, speed is 678095.3213782597 records/s, hasNext time is 0.145297739s
from_email11540ryan.o'[email protected] [email protected] from_to <2047280.1075858635378.JavaMail.evans@thyme> 1538015558000000 1003974318000000
900000: time is 0.073139928 s, speed is 1367242.2537796318 records/s, hasNext time is 0.070579908s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1000000: time is 0.073197467 s, speed is 1366167.493200277 records/s, hasNext time is 0.071379687s
[email protected] [email protected] from_to <14154714.1075858633174.JavaMail.evans@thyme> 1538015558000000 1003768608000000
1100000: time is 0.141830179 s, speed is 705068.5594918412 records/s, hasNext time is 0.140102684s
[email protected] [email protected] from_to <12338129.1075855667248.JavaMail.evans@thyme> 1538015558000000 972994320000000
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance
test report.
- Any additional information to help reviewers in testing this
change.
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xubo245/carbondata CARBONDATA-3003_supportBatchRow
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2816.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2816
----
commit 425e76333ddb0991799f2fc7c0c028a18aca58b5
Author: xubo245 <xubo29@...>
Date: 2018-10-09T09:58:48Z
[CARBONDATA-2981] Support read primitive data type in CSDK
1. support readNextCarbonRow
2. support read different primitive data type in c code from
java side: int double short long string
3. support some data type and convert: date timestamp varchar
decimal array<T>
4. remove readNextStringRow
remove the file after finished run
change the file
commit 4fc5ce599ada4875337c88f5eb8d217a8ae73ddd
Author: xubo245 <xubo29@...>
Date: 2018-10-11T02:12:20Z
remove timestamp check
commit d74ce01c499b5b031d8123ea4dfc0cd90e56a2e8
Author: xubo245 <xubo29@...>
Date: 2018-10-16T03:02:07Z
[CARBONDATA-300] Support read batch row in CSDK
1. support read batch row in SDK
2. support read batch row in CSDK
3. add SDKReaderBenchmark in SDK and testNextBatchRowPerformance in CSDK
4. improve CSDK read performance
----