GitHub user xubo245 opened a pull request:

    https://github.com/apache/carbondata/pull/2816

     [CARBONDATA-300] Support read batch row in CSDK

     [CARBONDATA-300] Support read batch row in CSDK
        1. Support reading rows in batches in SDK
        2. Support reading rows in batches in CSDK
        3. Add SDKReaderBenchmark in SDK and testNextBatchRowPerformance in CSDK
        4. Improve CSDK read performance
    
    This PR is based on https://github.com/apache/carbondata/pull/2792 and
    cherry-picks its commits. After PR 2792 is merged, those commits will be
    removed from this PR.
    
    For SDK batch read:

    readNextBatchRow:
    total lines is 200100000, build time is 10.434133262 s,
    total read time is 167.567157044 s,
    average speed is 1194148.0868321797 records/s.

    readNextCarbonRow:
    total lines is 200100000, build time is 15.775965656 s,
    total read time is 183.312544655 s,
    average speed is 1091578.322567037 records/s.

    Reading rows in batches (readNextBatchRow) is about 9.4% faster than
    reading them one by one (readNextCarbonRow).
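
The 9.4% figure can be checked from the totals reported above. The following is a minimal sketch, not part of the PR: the helper names `throughput` and `speedup_percent` are made up for illustration, and only the row count and read times are taken from the benchmark summary.

```python
# Figures copied from the SDK benchmark summary in this message.
TOTAL_ROWS = 200_100_000
BATCH_READ_SECONDS = 167.567157044  # readNextBatchRow total read time
ROW_READ_SECONDS = 183.312544655    # readNextCarbonRow total read time


def throughput(rows: int, seconds: float) -> float:
    """Average records per second over the whole read (hypothetical helper)."""
    return rows / seconds


def speedup_percent(fast: float, slow: float) -> float:
    """How much faster `fast` is than `slow`, in percent (hypothetical helper)."""
    return (fast / slow - 1.0) * 100.0


batch_speed = throughput(TOTAL_ROWS, BATCH_READ_SECONDS)
row_speed = throughput(TOTAL_ROWS, ROW_READ_SECONDS)
gain = speedup_percent(batch_speed, row_speed)

print(f"readNextBatchRow:  {batch_speed:.1f} records/s")
print(f"readNextCarbonRow: {row_speed:.1f} records/s")
print(f"batch read is {gain:.1f}% faster")
```

Because both runs read the same number of rows, the same percentage also follows directly from the ratio of the two read times.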
    
    For CSDK:
    
    Test next Row Performance:
    
    
    build time is: 2.749129 s
    
    100000: time is 0.147732 s, speed is 676901.416078 records/s
        [email protected]  [email protected]  from_to  <5164240.1075855667637.JavaMail.evans@thyme>  1538015558000000  971703720000000
    200000: time is 0.320773 s, speed is 311746.936307 records/s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    300000: time is 0.138412 s, speed is 722480.709765 records/s
        [email protected]  [email protected]  from_to  <5977904.1075858636257.JavaMail.evans@thyme>  1538015558000000  1004057196000000
    400000: time is 0.381501 s, speed is 262122.510819 records/s
        [email protected]  [email protected]  from_to  <23732985.1075855665438.JavaMail.evans@thyme>  1538015558000000  976725540000000
    500000: time is 0.124684 s, speed is 802027.525585 records/s
        [email protected]  [email protected]  from_to  <31706076.1075858632278.JavaMail.evans@thyme>  1538015558000000  1003441879000000
    600000: time is 1.260054 s, speed is 79361.678150 records/s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    700000: time is 0.120333 s, speed is 831027.232762 records/s
        from_email11347ryan.o'[email protected]  [email protected]  from_to  <2047280.1075858635378.JavaMail.evans@thyme>  1538015558000000  1003974318000000
    800000: time is 0.424332 s, speed is 235664.526833 records/s
        from_email11540ryan.o'[email protected]  [email protected]  from_to  <2047280.1075858635378.JavaMail.evans@thyme>  1538015558000000  1003974318000000
    900000: time is 0.127125 s, speed is 786627.335300 records/s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    1000000: time is 0.135605 s, speed is 737435.935253 records/s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    1100000: time is 0.653121 s, speed is 153110.985560 records/s
        [email protected]  [email protected]  from_to  <12338129.1075855667248.JavaMail.evans@thyme>  1538015558000000  972994320000000
    
    readNextBatchRow
    log4j:WARN No appenders could be found for logger (org.apache.carbondata.core.util.CarbonProperties).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    build time is 10.434133262
    100000: time is 0.015234857 s, speed is 6563894.88920047 records/s, hasNext time is 1.872E-5 s
        [email protected]  [email protected]  from_to  <5164240.1075855667637.JavaMail.evans@thyme>  1538015558000000  971703720000000
    200000: time is 1.381838498 s, speed is 72367.35707156423 records/s, hasNext time is 1.373877655 s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    300000: time is 0.071597049 s, speed is 1396705.6100314974 records/s, hasNext time is 0.068254875 s
        [email protected]  [email protected]  from_to  <5977904.1075858636257.JavaMail.evans@thyme>  1538015558000000  1004057196000000
    400000: time is 0.071777167 s, speed is 1393200.7096351406 records/s, hasNext time is 0.069637177 s
        [email protected]  [email protected]  from_to  <23732985.1075855665438.JavaMail.evans@thyme>  1538015558000000  976725540000000
    500000: time is 0.227270961 s, speed is 440003.4195305752 records/s, hasNext time is 0.225358746 s
        [email protected]  [email protected]  from_to  <31706076.1075858632278.JavaMail.evans@thyme>  1538015558000000  1003441879000000
    600000: time is 0.069326305 s, speed is 1442453.9141383634 records/s, hasNext time is 0.06744768 s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    700000: time is 0.07079448 s, speed is 1412539.508730059 records/s, hasNext time is 0.068803357 s
        from_email11347ryan.o'[email protected]  [email protected]  from_to  <2047280.1075858635378.JavaMail.evans@thyme>  1538015558000000  1003974318000000
    800000: time is 0.147471892 s, speed is 678095.3213782597 records/s, hasNext time is 0.145297739 s
        from_email11540ryan.o'[email protected]  [email protected]  from_to  <2047280.1075858635378.JavaMail.evans@thyme>  1538015558000000  1003974318000000
    900000: time is 0.073139928 s, speed is 1367242.2537796318 records/s, hasNext time is 0.070579908 s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    1000000: time is 0.073197467 s, speed is 1366167.493200277 records/s, hasNext time is 0.071379687 s
        [email protected]  [email protected]  from_to  <14154714.1075858633174.JavaMail.evans@thyme>  1538015558000000  1003768608000000
    1100000: time is 0.141830179 s, speed is 705068.5594918412 records/s, hasNext time is 0.140102684 s
        [email protected]  [email protected]  from_to  <12338129.1075855667248.JavaMail.evans@thyme>  1538015558000000  972994320000000
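
In the readNextBatchRow log, each checkpoint reports the time for the last 100000 records and, separately, the time spent in hasNext, which accounts for most of the window in nearly every entry. The following is a rough sketch of how such a per-window measurement could be taken. It is an assumption for illustration only, not the SDKReaderBenchmark code: `FakeBatchReader` is a hypothetical in-memory stand-in, and its `has_next` and `read_next_batch_row` methods merely mirror the method names mentioned in this PR.

```python
import time


class FakeBatchReader:
    """Hypothetical in-memory reader; NOT the CarbonData API."""

    def __init__(self, rows, batch_size):
        self._rows = rows
        self._batch_size = batch_size
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._rows)

    def read_next_batch_row(self):
        batch = self._rows[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch


def benchmark(reader, report_every):
    """Read every batch and record (rows_so_far, window_seconds,
    has_next_seconds) each time another `report_every` rows are read."""
    total = 0
    next_report = report_every
    checkpoints = []
    window_start = time.perf_counter()
    has_next_time = 0.0
    while True:
        # Time has_next() on its own so its share of the window is visible.
        t0 = time.perf_counter()
        more = reader.has_next()
        has_next_time += time.perf_counter() - t0
        if not more:
            break
        total += len(reader.read_next_batch_row())
        if total >= next_report:
            now = time.perf_counter()
            checkpoints.append((total, now - window_start, has_next_time))
            window_start = now
            has_next_time = 0.0
            next_report += report_every
    return total, checkpoints


REPORT_EVERY = 500
rows = [("row", i) for i in range(1000)]
total, checkpoints = benchmark(FakeBatchReader(rows, batch_size=100), REPORT_EVERY)
for count, elapsed, has_next_seconds in checkpoints:
    speed = REPORT_EVERY / elapsed if elapsed > 0 else float("inf")
    print(f"{count}: time is {elapsed} s, speed is {speed} records/s, "
          f"hasNext time is {has_next_seconds} s")
```

Timing has_next separately is what makes it possible to tell whether a slow window comes from loading the next block of data or from handing rows to the caller.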
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.

     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata CARBONDATA-3003_supportBatchRow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2816
    
----
commit 425e76333ddb0991799f2fc7c0c028a18aca58b5
Author: xubo245 <xubo29@...>
Date:   2018-10-09T09:58:48Z

     [CARBONDATA-2981] Support read primitive data type in CSDK
    
                1. support readNextCarbonRow
                2. support read different primitive data type in c code from java side: int double short long string
                3. support some data type and convert: date timestamp varchar decimal array<T>
                4. remove readNextStringRow
    
    remove the file after finished run
    
    change the file

commit 4fc5ce599ada4875337c88f5eb8d217a8ae73ddd
Author: xubo245 <xubo29@...>
Date:   2018-10-11T02:12:20Z

    remove timestamp check

commit d74ce01c499b5b031d8123ea4dfc0cd90e56a2e8
Author: xubo245 <xubo29@...>
Date:   2018-10-16T03:02:07Z

    [CARBONDATA-300] Support read batch row in CSDK
    1. support read batch row in SDK
    2. support read batch row in CSDK
    3. add SDKReaderBenchmark in SDK and testNextBatchRowPerformance in CSDK
    4. improve CSDK read performance

----

