GitHub user NamanRastogi opened a pull request:

    https://github.com/apache/carbondata/pull/2850

    Added concurrent reading through SDK

    Added a new API, _CarbonReader.split_, to enable concurrent reading of carbondata files through the SDK.
    ```java
    List<CarbonReader> multipleReaders = CarbonReader.split(maxSplits);
    ```
    
    For detailed information on how to use this API for concurrent reading, please refer to **ConcurrentSdkReaderTest.java**.
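
A minimal sketch of the concurrent-read pattern this API is meant to enable follows. It is not the contents of ConcurrentSdkReaderTest.java. To stay runnable without the SDK or any carbondata files, it uses a hypothetical `SplitReader` interface whose `hasNext`, `readNextRow` and `close` methods are only modeled on the reader; with the real SDK, each element of the list returned by `split` would presumably take its place. The class and helper names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentReadSketch {

    /** Hypothetical stand-in for one split reader; NOT the real CarbonReader class. */
    public interface SplitReader extends AutoCloseable {
        boolean hasNext();
        Object readNextRow();
        @Override
        void close();
    }

    /** In-memory SplitReader so the sketch runs without any carbondata files. */
    public static SplitReader inMemory(List<?> rows) {
        final Iterator<?> it = rows.iterator();
        return new SplitReader() {
            public boolean hasNext() { return it.hasNext(); }
            public Object readNextRow() { return it.next(); }
            public void close() { /* nothing to release for in-memory rows */ }
        };
    }

    /** Drains each reader on its own pool thread and returns the total row count. */
    public static long countRows(List<SplitReader> readers) throws Exception {
        if (readers.isEmpty()) {
            return 0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(readers.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (SplitReader reader : readers) {
                results.add(pool.submit(() -> {
                    long count = 0;
                    try {
                        while (reader.hasNext()) {
                            reader.readNextRow();
                            count++;
                        }
                    } finally {
                        reader.close(); // each thread closes the reader it owns
                    }
                    return count;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that reader's thread to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<SplitReader> readers = Arrays.asList(
                inMemory(Arrays.asList("a", "b", "c")),
                inMemory(Arrays.asList("d", "e")));
        System.out.println(countRows(readers)); // prints 5
    }
}
```

Each reader is consumed by exactly one thread and closed by that same thread, so no reader state is shared between threads.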
    
    ## Performance Metrics:
    
    | | Configured table block size: 1 MB | Configured table block size: 10 MB | Configured table block size: 100 MB |
    | --- | --- | --- | --- |
    | **# rows: 1e6**<br>**Store: 7.6 MB** | # files generated: 11<br><br>Sequential Read: 274 ms<br>Parallel Read: 123 ms | # files generated: 1<br><br>Sequential Read: 247 ms<br>Parallel Read: 248 ms | # files generated: 1<br><br>Sequential Read: 252 ms<br>Parallel Read: 254 ms |
    | **# rows: 1e7**<br>**Store: 78 MB** | # files generated: 104<br><br>Sequential Read: 2685 ms<br>Parallel Read: 1230 ms | # files generated: 9<br><br>Sequential Read: 2499 ms<br>Parallel Read: 1357 ms | # files generated: 1<br><br>Sequential Read: 2527 ms<br>Parallel Read: 2597 ms |
    | **# rows: 1e8**<br>**Store: 865 MB** | | # files generated: 95<br><br>Sequential Read: 27069 ms<br>Parallel Read: 16082 ms | # files generated: 15<br><br>Sequential Read: 25841 ms<br>Parallel Read: 13256 ms |
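
One way to read the table above is as a speedup ratio, sequential time divided by parallel time. The sketch below computes that ratio from the reported figures; the class and method names are assumptions made for illustration. The numbers suggest the parallel read helps only when more than one file was generated, since every single-file run shows essentially no difference.

```java
public class ReadSpeedup {

    /** Speedup of the parallel read over the sequential read (> 1 means parallel was faster). */
    public static double speedup(long sequentialMs, long parallelMs) {
        return (double) sequentialMs / parallelMs;
    }

    public static void main(String[] args) {
        // {files generated, sequential ms, parallel ms}, copied from the table above
        long[][] runs = {
            {11, 274, 123}, {1, 247, 248}, {1, 252, 254},       // 1e6 rows
            {104, 2685, 1230}, {9, 2499, 1357}, {1, 2527, 2597}, // 1e7 rows
            {95, 27069, 16082}, {15, 25841, 13256}               // 1e8 rows
        };
        for (long[] run : runs) {
            System.out.printf("files=%d speedup=%.2f%n", run[0], speedup(run[1], run[2]));
        }
    }
}
```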
    
    
    
     - [ ] Any interfaces changed?
     - [x] Any backward compatibility impacted: No
     - [ ] Document update required?
     - [x] Testing done
            - New unit test cases have been added.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NamanRastogi/carbondata sdk_reader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2850
    
----
commit 55383136232203ca9de97a9304033c20cf7085f8
Author: Naman Rastogi <naman.rastogi.52@...>
Date:   2018-10-18T12:54:23Z

    Added split for CarbonReader
    
    to enable multithreaded reading of carbondata files

commit 79871f291262a05a1970b765232bf2f43f75e5d5
Author: Naman Rastogi <naman.rastogi.52@...>
Date:   2018-10-22T14:07:06Z

    Added reader.close in CarbonSdkReaderTest

commit cd44ee7efbe09c46cd4f6b84c431261b18a13d3d
Author: Naman Rastogi <naman.rastogi.52@...>
Date:   2018-10-22T14:07:06Z

    Added reader.close in CarbonSdkReaderTest

commit 201d98ea157590c1d5f4decba89fcabae684c755
Author: Naman Rastogi <naman.rastogi.52@...>
Date:   2018-10-24T06:53:44Z

    Merge branch 'sdk_reader' of https://github.com/NamanRastogi/carbondata into sdk_reader

----

