GitHub user NamanRastogi opened a pull request:
https://github.com/apache/carbondata/pull/2850
Added concurrent reading through SDK
Added a new API, _CarbonReader.split_, to enable concurrent reading of
carbondata files through the SDK.
```java
List<CarbonReader> multipleReaders = CarbonReader.split(maxSplits);
```
For detailed information on how to use this API for concurrent reading,
please refer to **ConcurrentSdkReaderTest.java**.
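As a rough illustration of the intended usage pattern, the sketch below drains each split reader on its own thread and sums the row counts. `RowReader`, `inMemoryReader`, and `countRowsConcurrently` are hypothetical stand-ins introduced here so the example runs without any carbondata files; they are not SDK classes, and the test file named above remains the authoritative reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentReadSketch {

    /** Hypothetical stand-in for one reader returned by split(); NOT the SDK class. */
    public interface RowReader extends AutoCloseable {
        boolean hasNext() throws Exception;
        Object readNextRow() throws Exception;
        @Override
        void close() throws Exception;
    }

    /** In-memory reader (assumption for illustration) so the sketch needs no files. */
    public static RowReader inMemoryReader(List<Object> rows) {
        final Iterator<Object> it = rows.iterator();
        return new RowReader() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public Object readNextRow() { return it.next(); }
            @Override public void close() { /* nothing to release here */ }
        };
    }

    /** Reads every reader on its own thread and returns the total number of rows. */
    public static long countRowsConcurrently(List<RowReader> readers)
            throws InterruptedException, ExecutionException {
        if (readers.isEmpty()) {
            return 0L;
        }
        ExecutorService pool = Executors.newFixedThreadPool(readers.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (RowReader reader : readers) {
                results.add(pool.submit(() -> {
                    long count = 0;
                    // try-with-resources closes the reader even if a read fails
                    try (RowReader r = reader) {
                        while (r.hasNext()) {
                            r.readNextRow();
                            count++;
                        }
                    }
                    return count;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that reader's task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<RowReader> readers = Arrays.asList(
                inMemoryReader(Arrays.<Object>asList("r1", "r2", "r3")),
                inMemoryReader(Arrays.<Object>asList("r4", "r5")));
        System.out.println("rows read: " + countRowsConcurrently(readers));
    }
}
```

With the real SDK, the list passed to `countRowsConcurrently` would presumably be the readers returned by the split call shown above; one thread per reader is a simplification, and a bounded pool may be preferable when `maxSplits` is large.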
## Performance Metrics:
| | Configured table block size: 1 MB | Configured table block size: 10 MB | Configured table block size: 100 MB |
| --- | --- | --- | --- |
| **# rows: 1e6**<br>**Store: 7.6 MB** | # files generated: 11<br><br>Sequential Read: 274 ms<br>Parallel Read: 123 ms | # files generated: 1<br><br>Sequential Read: 247 ms<br>Parallel Read: 248 ms | # files generated: 1<br><br>Sequential Read: 252 ms<br>Parallel Read: 254 ms |
| **# rows: 1e7**<br>**Store: 78 MB** | # files generated: 104<br><br>Sequential Read: 2685 ms<br>Parallel Read: 1230 ms | # files generated: 9<br><br>Sequential Read: 2499 ms<br>Parallel Read: 1357 ms | # files generated: 1<br><br>Sequential Read: 2527 ms<br>Parallel Read: 2597 ms |
| **# rows: 1e8**<br>**Store: 865 MB** | | # files generated: 95<br><br>Sequential Read: 27069 ms<br>Parallel Read: 16082 ms | # files generated: 15<br><br>Sequential Read: 25841 ms<br>Parallel Read: 13256 ms |
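
The speedup implied by each cell of the table is the sequential time divided by the parallel time. The small sketch below computes that ratio for the measurements listed above; the class and method names are made up for this illustration, and the numbers are copied from the table as-is.

```java
final class SpeedupSketch {

    /** Speedup = sequential time / parallel time; values above 1 mean parallel was faster. */
    static double speedup(long sequentialMs, long parallelMs) {
        if (parallelMs <= 0) {
            throw new IllegalArgumentException("parallelMs must be positive");
        }
        return (double) sequentialMs / parallelMs;
    }

    public static void main(String[] args) {
        // Each entry: {files generated, sequential read ms, parallel read ms}, from the table.
        long[][] runs = {
            {11, 274, 123}, {1, 247, 248}, {1, 252, 254},
            {104, 2685, 1230}, {9, 2499, 1357}, {1, 2527, 2597},
            {95, 27069, 16082}, {15, 25841, 13256}
        };
        for (long[] run : runs) {
            System.out.printf("files=%d speedup=%.2fx%n", run[0], speedup(run[1], run[2]));
        }
    }
}
```

In these measurements the ratio stays close to 1 whenever only one file was generated, which suggests that the benefit depends on there being several files to distribute among the readers.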
- [ ] Any interfaces changed?
- [x] Any backward compatibility impacted: No
- [ ] Document update required?
- [x] Testing done
- New unit test cases have been added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NamanRastogi/carbondata sdk_reader
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2850.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2850
----
commit 55383136232203ca9de97a9304033c20cf7085f8
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-18T12:54:23Z
Added split for CarbonReader
to enable multithreaded reading of carbondata files
commit 79871f291262a05a1970b765232bf2f43f75e5d5
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-22T14:07:06Z
Added reader.close in CarbonSdkReaderTest
commit cd44ee7efbe09c46cd4f6b84c431261b18a13d3d
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-22T14:07:06Z
Added reader.close in CarbonSdkReaderTest
commit 201d98ea157590c1d5f4decba89fcabae684c755
Author: Naman Rastogi <naman.rastogi.52@...>
Date: 2018-10-24T06:53:44Z
Merge branch 'sdk_reader' of https://github.com/NamanRastogi/carbondata
into sdk_reader
----