[ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054539#comment-16054539 ]
ASF GitHub Bot commented on BAHIR-110:
--------------------------------------

GitHub user emlaver opened a pull request:

    https://github.com/apache/bahir/pull/45

    [BAHIR-110] Implement _changes API for non-streaming receiver

    See [JIRA-110](https://issues.apache.org/jira/browse/BAHIR-110)

    _What_
    Add support for _changes API for non-streaming (data frames and SQL temp. views) receiver.

    _How_
    - New CloudantConfig option `apiReceiver` for selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and SQL temp tables
    - Default is `_all_docs` endpoint for non-streaming receiver
    - Base abstract config class that's extended by an all_docs class and _changes class
    - JsonStoreConfigManager includes new 'cloudant.apiReceiver' config option for selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and SQL temp tables
    - Updated README with details for 'cloudant.apiReceiver' option

    _Testing_
    - Added base class ClientSparkFunSuite for setting up, creating, and loading sample data from flat files to test databases.
    - CloudantAllDocsDFSuite to test Spark data frames using the _all_docs endpoint.
    - CloudantChangesDFSuite to test Spark data frames using the _changes endpoint.
    - CloudantOptionSuite to verify Cloudant config options.
    - CloudantSparkSQLSuite to test Spark SQL temp views.

    Note: 27,378 lines added for the JSON files used in the testing suite.
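The endpoint selection described under _How_ could be sketched roughly as follows. This is only an illustration in Python, not the Scala code from the pull request: the function name `resolve_api_receiver`, the constants, and the dictionary-based config are hypothetical, and the `CloudantException` class here is a local stand-in. It also assumes that an empty or unrecognised `cloudant.apiReceiver` value is rejected, while a missing key falls back to `_all_docs`.

```python
# Hypothetical sketch: names are illustrative and not taken from the Bahir source.

ALL_DOCS = "_all_docs"
CHANGES = "_changes"
VALID_RECEIVERS = (ALL_DOCS, CHANGES)


class CloudantException(Exception):
    """Local stand-in for the CloudantException mentioned in the pull request."""


def resolve_api_receiver(conf):
    """Return the endpoint selected by the 'cloudant.apiReceiver' option.

    Assumption: a missing key means the default (_all_docs); an empty or
    unknown value raises CloudantException.
    """
    if "cloudant.apiReceiver" not in conf:
        return ALL_DOCS
    value = conf["cloudant.apiReceiver"].strip()
    if value not in VALID_RECEIVERS:
        raise CloudantException(
            "cloudant.apiReceiver must be one of %s, got %r" % (VALID_RECEIVERS, value)
        )
    return value
```

For example, `resolve_api_receiver({"cloudant.apiReceiver": "_changes"})` selects the `_changes` endpoint, and an empty dictionary selects `_all_docs`.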
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/emlaver/bahir 110-implement-changes-api-in-receiver

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/bahir/pull/45.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #45

----
commit 065752978e7d826cd311df4c517044183db0c372
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-16T18:17:30Z

    Excluded scala flat files for testing from build

commit 751a5c7b876eca822e3047a609165505b5eadc4c
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-16T18:23:39Z

    Added MapReduce example, removed unused imports, and replaced SQL TEMP TABLE with TEMP VIEW

commit 39be19029a5b02820749ea7a5e19f345a4d74de0
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:22:41Z

    New CloudantConfig option `apiReceiver` for selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and SQL temp tables
    - Default is `_all_docs` endpoint for non-streaming receiver
    - Base abstract config class that's extended by an all_docs class and _changes class
    - CloudantException thrown when required Spark Cloudant config option is empty or invalid
    - Updated scala style

commit b662611722d118800b1135ab69f02a979ebedb3c
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:23:28Z

    JsonStoreConfigManager: new 'cloudant.apiReceiver' config option for selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and SQL temp tables
    - Throw CloudantException when spark config value is invalid or empty
    JsonStoreDataAccess: Added selector for use with _changes API and to filter out design docs
    JsonStoreRDD: Partition set to 1 for _changes API
    Updated Scala style in common classes:
    - Fixed ordering of imports
    - Added type notation
    - Removed redundant parenthesis

commit 4c8fc6bff81df034e789d2e782db11fca6e7cd84
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:25:28Z

    JSON files and logging properties for testing suite

commit a798f4cd1ef63f10a2f261f3c4460ef018d8d95d
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:28:02Z

    Testing suite:
    ClientSparkFunSuite for setting up, creating, and loading sample data from flat files to test databases.
    CloudantAllDocsDFSuite to test Spark data frames using the _all_docs endpoint.
    CloudantChangesDFSuite to test Spark data frames using the _changes endpoint.
    CloudantOptionSuite to verify Cloudant config options.
    CloudantSparkSQLSuite to test Spark SQL temp views.
    - Version 2.6.7 for jackson dependencies resolves "Incompatible Jackson version" during build
    - Cloudant set-up and database creation using cloudant-client library

commit c6ecb836ef0eb714398360b21ab95f1c18b762e1
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:28:23Z

    Updated README
    - New option 'cloudant.apiReceiver' for selecting _all_docs or _changes endpoint
    - Fixed links to source code files

----

> Replace use of _all_docs API with _changes API in all receivers
> ---------------------------------------------------------------
>
>                 Key: BAHIR-110
>                 URL: https://issues.apache.org/jira/browse/BAHIR-110
>             Project: Bahir
>          Issue Type: Improvement
>            Reporter: Esteban Laver
>   Original Estimate: 216h
>  Remaining Estimate: 216h
>
> Today we use the _changes API for Spark streaming receiver and _all_docs API
> for non-streaming receiver. _all_docs API supports parallel reads (using
> offset and range) but performance of _changes API is still better in most
> cases (even with single threaded support).
> With this ticket we want to:
> a) re-implement all receivers using _changes API
> b) compare performance between the two implementations based on _changes and
> _all_docs
> Based on the results in b) we could decide to either
> - replace _all_docs implementation with _changes based implementation OR
> - allow customers to pick one (with a solid documentation about pros and
> cons)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)