[ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054539#comment-16054539 ]

ASF GitHub Bot commented on BAHIR-110:
--------------------------------------

GitHub user emlaver opened a pull request:

    https://github.com/apache/bahir/pull/45

    [BAHIR-110] Implement _changes API for non-streaming receiver

    See [BAHIR-110](https://issues.apache.org/jira/browse/BAHIR-110)
    
    _What_
    
    Add support for the _changes API in the non-streaming receiver (data frames and SQL temporary views).
    
    _How_
    - New CloudantConfig option `apiReceiver` for selecting the _all_docs or _changes endpoint when loading Cloudant data into Spark data frames and SQL temp tables
    - Default is the `_all_docs` endpoint for the non-streaming receiver
    - Base abstract config class that is extended by an _all_docs class and a _changes class
    - JsonStoreConfigManager includes the new 'cloudant.apiReceiver' config option for selecting the _all_docs or _changes endpoint when loading Cloudant data into Spark data frames and SQL temp tables
    - Updated README with details for the 'cloudant.apiReceiver' option
    
    _Testing_
    
    - Added base class ClientSparkFunSuite for setting up and creating test databases and loading sample data into them from flat files.
    - CloudantAllDocsDFSuite to test Spark data frames using the _all_docs endpoint.
    - CloudantChangesDFSuite to test Spark data frames using the _changes endpoint.
    - CloudantOptionSuite to verify Cloudant config options.
    - CloudantSparkSQLSuite to test Spark SQL temp views.
    
    Note: 27,378 of the added lines are JSON files used by the test suite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/emlaver/bahir 110-implement-changes-api-in-receiver

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/bahir/pull/45.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #45
    
----
commit 065752978e7d826cd311df4c517044183db0c372
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-16T18:17:30Z

    Excluded scala flat files for testing from build

commit 751a5c7b876eca822e3047a609165505b5eadc4c
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-16T18:23:39Z

    Added MapReduce example, removed unused imports, and replaced SQL TEMP TABLE with TEMP VIEW

commit 39be19029a5b02820749ea7a5e19f345a4d74de0
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:22:41Z

    New CloudantConfig option `apiReceiver` for selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and SQL temp tables
    - Default is `_all_docs` endpoint for non-streaming receiver
    - Base abstract config class that's extended by an all_docs class and _changes class
    - CloudantException thrown when required Spark Cloudant config option is empty or invalid
    - Updated scala style

commit b662611722d118800b1135ab69f02a979ebedb3c
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:23:28Z

    JsonStoreConfigManager: new 'cloudant.apiReceiver' config option for selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and SQL temp tables
    - Throw CloudantException when spark config value is invalid or empty
    JsonStoreDataAccess: Added selector for use with _changes API and to filter out design docs
    JsonStoreRDD: Partition set to 1 for _changes API
    Updated Scala style in common classes:
    - Fixed ordering of imports
    - Added type notation
    - Removed redundant parenthesis

commit 4c8fc6bff81df034e789d2e782db11fca6e7cd84
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:25:28Z

    JSON files and logging properties for testing suite

commit a798f4cd1ef63f10a2f261f3c4460ef018d8d95d
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:28:02Z

    Testing suite:
    ClientSparkFunSuite for setting up, creating, and loading sample data from flat files to test databases.
    CloudantAllDocsDFSuite to test Spark data frames using the _all_docs endpoint.
    CloudantChangesDFSuite to test Spark data frames using the _changes endpoint.
    CloudantOptionSuite to verify Cloudant config options.
    CloudantSparkSQLSuite to test Spark SQL temp views.
    
    - Version 2.6.7 for jackson dependencies resolves "Incompatible Jackson version" during build
    - Cloudant set-up and database creation using cloudant-client library

commit c6ecb836ef0eb714398360b21ab95f1c18b762e1
Author: Esteban Laver <emla...@us.ibm.com>
Date:   2017-06-19T15:28:23Z

    Updated README
    - New option 'cloudant.apiReceiver' for selecting _all_docs or _changes endpoint
    - Fixed links to source code files

----
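The `cloudant.apiReceiver` behavior described in the pull request and commits (default `_all_docs`, `_changes` as the alternative, a base abstract config class with one subclass per endpoint, and an exception for an empty or invalid value) can be sketched in plain Scala. This is a hypothetical illustration, not the code from the pull request: `ReceiverConfig`, `AllDocsReceiverConfig`, `ChangesReceiverConfig`, and `ConfigException` are assumed names standing in for the real config classes and `CloudantException`, and `example.cloudant.com` is a placeholder host.

```scala
// Hypothetical sketch only: these names are illustrative and are not Bahir's classes.
// Stands in for the CloudantException thrown on an empty or invalid option value.
final case class ConfigException(msg: String) extends RuntimeException(msg)

// Base abstract config: subclasses decide which Cloudant endpoint is read.
sealed abstract class ReceiverConfig(val dbName: String) {
  def endpoint: String
  def url(host: String): String = s"https://$host/$dbName/$endpoint"
}

final class AllDocsReceiverConfig(dbName: String) extends ReceiverConfig(dbName) {
  val endpoint: String = "_all_docs"
}

final class ChangesReceiverConfig(dbName: String) extends ReceiverConfig(dbName) {
  val endpoint: String = "_changes"
}

object ReceiverConfig {
  // Assumed default, matching the PR text: _all_docs for the non-streaming receiver.
  val Default: String = "_all_docs"

  // apiReceiver is the (optional) value of a 'cloudant.apiReceiver'-style option.
  def apply(dbName: String, apiReceiver: Option[String]): ReceiverConfig =
    apiReceiver.map(_.trim).getOrElse(Default) match {
      case "_all_docs" => new AllDocsReceiverConfig(dbName)
      case "_changes"  => new ChangesReceiverConfig(dbName)
      case ""          => throw ConfigException("cloudant.apiReceiver must not be empty")
      case other       => throw ConfigException(s"Invalid cloudant.apiReceiver value: $other")
    }
}
```

Keeping the choice in a small class hierarchy means the rest of the reader only asks the config for its endpoint, so adding or changing an endpoint touches one place.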


> Replace use of _all_docs API with _changes API in all receivers
> ---------------------------------------------------------------
>
>                 Key: BAHIR-110
>                 URL: https://issues.apache.org/jira/browse/BAHIR-110
>             Project: Bahir
>          Issue Type: Improvement
>            Reporter: Esteban Laver
>   Original Estimate: 216h
>  Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the _all_docs API for the non-streaming receiver. The _all_docs API supports parallel reads (using offset and range), but the _changes API still performs better in most cases (even with single-threaded reads).
> With this ticket we want to:
> a) re-implement all receivers using the _changes API
> b) compare performance between the two implementations based on _changes and _all_docs
> Based on the results in b) we could decide to either
> - replace the _all_docs implementation with the _changes-based implementation OR
> - allow customers to pick one (with solid documentation about the pros and cons)
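The issue description notes that _all_docs supports parallel reads using offset and range, while one commit sets the JsonStoreRDD partition count to 1 for the _changes API. The _all_docs side can be sketched as splitting the total row count into (skip, limit) ranges, assuming each Spark partition issues its own `_all_docs` request with the `skip` and `limit` query parameters. `PartitionSketch`, `ReadRange`, `allDocsRanges`, and its parameters are hypothetical names, not taken from the Bahir code.

```scala
// Hypothetical sketch: split an _all_docs read into (skip, limit) ranges,
// one range per Spark partition. A _changes read would stay in a single partition.
object PartitionSketch {
  final case class ReadRange(skip: Long, limit: Long)

  def allDocsRanges(totalRows: Long, maxPartitions: Int, minRowsPerPartition: Long): Seq[ReadRange] = {
    require(totalRows >= 0 && maxPartitions > 0 && minRowsPerPartition > 0)
    if (totalRows == 0) Seq.empty
    else {
      // Use fewer partitions when there are not enough rows to fill them.
      val byMinSize = math.max(1L, totalRows / minRowsPerPartition)
      val partitions = math.min(maxPartitions.toLong, byMinSize)
      // Ceiling division, so the ranges cover every row in at most `partitions` requests.
      val perPartition = (totalRows + partitions - 1) / partitions
      (0L until totalRows by perPartition).map { skip =>
        ReadRange(skip, math.min(perPartition, totalRows - skip))
      }
    }
  }
}
```

For example, 10 rows with at most 3 partitions yield the ranges (0, 4), (4, 4), and (8, 2). Each range becomes one independent request, which is what makes the _all_docs reads parallelizable.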



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
