[ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098488#comment-16098488 ]

ASF GitHub Bot commented on BAHIR-110:
--------------------------------------

Github user emlaver commented on the issue:

    https://github.com/apache/bahir/pull/45
  
    @ckadner Do you have any additional comments? I'll rebase and squash the
    commits once @mayya-sharipova reviews the last commit.


> Implement _changes API for non-streaming receiver
> -------------------------------------------------
>
>                 Key: BAHIR-110
>                 URL: https://issues.apache.org/jira/browse/BAHIR-110
>             Project: Bahir
>          Issue Type: Improvement
>            Reporter: Esteban Laver
>   Original Estimate: 216h
>  Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the
> _all_docs API for the non-streaming receiver. The _all_docs API supports
> parallel reads (using offset and range), but the _changes API still performs
> better in most cases (even with only single-threaded support).
> With this ticket we want to:
> a) implement _changes API for non-streaming receivers
> b) allow customers to pick either the _all_docs (default) or the _changes API
> endpoint, and document the pros and cons of each
> _changes performance details:
> Successfully loaded Cloudant docs (from a local cloudant-developer Docker
> image) into Spark (local standalone) for the following database sizes:
> 15 GB (8.5 min), 20 GB (17 min), 46 GB (25 min), and 75 GB (48.5 min).
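
As a rough illustration of the two read patterns described in the ticket, here is a minimal Python sketch that only builds request URLs. It assumes the ticket's "offset and range" correspond to the `skip` and `limit` query parameters of `_all_docs`, and that `_changes` is read as a single sequential feed starting at `since`. The function names, the base URL, and the database name are hypothetical and are not part of Bahir.

```python
from urllib.parse import urlencode


def all_docs_urls(base_url, db, total_docs, partitions):
    """Build one _all_docs URL per partition (assumed skip/limit paging)."""
    per_part = -(-total_docs // partitions)  # ceiling division
    urls = []
    for i in range(partitions):
        skip = i * per_part
        if skip >= total_docs:
            break  # fewer documents than partitions
        limit = min(per_part, total_docs - skip)
        query = urlencode({"include_docs": "true", "skip": skip, "limit": limit})
        urls.append(f"{base_url}/{db}/_all_docs?{query}")
    return urls


def changes_url(base_url, db, since="0"):
    """Build a single _changes URL; the feed is read sequentially."""
    query = urlencode({"include_docs": "true", "since": since})
    return f"{base_url}/{db}/_changes?{query}"


if __name__ == "__main__":
    # Hypothetical local instance and database name.
    for url in all_docs_urls("http://localhost:5984", "sales", 10, 3):
        print(url)
    print(changes_url("http://localhost:5984", "sales"))
```

Each `_all_docs` URL could be fetched by a separate worker, whereas the `_changes` URL yields one stream. That is the trade-off the ticket proposes to document.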



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
