ASF GitHub Bot commented on BAHIR-110:

Github user emlaver commented on the issue:

    @mayya-sharipova Thank you for all the time you've spent reviewing this.
    The storageLevel option was not working as expected and was also in the
    wrong section of the README. I've renamed the option to
    `cloudant.storageLevel` (since it only takes effect when the `_changes`
    endpoint option is set), updated the README, and verified in 052abb4 that
    the option works when set through both spark-submit and the SparkSession
    config. I've run the Scala tests and examples, and they all passed. Could
    you please review and approve these changes?

> Implement _changes API for non-streaming receiver
> -------------------------------------------------
>                 Key: BAHIR-110
>                 URL: https://issues.apache.org/jira/browse/BAHIR-110
>             Project: Bahir
>          Issue Type: Improvement
>            Reporter: Esteban Laver
>   Original Estimate: 216h
>  Remaining Estimate: 216h
> Today we use the _changes API for the Spark streaming receiver and the
> _all_docs API for the non-streaming receiver. The _all_docs API supports
> parallel reads (using offset and range), but the _changes API still performs
> better in most cases, even with single-threaded reads.
> With this ticket we want to:
> a) implement the _changes API for non-streaming receivers
> b) allow customers to choose either the _all_docs (default) or the _changes
> API endpoint, with documentation of the pros and cons of each
> _changes performance details:
> Cloudant docs were successfully loaded into Spark (local standalone), using a
> local cloudant-developer Docker image, for the following database sizes:
> 15 GB (8.5 min), 20 GB (17 min), 46 GB (25 min), and 75 GB (48.5 min).
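
As a rough way to compare the four _changes load runs above, the reported sizes and times can be converted into average throughput. This is a minimal arithmetic sketch that uses only the numbers from the ticket. It assumes the quoted times cover the whole load and ignores factors the ticket does not report (document count, average document size, hardware). The helper name `throughput_gb_per_min` is hypothetical.

```python
# Reported _changes load runs from the ticket description
# (local cloudant-developer Docker image, local standalone Spark).
# Each entry is (database size in GB, load time in minutes).
loads = [(15, 8.5), (20, 17.0), (46, 25.0), (75, 48.5)]


def throughput_gb_per_min(size_gb, minutes):
    """Hypothetical helper: average load throughput in GB per minute."""
    return size_gb / minutes


# Print the average throughput for each reported run.
for size_gb, minutes in loads:
    rate = throughput_gb_per_min(size_gb, minutes)
    print(f"{size_gb} GB in {minutes} min -> {rate:.2f} GB/min")
```

On these four runs the average rate ranges from about 1.18 GB/min (20 GB) to about 1.84 GB/min (46 GB), so load time is not strictly proportional to database size in this setup.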

This message was sent by Atlassian JIRA
