[jira] [Commented] (BAHIR-130) Support Cloudant Lite Plan

Esteban Laver (JIRA) Thu, 24 Aug 2017 08:37:36 -0700

    [ https://issues.apache.org/jira/browse/BAHIR-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140186#comment-16140186 ]


Esteban Laver commented on BAHIR-130:
-------------------------------------

[~romeokienzler] To understand your setup a bit better: where are you running 
this connector? Locally? On Bluemix?

Also, when loading Cloudant data into Spark through the `_all_docs` endpoint 
(the default), you can lower the number of partitions used during the load by 
changing the `jsonstore.rdd.partitions` value. Lowering this value also lowers 
the number of requests per second. I believe the ideal number of partitions 
for a Cloudant Lite plan is 3. For example:

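from pyspark.sql import SparkSession

# Fewer partitions means fewer concurrent requests against the Cloudant
# account, which helps stay under the Lite plan's rate limit.
spark = SparkSession\
    .builder\
    .appName("Cloudant Spark SQL Example in Python using dataframes")\
    .config("cloudant.host", "ACCOUNT.cloudant.com")\
    .config("cloudant.username", "USERNAME")\
    .config("cloudant.password", "PASSWORD")\
    .config("jsonstore.rdd.partitions", 3)\
    .getOrCreate()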

> Support Cloudant Lite Plan
> --------------------------
>
>                 Key: BAHIR-130
>                 URL: https://issues.apache.org/jira/browse/BAHIR-130
>             Project: Bahir
>          Issue Type: Improvement
>          Components: Spark SQL Data Sources
>    Affects Versions: Spark-2.0.0, Spark-2.0.1, Spark-2.0.2, Spark-2.1.0, Spark-2.1.1, Spark-2.2.0
>         Environment: ApacheSpark, any
>            Reporter: Romeo Kienzer
>            Assignee: Romeo Kienzer
>            Priority: Minor
>             Fix For: Spark-2.1.1, Spark-2.2.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Cloudant has a plan called "Lite" that supports only five requests per 
> second, so you end up with the following exception:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in 
> stage 0.0 failed 10 times, most recent failure: Lost task 4.9 in stage 0.0 
> (TID 42, yp-spark-dal09-env5-0040): java.lang.RuntimeException: Database 
> harlemshake2 request error: {"error":"too_many_requests","reason":"You've 
> exceeded your current limit of 5 requests per second for query class. Please 
> try later.","class":"query","rate":5}
>       at org.apache.bahir.cloudant.common.JsonStoreDataAccess.getQueryResult(JsonStoreDataAccess.scala:158)
>       at org.apache.bahir.cloudant.common.JsonStoreDataAccess.getIterator(JsonStoreDataAccess.scala:72)
> Suggestion: Change JsonStoreDataAccess.scala so that when a 403 HTTP status 
> code is returned, the response is parsed to obtain the rate limit and the 
> query is then throttled down to that limit. In addition, issue a WARNING in 
> the log.
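The throttling suggestion above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the connector's actual Scala code in JsonStoreDataAccess.scala: `parse_rate_limit`, `Throttle`, `request_with_throttle` and the logger name are hypothetical, and `send` stands in for whatever issues the HTTP request and returns a `(status, body)` pair. The error body format is taken from the exception quoted above; the sketch keys on the `too_many_requests` error and accepts both 403 (as in the suggestion) and 429.

```python
import json
import logging
import time

logger = logging.getLogger("cloudant.throttle")  # hypothetical logger name


def parse_rate_limit(status_code, body):
    """Return the per-second limit from a Cloudant rate-limit error, else None."""
    # Accept 403 (as in the suggestion) as well as 429 Too Many Requests;
    # the decisive check is the "too_many_requests" error in the JSON body.
    if status_code not in (403, 429):
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    if not isinstance(payload, dict) or payload.get("error") != "too_many_requests":
        return None
    rate = payload.get("rate")
    if isinstance(rate, bool) or not isinstance(rate, (int, float)) or rate <= 0:
        return None
    return rate


class Throttle:
    """Spaces out calls so that at most `rate` requests start per second."""

    def __init__(self, rate=None, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate  # None means "not throttled yet"
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def set_rate(self, rate):
        # Log a WARNING whenever the limit changes, as the suggestion asks.
        if rate != self.rate:
            logger.warning("Cloudant rate limit reached; throttling to %s requests/second", rate)
            self.rate = rate

    def acquire(self):
        """Block until the next request may be sent under the current rate."""
        if not self.rate:
            return
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + 1.0 / self.rate


def request_with_throttle(send, throttle, max_retries=10):
    """Call send() -> (status, body); on a rate-limit error, throttle and retry."""
    for _ in range(max_retries):
        throttle.acquire()
        status, body = send()
        rate = parse_rate_limit(status, body)
        if rate is None:
            return status, body
        throttle.set_rate(rate)
    raise RuntimeError("still rate-limited after %d attempts" % max_retries)
```

In this sketch each Spark task would hold its own throttle, so with N partitions the aggregate rate could still approach N times the limit; dividing the parsed rate by the `jsonstore.rdd.partitions` value is one possible way to account for that.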



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
