Resending this mail to [email protected]
Hi wikimedia team,


 This is Lee-Wei from Yahoo. Thanks a lot to Erik for the traffic numbers and 
Max for the response time estimate. For now, we will only use the 
opensearch API in our backend (with a cache). Here are some more detailed 
traffic numbers:
  - Expected RPS to opensearch: peak 900 RPS (we will use a cache, so if the 
    cache layer works well, the average RPS would be 15)
We plan to release this feature on Jan. 5th, 2017. Please let us know if you 
have any questions or concerns about it.
 Thanks again for the team's kind help and support.
 Cheers,
Lee-Wei
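
For illustration only, here is a minimal Python sketch of the kind of cache layer described above. It is not Yahoo's implementation: the class name `TTLCache`, the one-hour expiry, and the `fetch` callback are all assumptions made for the example. The helper `required_hit_ratio` reads 900 RPS as the rate entering the cache and 15 RPS as the rate reaching the API, which is an interpretation, since the message compares a peak with an average.

```python
import time


class TTLCache:
    """Tiny in-memory cache keyed by query string (hypothetical sketch).

    `fetch` is whatever function actually calls the opensearch API;
    it is injected so the cache can be exercised without network access.
    """

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds   # assumed expiry; not stated in the thread
        self._clock = clock
        self._store = {}          # query -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, query):
        now = self._clock()
        entry = self._store.get(query)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the backend and remember the result.
        self.misses += 1
        value = self._fetch(query)
        self._store[query] = (now + self._ttl, value)
        return value


def required_hit_ratio(incoming_rps, backend_rps):
    """Fraction of requests the cache must absorb to reach backend_rps."""
    return 1.0 - backend_rps / incoming_rps
```

Under that reading, `required_hit_ratio(900, 15)` comes out a little above 0.98, so the cache would need to absorb roughly 98% of lookups.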
 

    On Tuesday, November 15, 2016 10:07 AM, Erik Bernhardson 
<[email protected]> wrote:
 

 (cc'ing the discovery mailing list, as that team owns both the implementation 
and operation of search.)
I can partially answer this as one of the people responsible for search, but I 
have to defer to others about API, bots, and such.
This would be a noticeable portion of our traffic, for reference:
  action=opensearch (and generator variants): 1.5k RPS
  action=query&list=search (and generator variants): 600 RPS
  all api: 8k RPS (might be a bit higher, this is averaged over an hour)
opensearch is relatively cheap, the p95 to our search servers is ~30ms, with 
p50 at 7ms. So 600 RPS of opensearch traffic wouldn't be too hard on our search 
cluster. Using action=query is going to be too heavy, the full text searches 
are computationally more expensive to serve.
Might I ask, which wiki(s) would you be querying against? opensearch traffic is 
spread across our search cluster, but individual wikis only hit portions of it. 
For example opensearch on en.wikipedia.org is served by ~40% of the cluster, 
but zh.wikipedia.org (Chinese) is only served by ~13%. If you are going to send 
heavy traffic to zh I might need to adjust those numbers to spread the load to 
more servers (easy enough, just need to know).
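
As a rough illustration of why the target wiki matters, the sketch below compares the load per unit of cluster for the two shares quoted above. It assumes traffic spreads evenly across identical servers within each share, which the thread does not state, and it uses the 900 RPS peak from the original request purely as an example input. The function name is hypothetical.

```python
def relative_per_server_load(rps, cluster_share):
    """Load per unit of cluster, assuming an even spread over identical servers."""
    if not 0 < cluster_share <= 1:
        raise ValueError("cluster_share must be in (0, 1]")
    return rps / cluster_share


# Shares quoted in the message: ~40% for en.wikipedia.org, ~13% for zh.wikipedia.org.
en_load = relative_per_server_load(900, 0.40)
zh_load = relative_per_server_load(900, 0.13)

# How much harder the same traffic hits each server in the smaller slice.
ratio = zh_load / en_load
```

Under these assumptions the same traffic lands about three times harder on each server serving zh than on each server serving en.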
Additionally, you mentioned descriptions and keywords. These would not be 
provided directly by the opensearch api so you might be thinking of using the 
generator version of it (action=query&generator=prefixsearch) to get the 
results augmented (ex: 
/w/api.php?action=query&format=json&prop=extracts&generator=prefixsearch&exlimit=5&exintro=1&explaintext=1&gpssearch=yah&gpslimit=5).
 I'm not personally sure how expensive that is, someone else would have to 
chime in. 
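
For reference, a small Python sketch of how the two request paths discussed here could be assembled. The function names are made up for the example; the prefixsearch parameters are copied from the URL above, and the opensearch variant uses that module's `search` and `limit` parameters. The host (for example en.wikipedia.org or zh.wikipedia.org) is left to the caller.

```python
from urllib.parse import urlencode


def opensearch_path(prefix, limit=5):
    """Path for a plain action=opensearch request (titles only)."""
    params = {
        "action": "opensearch",
        "format": "json",
        "search": prefix,
        "limit": limit,
    }
    return "/w/api.php?" + urlencode(params)


def prefixsearch_extracts_path(prefix, limit=5):
    """Path for the generator variant that also returns intro extracts."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "generator": "prefixsearch",
        "exlimit": limit,
        "exintro": 1,
        "explaintext": 1,
        "gpssearch": prefix,
        "gpslimit": limit,
    }
    # urlencode keeps dict insertion order and percent-encodes the values.
    return "/w/api.php?" + urlencode(params)
```

Calling `prefixsearch_extracts_path("yah")` reproduces the example path given above.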
So, from a computational point of view and only with respect to the search 
portion of our cluster, this seems plausible as long as we coordinate so that 
we know the traffic is coming. Others will have to chime in about the wider 
picture.

Erik B.
On Mon, Nov 14, 2016 at 4:40 PM, Eric Kuo <[email protected]> wrote:

Hi,
This is Eric from Yahoo. My team develops mobile apps for Taiwan and Hong Kong 
users. We want to provide wiki descriptions for keywords in our content, and we 
are considering using MediaWiki API:OpenSearch and/or API:Query to achieve this. 
Our estimated RPS is 900, and we will cache the query results on our side. We 
would like to know if there is any concern with respect to our RPS, and if so, 
what the best practice is. 
Any comments and suggestions are welcome. Thank you for your time. 
Best regards,
Eric
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api