Resending this mail to [email protected].
Hi Wikimedia team,
This is Lee-Wei from Yahoo. Many thanks to Erik for the traffic numbers and to
Max for the response time estimate. For now, we will only use the opensearch
API from our backend (with a cache in front of it). Here are more detailed
traffic numbers:
- expected RPS to opensearch: 900 RPS at peak (we will put a cache in front of
  the API, so if the cache layer works as intended, the average RPS would be 15)
We plan to release this feature on January 5th, 2017. Please let us know if
you have any questions or concerns.
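The plan above puts a cache between the app backend and opensearch. Below is a minimal sketch of such a cache, assuming a simple per-key time-to-live (TTL) policy. The `TTLCache` class, its one-hour default TTL and the injected `fetch` callable are hypothetical illustrations, not Yahoo's actual design.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical time-to-live cache placed in front of an API call.

    Entries expire `ttl` seconds after they were fetched. This sketch never
    evicts entries by size, so a production cache would also need a bound.
    """

    def __init__(self, fetch: Callable[[Hashable], object], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # called only on a miss or an expired entry
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested
        self._store: Dict[Hashable, Tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Miss or stale entry: go to the upstream API and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value

    def miss_rate(self) -> float:
        """Fraction of lookups that had to go upstream."""
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

In use, `fetch` would wrap the real HTTP call to opensearch. The request rate reaching the API is then roughly the client request rate multiplied by `miss_rate()`, which is how a working cache layer keeps the load seen by opensearch well below the client-side rate.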
Thanks again for the team's kind help and support.
Cheers,
Lee-Wei
On Tuesday, November 15, 2016 10:07 AM, Erik Bernhardson
<[email protected]> wrote:
(cc'ing the Discovery mailing list, as that team owns both the implementation
and operation of search.)
I can partially answer this as one of the people responsible for search, but I
have to defer to others on the API, bots, and such.
This would be a noticeable portion of our traffic. For reference:
action=opensearch (and generator variants): 1.5k RPS
action=query&list=search (and generator variants): 600 RPS
all api: 8k RPS
(might be a bit higher; this is averaged over an hour)
opensearch is relatively cheap: the p95 latency to our search servers is
~30ms, with p50 at 7ms. So 600 RPS of opensearch traffic wouldn't be too hard
on our search cluster. Using action=query&list=search is going to be too heavy;
full-text searches are computationally more expensive to serve.
Might I ask which wiki(s) you would be querying? opensearch traffic is spread
across our search cluster, but individual wikis only hit portions of it. For
example, opensearch on en.wikipedia.org is served by ~40% of the cluster, but
zh.wikipedia.org (Chinese) is served by only ~13%. If you are going to send
heavy traffic to zh, I might need to adjust those numbers to spread the load
across more servers (easy enough, I just need to know).
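Because each wiki is served by a different share of the search cluster, the target wiki is part of every request. A minimal sketch of one opensearch request URL and its response parsing, using only Python's standard library; the helper names are hypothetical. The response shape assumed here is the usual opensearch JSON layout: a four-element array of the search term, titles, descriptions and URLs.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode


def opensearch_url(wiki_host: str, term: str, limit: int = 5) -> str:
    """Build an action=opensearch URL for one wiki, e.g. "zh.wikipedia.org"."""
    params = {"action": "opensearch", "format": "json", "search": term, "limit": limit}
    # urlencode percent-encodes non-ASCII terms (such as Chinese) as UTF-8.
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


def parse_opensearch(body: str) -> List[Tuple[str, str, str]]:
    """Turn [term, titles, descriptions, urls] into (title, description, url) rows."""
    _term, titles, descriptions, urls = json.loads(body)
    return list(zip(titles, descriptions, urls))
```

Keeping `wiki_host` explicit also makes it easy to report per-wiki traffic back to the search team, which is the information requested here.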
Additionally, you mentioned descriptions and keywords. These would not be
provided directly by the opensearch API, so you might be thinking of using its
generator version (action=query&generator=prefixsearch) to get augmented
results, for example:
/w/api.php?action=query&format=json&prop=extracts&generator=prefixsearch&exlimit=5&exintro=1&explaintext=1&gpssearch=yah&gpslimit=5
I'm not personally sure how expensive that is; someone else would have to
chime in.
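The example URL above can be built and consumed programmatically. A minimal sketch with hypothetical helper names, reproducing exactly the parameters of that URL. It assumes the default (formatversion=1) response layout, in which `query.pages` is a dict keyed by page ID, and that each page carries the generator's `index` field giving its prefix-search rank; pages without that field fall back to 0.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def prefixsearch_extracts_url(wiki_host: str, prefix: str, limit: int = 5) -> str:
    """Same parameters, in the same order, as the example URL in the email."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "generator": "prefixsearch",
        "exlimit": limit,
        "exintro": 1,        # only the lead section
        "explaintext": 1,    # plain text instead of HTML
        "gpssearch": prefix,
        "gpslimit": limit,
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


def extracts_in_rank_order(response: Dict) -> List[Tuple[str, str]]:
    """Return (title, extract) pairs sorted by the assumed 'index' rank field."""
    pages = response.get("query", {}).get("pages", {})
    ordered = sorted(pages.values(), key=lambda p: p.get("index", 0))
    return [(p["title"], p.get("extract", "")) for p in ordered]
```

Sorting matters because the pages dict is keyed by page ID, so its iteration order does not follow the search ranking.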
So, from a computational point of view and only with respect to the search
portion of our cluster, this seems plausible as long as we coordinate so that
we know the traffic is coming. Others will have to chime in about the wider
picture.
Erik B.
On Mon, Nov 14, 2016 at 4:40 PM, Eric Kuo <[email protected]> wrote:
Hi,
This is Eric from Yahoo. My team develops mobile apps for Taiwan and Hong Kong
users. We want to show wiki descriptions for keywords in our content, and we
are considering using MediaWiki API:OpenSearch and/or API:Query to achieve
this. Our estimated RPS is 900, and we will cache the query results on our
side. We would like to know whether there are any concerns about this RPS and,
if so, what the best practice is.
Any comments and suggestions are welcome. Thank you for your time.
Best regards,
Eric
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api