On 15 September 2015 at 19:37, Dan Andreescu <[email protected]>
wrote:

>> I worry a little bit about the performance without having a batch api, but
>> we can certainly try it out and see what happens. Basically we will be
>> requesting the page view information for every NS_MAIN article in every
>> wiki once a week.  A quick sum against our search cluster suggests this is
>> ~96 million api requests.
>>
>
96m requests per week equals approx. 160 req/s, which is more than
sustainable for RESTBase.
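
For reference, the arithmetic behind that figure can be sketched in a few
lines of Python. This assumes the ~96 million requests are spread evenly over
the whole week, which is an idealisation; a real crawl would likely be
burstier, so treat the result as an average rather than a peak.

```python
# Average request rate if ~96 million requests are spread evenly over one week.
# The even spread is an assumption made for illustration only.
REQUESTS_PER_WEEK = 96_000_000
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds

avg_rate = REQUESTS_PER_WEEK / SECONDS_PER_WEEK
print(f"{avg_rate:.1f} req/s")  # prints "158.7 req/s"
```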


> Oh, sorry, I thought you meant you were just querying 100 or so titles!
> In the case of huge queries like these, you should just query the
> wmf.pageview_hourly table directly.  You can do so with plain SQL via Hive
> or maybe Impala if we end up setting that up.  But those queries should be
> really fast in that table.  We can help you write the query if you send us
> an attempt and a spec of exactly what you need.
>

My performance-oriented nature would also lean towards something like that,
but I don't think this is a decision to be taken lightly. While
having an API doesn't come for free, its beauty lies in the abstraction.
Concretely, as a pageview client, I am aware of the "contract" between the
service and myself and as such, I trust it to fulfil its part of the job.
How it does it is completely irrelevant to me, thus giving me the
opportunity to focus on "my part of the job" (no need for me to worry about
the internals of the implementation).

That said, it is clear as day that making 100 requests takes more time than
making one batch request. However, on the one hand, it sounds like
Erik's use case is not latency- (or time-) sensitive. On the other hand, given
the nature of the pageview API, the cost of computing the result (in case
it is not available right away) dwarfs any connection or other related
overheads.

Cheers,
Marko


-- 
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics