+1 for just making the URI consistent and not supporting too many nice human 
edge cases :)


> On Sep 15, 2015, at 06:57, Marko Obrovac <[email protected]> wrote:
> 
> Hello,
> 
> Gabriel, Dan and I are discussing this very same topic on T103811 [1,2,3], so 
> please take a look there and weigh in!
> 
> As for the specific endpoints, perhaps it'd be worth switching the places of 
> *top* and the project name to be more in line with the current public RESTful 
> URI layout?
> 
> Also, I must admit I find the non-determinism of the endpoints somewhat 
> confusing. Specifically, I'm referring to the `/{start}/{end}` portion (or, 
> in your notation, this should really be `/{start}{/end}`, denoting that 
> `{end}` is an optional URI parameter). The problem is precisely that `{end}` 
> is optional and, if it is not supplied, the current date is assumed. This 
> means that the result of a request to the endpoint without an end date (or 
> timestamp) depends on the context, in this case the timestamp of the 
> request. So, one day the request encompasses a span of 24h, while the next 
> that same request refers to a 48h period.
> 
> I do agree that this makes it easier for humans to issue requests ("Why would 
> I need to write down today's date?"), but APIs are meant to be only 
> *human-friendly*, not *for humans* (yes, there is a difference :P). What I 
> mean is that it should feel natural for humans to create / programme calls to 
> the API and then use these results in their applications/presentations/etc. 
> In that context, there is literally no difference between:
> 
> - give me the list of top articles for the past 30 days (this is how the 
> human asks the question)
> - give me the list of top articles starting from 2015-08-15 (for an 
> application, that's just a matter of computing `current_time() - 1m`)
> - give me the list of top articles starting from 2015-08-15 and ending on 
> 2015-09-15 (as above, plus a call to `current_time()`)
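The point that an application can always turn "the past 30 days" into explicit dates can be sketched in Python as follows. The path layout `/top/{project}/{access}/{start}/{end}` follows Choice 2 from the original proposal; the helper name `explicit_top_url` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def explicit_top_url(project: str, access: str, days_back: int,
                     today: Optional[date] = None) -> str:
    """Turn 'the past N days' into a fully explicit /top/{project}/{access}/{start}/{end} path.

    The client resolves 'now' itself, so the resulting URL means the same
    thing no matter when the server receives it.
    """
    end = today if today is not None else date.today()
    start = end - timedelta(days=days_back)
    return f"/top/{project}/{access}/{start.isoformat()}/{end.isoformat()}"


print(explicit_top_url("en.wikipedia", "all-access", 30, today=date(2015, 9, 15)))
# /top/en.wikipedia/all-access/2015-08-16/2015-09-15
```

Subtracting 30 days from 2015-09-15 gives 2015-08-16, whereas subtracting one calendar month (as in `current_time() - 1m`) gives 2015-08-15; spelling out both dates in the URL removes that ambiguity as well.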
> 
> Unless, of course, you target mostly human requests, in which case my 
> argument is rendered moot :P
> 
> My 2 cents,
> Marko
> 
> [1] https://phabricator.wikimedia.org/T103811
> [2] https://phabricator.wikimedia.org/T103811#1639417
> [3] https://phabricator.wikimedia.org/T103811#1640977
> 
> 
> On 14 September 2015 at 16:53, Dan Andreescu <[email protected]> wrote:
> Thank you all for your thoughtful opinions.
> 
> Since people want to know the top pages over an arbitrary time period, we 
> think Druid would be the best back-end for that kind of query.  But we're not 
> going to push that for the first release.  It's very useful to know that's 
> the consensus; we can now start talking to Jaime Crespo about Druid and 
> alternatives, make plans, etc.  Until then, the first release is going to 
> have the top endpoint that Joseph wrote about.  This is easy to pre-aggregate 
> and dump into Cassandra.  Also, the /v1/pageviews/ prefix is going to be on 
> all the endpoints we launch with, because these are endpoints in a 
> "pageviews" RESTBase module.  So we'll have:
> 
> /v1/pageviews/top/{project}/{access}/{year}/{month}/{day}
> 
> for now, with {month} and {day} being optional parameters.  This will give 
> you the top pageviews for the selected calendar date.  And as soon as we can, 
> we'll have:
> 
> /v1/pageviews/top/{project}/{access}/from/{start}{/end}
> 
> As proposed by Gabriel, with {start} and {end} accepting either full dates or 
> "now"-relative negative integers.
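A minimal Python sketch of how a client might assemble the two paths listed above. It assumes `{month}` and `{day}` are simply left off when not needed, that month and day are zero-padded, and that the "now"-relative negative integers count days (the unit is not specified here). `top_by_date_url`, `top_from_url` and `resolve_bound` are hypothetical helper names.

```python
from datetime import date, timedelta
from typing import Optional

PREFIX = "/v1/pageviews/top"


def top_by_date_url(project: str, access: str, year: int,
                    month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Build /v1/pageviews/top/{project}/{access}/{year}/{month}/{day}, omitting the optional parts."""
    if day is not None and month is None:
        raise ValueError("{day} can only be given together with {month}")
    parts = [PREFIX, project, access, str(year)]
    if month is not None:
        parts.append(f"{month:02d}")  # zero-padding is an assumption
    if day is not None:
        parts.append(f"{day:02d}")
    return "/".join(parts)


def top_from_url(project: str, access: str, start: str, end: Optional[str] = None) -> str:
    """Build /v1/pageviews/top/{project}/{access}/from/{start}{/end}."""
    path = f"{PREFIX}/{project}/{access}/from/{start}"
    return path if end is None else f"{path}/{end}"


def resolve_bound(value: str, today: date) -> date:
    """Interpret {start} or {end}: an ISO date, or a negative offset from today.

    Treating the offset as a number of days is an assumption.
    """
    try:
        offset = int(value)
    except ValueError:
        return date.fromisoformat(value)
    if offset > 0:
        raise ValueError("relative bounds are expected to be zero or negative")
    return today + timedelta(days=offset)


print(top_by_date_url("en.wikipedia", "all-access", 2015, 9, 14))
# /v1/pageviews/top/en.wikipedia/all-access/2015/09/14
print(top_from_url("en.wikipedia", "all-access", "-30"))
# /v1/pageviews/top/en.wikipedia/all-access/from/-30
print(resolve_bound("-30", today=date(2015, 9, 15)))
# 2015-08-16
```

Note that a path such as `.../from/-30` with no `{end}` still resolves against the request date, so the determinism concern raised in this thread applies to it too.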
> 
> The initial endpoint we launch won't have hourly resolution; that seems like 
> too much data to pre-aggregate.  But we'll see how Druid handles very 
> specific dates (should be fine) and make that a feature in the second 
> version.  We'll have to look into the privacy implications of short time 
> ranges, like an hour.
> 
> 
> 
> On Mon, Sep 14, 2015 at 10:18 AM, Andrew Otto <[email protected]> wrote:
>> Also, maybe top-articles instead of top, to avoid naming collision in the 
>> future?
> 
> +1 for prefixing whatever paths you are doing now with something relevant.  I 
> sense that there might be more than just pageview data in the future.
> 
> /pageviews/top/…?
> 
> 
> 
> 
>> On Sep 11, 2015, at 18:38, Marcel Ruiz Forns <[email protected]> wrote:
>> 
>> +1 Adam
>> 
>> Also, maybe top-articles instead of top, to avoid naming collision in the 
>> future?
>> 
>> On Sat, Sep 12, 2015 at 12:27 AM, Adam Baso <[email protected]> wrote:
>> I'd be in favor of both. Maybe with a little tweak to the pathing:
>> 
>> /top/{project}/{access}/days/{days-in-the-past}
>> 
>> /top/{project}/{access}/range/{start}/{end}
>> 
>> with "days" or "range" perhaps placed earlier in the slash-separated path 
>> if it doesn't read well semantically.
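Both of Adam's forms can be normalized on the server side into the same explicit (start, end) query. A rough Python sketch using regular expressions; the route patterns mirror the two paths above, while the name `parse_top_path`, the `TopQuery` type, ISO dates for `{start}`/`{end}`, and treating `days/{n}` as "the n days ending today" are all assumptions.

```python
import re
from datetime import date, timedelta
from typing import NamedTuple, Optional


class TopQuery(NamedTuple):
    project: str
    access: str
    start: date
    end: date


DAYS_RE = re.compile(r"^/top/([^/]+)/([^/]+)/days/(\d+)$")
RANGE_RE = re.compile(r"^/top/([^/]+)/([^/]+)/range/(\d{4}-\d{2}-\d{2})/(\d{4}-\d{2}-\d{2})$")


def parse_top_path(path: str, today: date) -> Optional[TopQuery]:
    """Map either URL form onto one explicit date range; return None if neither matches."""
    m = DAYS_RE.match(path)
    if m:
        project, access, days = m.groups()
        return TopQuery(project, access, today - timedelta(days=int(days)), today)
    m = RANGE_RE.match(path)
    if m:
        project, access, start, end = m.groups()
        return TopQuery(project, access, date.fromisoformat(start), date.fromisoformat(end))
    return None


print(parse_top_path("/top/en.wikipedia/all-access/days/30", today=date(2015, 9, 15)))
```

Because both forms end up as the same query, the back-end only needs to answer one kind of request; only the `days` form depends on when it is asked.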
>> 
>> 
>> On Fri, Sep 11, 2015 at 3:14 PM, Dan Andreescu <[email protected]> wrote:
>> It wouldn't be too hard to offer both, but I'm thinking it might be 
>> confusing for a consumer.  I think ultimately the decision should be up to 
>> the people using this data, because the use cases are fairly different for 
>> each form.  If people ask for both, we'll do both.
>> 
>> Leila, we'd love to have page_ids as well, but we'd have to block the 
>> release on a bigger effort to reliably mirror MediaWiki databases in Hadoop 
>> for processing, so we'll probably punt on that for now.  But we have many 
>> reasons to work on that sooner rather than later.
>> 
>> On Fri, Sep 11, 2015 at 6:09 PM, Gabriel Wicke <[email protected]> wrote:
>> The former might be slightly easier to cache, and can be linked to / pulled 
>> in statically, without a need to dynamically construct a URL. Would it be 
>> hard to offer both?
>> 
>> On Fri, Sep 11, 2015 at 3:06 PM, Leila Zia <[email protected]> wrote:
>> It's getting exciting. :-)
>> 
>> I'd go with choice 2 since it gives more control to the user while offering 
>> what the user can get through choice 1 as well.
>> 
>> Question: will we get page_ids or page_titles or both? It's good to have 
>> both.
>> 
>> Leila
>> 
>> On Fri, Sep 11, 2015 at 3:00 PM, Dan Andreescu <[email protected]> wrote:
>> Hi everyone.  End of quarter is rapidly approaching and I wanted to ask a 
>> quick question about one of the endpoints we want to push out.  We want to 
>> let you ask "what are the top articles" but we're not sure how to structure 
>> the URL so it's most useful to you.  Here are the choices:
>> 
>> Choice 1. /top/{project}/{access}/{days-in-the-past}
>> 
>> Example: top articles via all en.wikipedia sites for the past 30 days: 
>> /top/en.wikipedia/all-access/30
>> 
>> 
>> Choice 2. /top/{project}/{access}/{start}/{end}
>> 
>> Example: top articles via all en.wikipedia sites from June 12th, 2014 to 
>> August 30th, 2015: /top/en.wikipedia/all-access/2014-06-12/2015-08-30
>> 
>> 
>> (in all of those,
>> 
>> * {project} means en.wikipedia, commons.wikimedia, etc.
>> * {access} means access method as in desktop, mobile web, mobile app
>> 
>> )
>> 
>> Which do you prefer?  Would any other query style be useful?
>> 
>> _______________________________________________
>> Analytics mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.wikimedia.org/mailman/listinfo/analytics 
>> <https://lists.wikimedia.org/mailman/listinfo/analytics>
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Gabriel Wicke
>> Principal Engineer, Wikimedia Foundation
>> 
>> 
>> -- 
>> Marcel Ruiz Forns
>> Analytics Developer
>> Wikimedia Foundation
> 
> 
> 
> -- 
> Marko Obrovac, PhD
> Senior Services Engineer
> Wikimedia Foundation