+1 for just making the URI consistent and not supporting too many nice human edge cases :)
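
To make the point in the quoted discussion below concrete, here is a minimal sketch of how a client could turn "the past 30 days" into explicit dates itself, so the URI stays deterministic. It assumes the range layout `/v1/pageviews/top/{project}/{access}/from/{start}/{end}` proposed in the thread (not yet released); the `BASE` constant and the function name `top_range_path` are hypothetical, used only for illustration.

```python
from datetime import date, timedelta

# Assumption: prefix and "from" segment follow the layout proposed in the quoted thread.
BASE = "/v1/pageviews/top"


def top_range_path(project: str, access: str, days: int, today: date) -> str:
    """Return an explicit start/end path covering the `days` days ending on `today`.

    `today` is passed in rather than read from the clock, so the same
    arguments always produce the same URI.
    """
    start = today - timedelta(days=days)
    return f"{BASE}/{project}/{access}/from/{start.isoformat()}/{today.isoformat()}"


print(top_range_path("en.wikipedia", "all-access", 30, date(2015, 9, 15)))
```

Run as written, this prints `/v1/pageviews/top/en.wikipedia/all-access/from/2015-08-16/2015-09-15`. Repeating the request a day later with the same arguments returns the same span, which is the property Marko is asking for.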
> On Sep 15, 2015, at 06:57, Marko Obrovac <[email protected]> wrote:
>
> Hello,
>
> Gabriel, Dan and I are discussing this very same topic on T103811~[1,2,3], so
> please take a look there and weigh in!
>
> As for the specific endpoints, perhaps it'd be worth switching the places of
> *top* and the project name to be more in line with the current public RESTful
> URI layout?
>
> Also, I must admit I find the non-determinism of the endpoints confusing to
> some extent. Specifically I'm referring to the `/{start}/{end}` portion (or,
> in your notion, this should really be `/{start}{/end}` denoting that `{end}`
> is an optional URI parameter), the problem being exactly that `{end}` is
> optional and, if not supplied, the current date is assumed. That entails that
> the result of making a request to the endpoint without an end date (or TS)
> depends on the context (the context in this case being the time stamp of the
> request). So, one day the request encompasses a span of 24h, while the next
> that same request refers to a 48h period.
>
> I do agree that this makes it easier for humans to issue requests ("Why would
> I need to write down today's date?"), but APIs are meant to be only
> *human-friendly*, not *for humans* (yes, there is a difference :P). What I
> mean is that it should feel natural for humans to create / programme calls to
> the API and then use these results in their applications/presentations/etc.
> In that context, there is literally no difference between:
>
> - give me the list of top articles for the past 30 days (this is how the
>   human asks the question)
> - give me the list of top articles starting from 2015-08-15 (for an
>   application, that's just a matter of computing `current_time() - 1m`)
> - give me the list of top articles starting from 2015-08-15 and ending on
>   2015-09-15 (idem as above plus a call to `current_time()`)
>
> Unless, of course, you target mostly human requests, in which case my
> argument is rendered moot :P
>
> My 2 cents,
> Marko
>
> [1] https://phabricator.wikimedia.org/T103811
> [2] https://phabricator.wikimedia.org/T103811#1639417
> [3] https://phabricator.wikimedia.org/T103811#1640977
>
>
> On 14 September 2015 at 16:53, Dan Andreescu <[email protected]> wrote:
> Thank you all for your thoughtful opinions.
>
> Since people want to know the top pages over an arbitrary time period, we
> think Druid would be the best back-end for that kind of query. But we're not
> going to push that for the first release. It's very useful to know that's
> the consensus, we can now start talking to Jaime Crespo about Druid /
> alternatives, make plans, etc. Until then, the first release is going to
> have the top endpoint that Joseph wrote about. This is easy to pre-aggregate
> and dump into Cassandra. Also, the /v1/pageviews/ prefix is going to be on
> all the endpoints we launch with, because these are endpoints in a
> "pageviews" RESTBase module. So we'll have:
>
> /v1/pageviews/top/{project}/{access}/{year}/{month}/{day}
>
> for now, with {month} and {day} being optional parameters. This will give
> you the top pageviews for the selected calendar date.
> And as soon as we can, we'll have:
>
> /v1/pageviews/top/{project}/{access}/from/{start}{/end}
>
> As proposed by Gabriel, with {start} and {end} taking both full dates and
> "now"-relative negative integers.
>
> The initial endpoint we launch won't have hourly resolution, that seems like
> too much data to pre-aggregate. But we'll see how Druid handles very
> specific dates (should be fine) and make that a feature in the second
> version. We'll have to look into the privacy implications of short time
> ranges, like an hour.
>
>
> On Mon, Sep 14, 2015 at 10:18 AM, Andrew Otto <[email protected]> wrote:
>> Also, maybe top-articles instead of top, to avoid naming collision in the
>> future?
>
> +1 for prefixing whatever paths you are doing now with something relevant. I
> sense that there might be more than just pageview data in the future.
>
> /pageviews/top/…?
>
>
>> On Sep 11, 2015, at 18:38, Marcel Ruiz Forns <[email protected]> wrote:
>>
>> +1 Adam
>>
>> Also, maybe top-articles instead of top, to avoid naming collision in the
>> future?
>>
>> On Sat, Sep 12, 2015 at 12:27 AM, Adam Baso <[email protected]> wrote:
>> I'd be in favor of both. Maybe with a little tweak to the pathing:
>>
>> /top/{project}/{access}/days/{days-in-the-past}
>>
>> /top/{project}/{access}/range/{start}/{end}
>>
>> with "days" or "range" maybe being earlier in the forward slash separated
>> spec if it doesn't read well semantically.
>>
>>
>> On Fri, Sep 11, 2015 at 3:14 PM, Dan Andreescu <[email protected]> wrote:
>> It wouldn't be too hard to offer both, but I'm thinking it might be
>> confusing for a consumer. I think ultimately the decision should be up to
>> the people using this data, because the use cases are fairly different for
>> each form. If people ask for both, we'll do both.
>>
>> Leila, we'd love to have page_ids as well, but we'd have to block the
>> release on a bigger effort to reliably mirror mediawiki databases in Hadoop
>> for processing, so we'll probably punt on that for now. But we have more
>> than many reasons to work on that sooner than later.
>>
>> On Fri, Sep 11, 2015 at 6:09 PM, Gabriel Wicke <[email protected]> wrote:
>> The former might be slightly easier to cache, and can be linked to / pulled
>> in statically, without a need to dynamically construct a URL. Would it be
>> hard to offer both?
>>
>> On Fri, Sep 11, 2015 at 3:06 PM, Leila Zia <[email protected]> wrote:
>> It's getting exciting. :-)
>>
>> I'd go with choice 2 since it gives more control to the user while offering
>> what the user can get through choice 1 as well.
>>
>> Question: will we get page_ids or page_titles or both? It's good to have
>> both.
>>
>> Leila
>>
>> On Fri, Sep 11, 2015 at 3:00 PM, Dan Andreescu <[email protected]> wrote:
>> Hi everyone. End of quarter is rapidly approaching and I wanted to ask a
>> quick question about one of the endpoints we want to push out. We want to
>> let you ask "what are the top articles" but we're not sure how to structure
>> the URL so it's most useful to you. Here are the choices:
>>
>> Choice 1. /top/{project}/{access}/{days-in-the-past}
>>
>> Example: top articles via all en.wikipedia sites for the past 30 days:
>> /top/en.wikipedia/all-access/30
>>
>>
>> Choice 2. /top/{project}/{access}/{start}/{end}
>>
>> Example: top articles via all en.wikipedia sites from June 12th, 2014 to
>> August 30th, 2015: /top/en.wikipedia/all-access/2014-06-12/2015-08-30
>>
>>
>> (in all of those,
>>
>> * {project} means en.wikipedia, commons.wikimedia, etc.
>> * {access} means access method as in desktop, mobile web, mobile app
>>
>> )
>>
>> Which do you prefer? Would any other query style be useful?
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> --
>> Gabriel Wicke
>> Principal Engineer, Wikimedia Foundation
>>
>> --
>> Marcel Ruiz Forns
>> Analytics Developer
>> Wikimedia Foundation
>
> --
> Marko Obrovac, PhD
> Senior Services Engineer
> Wikimedia Foundation
