Hi all,
Here are my thoughts on the entry-point definition.
While our long-term plan is to provide 'on-the-fly per-query
computation', for now we pre-aggregate every dataset we want to serve and
store it in Cassandra, to be exposed by RESTBase.
This means we can't easily provide aggregation over variable start/end dates.
We could either:
- send every dataset between the start and end dates for a given time
granularity level (could be big!), or
- use an entry point such as '/top/{project}/{access}/{year}/{month}/{day}',
with the possibility to skip the 'day' parameter to get the full month.
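To make the two options concrete, here is a minimal sketch of how a client might address them. Only the path template comes from this thread; the zero-padded year/month/day, the example values 'en.wikipedia' and 'all-access', and the helper names `top_path` and `daily_paths` are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional


def top_path(project: str, access: str, year: int, month: int,
             day: Optional[int] = None) -> str:
    """Fill in '/top/{project}/{access}/{year}/{month}/{day}'.

    Assumption: year/month/day are zero-padded. Leaving `day` out gives
    the monthly path, i.e. the full month in one pre-aggregated dataset.
    """
    parts = ["", "top", project, access, f"{year:04d}", f"{month:02d}"]
    if day is not None:
        parts.append(f"{day:02d}")
    return "/".join(parts)


def daily_paths(project: str, access: str,
                start: date, end: date) -> List[str]:
    """Option 1: one daily dataset per day in [start, end], inclusive."""
    paths = []
    current = start
    while current <= end:
        paths.append(top_path(project, access,
                              current.year, current.month, current.day))
        current += timedelta(days=1)
    return paths
```

With option 1 the number of datasets grows linearly with the requested range (a full year means 365 daily datasets), which is where the "could be big" concern comes from; the monthly path keeps a whole month to a single request.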
*@Thomas*:
- As Andrew said, the data we have is pre-aggregated at the hour level so far.
- The data is tagged in the UTC timezone, and we planned that requests would
use that timezone by default.
- As said in this message, we are thinking of ways to provide better access
to data (on-the-fly computation, lower time granularity and others), and
this involves both technical and privacy concerns. That is for the future :)
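Hour-level data tagged in UTC, together with Andrew's caveat in the quoted message about 30- and 45-minute offsets, can be illustrated with a minimal sketch. It assumes hourly counts keyed by the UTC start of each hour and a fixed UTC offset (no daylight-saving handling); `local_day_hours` and `local_day_total` are hypothetical names, not part of any existing API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

ONE_HOUR = timedelta(hours=1)


def local_day_hours(year: int, month: int, day: int,
                    utc_offset: timedelta) -> List[datetime]:
    """Return the 24 UTC hour-bin starts covering one local calendar day.

    Assumes a fixed offset. Raises ValueError when the offset is not a
    whole number of hours (e.g. UTC+5:30, UTC+5:45), because the local
    midnight then falls inside an hourly UTC bin and the day cannot be
    rebuilt from hourly data without splitting a bin.
    """
    if utc_offset % ONE_HOUR != timedelta(0):
        raise ValueError(f"offset {utc_offset} is not aligned to hourly bins")
    local_midnight = datetime(year, month, day, tzinfo=timezone(utc_offset))
    start_utc = local_midnight.astimezone(timezone.utc)
    return [start_utc + h * ONE_HOUR for h in range(24)]


def local_day_total(hourly: Dict[datetime, int], year: int, month: int,
                    day: int, utc_offset: timedelta) -> int:
    """Sum the hourly UTC bins making up one local day (missing bins count as 0)."""
    return sum(hourly.get(h, 0)
               for h in local_day_hours(year, month, day, utc_offset))
```

For a fixed UTC+1 offset, the local day 2015-09-11 maps to the UTC bins from 2015-09-10 23:00 to 2015-09-11 22:00. For UTC+5:30 the function refuses, because serving that day would require finer-grained data than the hourly aggregates.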
Joseph
On Sun, Sep 13, 2015 at 5:39 PM, Andrew Gray <[email protected]> wrote:
> On 13 September 2015 at 16:26, Thomas Steiner <[email protected]> wrote:
> > I mean that somehow I could express getting data in an exact given period of
> > time, say, exactly the day September 11, 2015 in the time zone CET (that day
> > started at 3pm relative to PDT or 11pm relative to UTC). Without time zone
> > support, I would get data “outside” of my desired local time zone. Hope this
> > makes sense and is clear.
>
> A cautious note on time zones...
>
> If you're holding everything in one hour bins, as we currently do with
> the aggregated data, then it's easy enough to switch from UTC to CET
> to EST and so forth.
>
> But not all time zones differ by one hour increments. Most noticeably,
> India is on UTC+5:30, and a handful of other places also differ by 30
> minutes from the standard (or in the case of Nepal, 45). I'm not sure
> you could display these without regenerating the underlying data,
> which would be a lot of added complexity.
>
> --
> - Andrew Gray
> [email protected]
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal