My thought was that /analytics could grow to include edit stats, other
research datasets, and whatever else we see fit to add there.  In all my years
here, this website has remained the main place people come to get bulk data
from us, so why keep trying to pull them to other places?  We can just bring
the data to them :)

I have been accused of over-generalizing, but in this case /analytics is a
clear, simple improvement over /other, and we already have more than just
traffic or pageview data that we can link to and explain from there.

And if the concern is that the page will get too complicated once we start
adding all kinds of data, then I'd say that's a challenge we can deal with
when it happens.  Looking forward, though, I think forking into
/analytics/traffic and /analytics/edits would be a reasonable solution, and
one compatible with this first step.

On Tue, Feb 16, 2016 at 12:18 PM, Erik Zachte <[email protected]> wrote:

> or maybe dumps.wikimedia.org/traffic?
>
>
>
> I hope someday we will (again) have edit stats similar to the views stats
> we now have (geo breakdown etc).
>
>
>
> Erik
>
>
>
> *From:* Analytics [mailto:[email protected]] *On
> Behalf Of *Aaron Halfaker
> *Sent:* Tuesday, February 16, 2016 18:11
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Pageviews] [Technical] Simplifying the
> available static dumps of pageview data
>
>
>
> dumps.wikimedia.org/analytics
>
>
>
> Does "analytics" mean anything in this context?  Why not aim for something
> like dumps.wikimedia.org/views?
>
> -Aaron
>
>
>
> On Thu, Feb 11, 2016 at 9:39 AM, Oliver Keyes <[email protected]>
> wrote:
>
> It's also the International Day of Women and Girls in Science!
>
> Sounds like a good summary.
>
>
> On 11 February 2016 at 07:31, Dan Andreescu <[email protected]>
> wrote:
> > I almost revived this thread on Mardi Gras, but I didn't want to be known as
> > The Holiday Crusher so I waited.  Today is relatively safe [1] :)
> >
> > Ok, there are three main points being made:
> >
> > 1. deprecating the old datasets
> > 2. liberating ourselves from the old format
> > 3. reorganizing the dumps page
> >
> > My thoughts on each:
> >
> > 1. I agree with Dario and Erik's points.  Let's keep the old files around,
> > but stop generating new files in May 2016.  To explain this, we'll make a
> > new section called "Deprecated" and put links to the pagecounts-* datasets
> > there.
> >
> > 2. I wasn't expecting to talk about format, but it makes sense because, for
> > example, Erik's dataset is just a pivoted format.  So, we could have a
> > section for the Pageview datasets, with links for each format we already
> > have: Domasz archive format, Erik Z compressed format.  We could then add a
> > new format that's easier to understand and could even include some of the
> > data we expose via the pageview API.  But from an organizational point of
> > view, treating "format" as a separate concept from "dataset" will be an
> > improvement.
> >
> > 3. I think it's time we had our own page instead of just being under
> > dumps.wikimedia.org/other.  Let's have dumps.wikimedia.org/analytics and
> > link to it from both the main dumps page and /other.  The separation will
> > make it easier to reference other places where we have static data file
> > dumps, like datasets.wikimedia.org.  And it'll also make it easier to add
> > links and references to how this work is being done and where people can
> > interact with us or help us.
> >
> >
> > I hope I captured what everyone was saying.  If there aren't any objections,
> > I'll send a list of next steps needed to accomplish this, and get to work :)
> >
> >
> >
> > [1] Today is Be Electrific Day, Get Out Your Guitar Day, Grandmother
> > Achievement Day, National Don't Cry Over Spilled Milk Day, National
> > Inventors' Day, National Make a Friend Day, National Peppermint Patty Day,
> > National Shut-in Visitation Day, Pro Sports Wives Day, Promise Day,
> > Satisfied Staying Single Day, White Shirt Day
> >
> >
> > On Wed, Jan 6, 2016 at 7:13 PM, Dario Taraborelli
> > <[email protected]> wrote:
> >>
> >> Erik's proposal sounds very reasonable.
> >>
> >> There might be some confusion about what we mean by "keeping the old
> >> datasets for longitudinal analysis". No one is planning to remove the old
> >> static dumps, just stop generating them/maintaining them going forward.
> >>
> >> I also want to echo Nuria regarding the human cost of maintaining multiple
> >> definitions. I just finished preparing a response to a reporter who was
> >> asking about project-level mobile PV data and I was not immediately able to
> >> answer if a specific data source I wanted to cite was using the old or new
> >> definition (until I talked to Dan and we looked up a gerrit patch together).
> >>
> >> How do people feel about turning off the generation of old dumps by May
> >> 2016, i.e. one year after having the two series of data available in
> >> parallel?
> >>
> >>
> >>
> >> On Wed, Jan 6, 2016 at 10:17 AM, Nuria Ruiz <[email protected]> wrote:
> >>>
> >>> >As I just mentioned to Dan in a private email conversation, keeping
> >>> > datasets even with imperfect measurements is important. Particularly for
> >>> > longitudinal analysis.
> >>> Keep in mind that maintaining these old dumps is not "free": it causes a
> >>> lot of confusion and maintenance costs to have several pageview
> >>> definitions around. We get a lot of questions about the spikiness of the
> >>> old definition, and we need to maintain the software that generates the
> >>> old files. Thus, we think it is reasonable to ask our users to transition
> >>> to the new definition and eventually (in a period of months) turn off the
> >>> old dumps.
> >>>
> >>> On Thu, Dec 24, 2015 at 6:12 AM, Maurice Vergeer <[email protected]>
> >>> wrote:
> >>>>
> >>>> Dear all,
> >>>>
> >>>> As I just mentioned to Dan in a private email conversation, keeping
> >>>> datasets even with imperfect measurements is important. Particularly for
> >>>> longitudinal analysis.
> >>>>
> >>>> Also, from what I understand (being a newbie here), the data
> >>>> are stored in separate files. Dan suggested reordering the page into
> >>>> categories. Maybe another option is to create more extensive datasets
> >>>> with more measurements in a single data file. On the other hand, the
> >>>> files would become even bigger. Not an issue for me, but for users
> >>>> in the field accessibility (download bandwidth) could become an issue.
> >>>>
> >>>> my two cents
> >>>> Maurice
> >>>>
> >>>>
> >>>> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <[email protected]> wrote:
> >>>>>
> >>>>> Nothing against this approach!
> >>>>>
> >>>>> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <[email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Dan,
> >>>>>>> Happy holidays!
> >>>>>>> Good idea to combine these datasets! However we have one more dataset
> >>>>>>> by Erik Zachte : http://dumps.wikimedia.org/other/pagecounts-ez/
> >>>>>>
> >>>>>>
> >>>>>> And that's an important one!  But I was thinking we could re-organize
> >>>>>> the page into categories.  Erik's dataset could go into a "processed
> >>>>>> data" category or something like that.  The three I wanted to talk
> >>>>>> about on this thread are just the raw data.
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Analytics mailing list
> >>>>>> [email protected]
> >>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Thank you.
> >>>>>
> >>>>> Alex Druk
> >>>>> [email protected]
> >>>>> (775) 237-8550 Google voice
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> ________________________________________________
> >>>> Maurice Vergeer
> >>>> To contact me, see http://mauricevergeer.nl/node/5
> >>>> To see my publications, see http://mauricevergeer.nl/node/1
> >>>> ________________________________________________
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >>
> >>
> >> Dario Taraborelli  Head of Research, Wikimedia Foundation
> >> wikimediafoundation.org • nitens.org • @readermeter
> >>
> >>
> >>
> >
> >
> >
>
>
>
> --
>
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
>
>
>
>
>
