I see it is quite complicated to work with this data. It is a pity, considering that valuable insights could be drawn from readers' behavior. I will think about what could be useful for the study.
Thanks for the answers, Nuria and Marcel! :)

Cheers,

Marc

On Thu, 30 June 2016 at 14:16, Marcel Ruiz Forns (<[email protected]>) wrote:

> Marc, I also see what Nuria says. Also please consider that the majority
> of Wikipedia sessions have only one pageview. So in the majority of
> sessions it would not be possible to approximate the time spent on page
> with boundaries with Joseph's alternative.
>
> On Thu, Jun 30, 2016 at 2:02 PM, Nuria Ruiz <[email protected]> wrote:
>
>> > Aye, as Joseph says, the time-on-page or time-leaving is not collected,
>> > except as an extension of session reconstruction work. If you want a
>> > concrete time, you're not gonna get it.
>>
>> I was about to make the same point, the data set that will most closely
>> answer your questions is the one Oliver mentioned, otherwise we do not keep
>> any information related to time on site and page requests so there is no
>> "approximation" possible that will work on overall data. Even if you
>> calculate signatures with IP-hash + user agent to approximate users (a
>> method with known issues) there is no way for you to distinguish someone
>> reading a page for an hour and someone that came to Wikipedia twice in the
>> same hour and spent a minute each time. Hopefully my example makes things
>> more clear.
>>
>> Thanks,
>>
>> Nuria
>>
>> On Wed, Jun 29, 2016 at 4:58 AM, Oliver Keyes <[email protected]> wrote:
>>
>>> Aye, as Joseph says, the time-on-page or time-leaving is not collected,
>>> except as an extension of session reconstruction work. If you want a
>>> concrete time, you're not gonna get it.
>>>
>>> While PC-based data is more reliable than mobile, that does not
>>> necessarily mean "reliable". I'm sort of confused, I guess, as to why the
>>> datasets I linked (unless I'm misremembering them?) don't help: you would
>>> have to do the calculation yourself but they should contain all the data
>>> necessary to make that calculation (unless you want to have the pageID or
>>> title associated with the time-on-page, in which case...yeah, that's an
>>> issue).
>>>
>>> On Wed, Jun 29, 2016 at 3:16 AM, Marc Miquel <[email protected]> wrote:
>>>
>>>> Thanks for the answer, Oliver. But I am not sure it answers my
>>>> questions. I'd like to study aspects like how much time is spent on
>>>> certain pages, as a proxy for how content is approached/read/understood.
>>>> I'd be happy with time of entering the page, time of leaving. This is not
>>>> entirely centered on 'user activity', but I said that because I imagined
>>>> data would be stored in a similar way to editor sessions, or in a
>>>> database and I would need to do the time calculations.
>>>>
>>>> Cheers,
>>>>
>>>> Marc
>>>>
>>>> On Wed, 29 June 2016 at 03:11, Oliver Keyes <[email protected]> wrote:
>>>>
>>>>> If historic data is okay, there's already a dataset released
>>>>> (https://figshare.com/articles/Activity_Sessions_datasets/1291033)
>>>>> that was designed specifically to answer questions around how to best
>>>>> calculate session length with regards to Wikipedia
>>>>> (http://arxiv.org/abs/1411.2878)
>>>>>
>>>>> On Tue, Jun 28, 2016 at 3:42 PM, Marc Miquel <[email protected]> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I was thinking about user sessions, yes, so this would mean
>>>>>> aggregating pageviews visited by a user during a short amount of time
>>>>>> (I should check the cutoff, but it could be around an hour or less).
>>>>>>
>>>>>> I am particularly interested in understanding the order in which
>>>>>> pages are seen (start, end), duration, etc.
>>>>>> I wouldn't need data from a long period either, but I think data
>>>>>> from multiple languages would be helpful.
>>>>>>
>>>>>> I imagined reader data could be sensitive to privacy, but would an
>>>>>> NDA with my university and some sort of data encoding help with this?
>>>>>> As I said, it is for a scientific purpose.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> On Tue, 28 June 2016 at 21:09, Nuria Ruiz (<[email protected]>) wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> > I am considering studying reader engagement for different article
>>>>>>> > topics in different languages. Because of this, I would like to
>>>>>>> > know if there is any plan to make available pageviews dumps
>>>>>>> > detailing activity log at session level per user - in a similar
>>>>>>> > way to editor sessions.
>>>>>>>
>>>>>>> Are you thinking of "all-pageviews-visited-by-a-certain-user"? If
>>>>>>> so, no we do not have any projects to provide that data as due to
>>>>>>> privacy concerns we neither have nor keep that information.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Nuria
>>>>>>>
>>>>>>> On Tue, Jun 28, 2016 at 6:55 PM, Leila Zia <[email protected]> wrote:
>>>>>>>
>>>>>>>> + Analytics
>>>>>>>>
>>>>>>>> On Tue, Jun 28, 2016 at 6:36 AM, Marc Miquel <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have a question for you regarding pageviews datadumps.
>>>>>>>>>
>>>>>>>>> I am considering studying reader engagement for different article
>>>>>>>>> topics in different languages. Because of this, I would like to
>>>>>>>>> know if there is any plan to make available pageviews dumps
>>>>>>>>> detailing activity log at session level per user - in a similar
>>>>>>>>> way to editor sessions.
>>>>>>>>>
>>>>>>>>> Since this would be for a research project for which I might
>>>>>>>>> request funding, I would like to know if I could count on that,
>>>>>>>>> what is the nature of the available data, and what would be the
>>>>>>>>> procedure to obtain this data and if there would be any
>>>>>>>>> implication because of privacy concerns.
>>>>>>>>>
>>>>>>>>> Thank you very much!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Marc Miquel
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Wiki-research-l mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
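
For readers following the thread, here is a minimal Python sketch of the session reconstruction being discussed. It assumes you already hold timestamped pageviews grouped under one approximate user signature. The function names `sessionize` and `dwell_times` and the one-hour cutoff are illustrative assumptions, not anything provided by Wikimedia or by the datasets linked in the thread. It shows the two limits raised above: the last pageview of a session has no measurable time-on-page, so a one-pageview session yields none at all, and a gap between two requests says nothing about whether the reader stayed on the page or left and came back.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Pageview = Tuple[datetime, str]  # (request time, page title)


def sessionize(views: List[Pageview],
               cutoff: timedelta = timedelta(hours=1)) -> List[List[Pageview]]:
    """Split one signature's pageviews into sessions by inactivity gap.

    The one-hour cutoff is an assumption taken from the "around an hour
    or less" guess in the thread, not an established value.
    """
    sessions: List[List[Pageview]] = []
    for view in sorted(views):
        # Continue the current session if the gap since its last
        # pageview is within the cutoff; otherwise start a new one.
        if sessions and view[0] - sessions[-1][-1][0] <= cutoff:
            sessions[-1].append(view)
        else:
            sessions.append([view])
    return sessions


def dwell_times(session: List[Pageview]) -> List[Tuple[str, Optional[timedelta]]]:
    """Approximate time-on-page as the gap to the next request.

    The last page of a session gets None: no later request bounds it.
    """
    result: List[Tuple[str, Optional[timedelta]]] = []
    for i, (ts, title) in enumerate(session):
        if i + 1 < len(session):
            result.append((title, session[i + 1][0] - ts))
        else:
            result.append((title, None))
    return result


# Hypothetical data: two requests five minutes apart, then one three hours later.
t0 = datetime(2016, 6, 30, 12, 0)
views = [
    (t0, "A"),
    (t0 + timedelta(minutes=5), "B"),
    (t0 + timedelta(hours=3), "C"),
]
sessions = sessionize(views)
for session in sessions:
    print(dwell_times(session))
```

The second session here contains a single pageview, so its only dwell time is `None`, which is the common case according to the thread. The five-minute value for "A" is only an upper bound on reading time: the request log alone cannot tell whether the page was read for the whole interval.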
