If we were doing this internally, one possibility would be to instrument
MediaWiki and send sampled events with the time on page to EventLogging.
This would not be retroactive, though: we would have to wait a couple of
months to collect a significant amount of data. In any case, I'm not sure
whether this would be possible with an NDA.
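
For illustration only, here is a minimal Python sketch of the approximation
discussed further down the thread: split one reader's pageviews into sessions
using an inactivity cutoff, then take the gap to the next request as an upper
bound on the time spent on a page. The one-hour cutoff is the rough figure Marc
mentioned; the function names and the (title, timestamp) tuples are
hypothetical, and this does not describe any existing pipeline or dataset.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumption: a gap longer than this starts a new session ("around an hour or less").
SESSION_CUTOFF = timedelta(hours=1)

# Hypothetical record shape: (page title, request timestamp).
Pageview = Tuple[str, datetime]


def split_sessions(views: List[Pageview],
                   cutoff: timedelta = SESSION_CUTOFF) -> List[List[Pageview]]:
    """Group one reader's pageviews into sessions separated by inactivity gaps."""
    sessions: List[List[Pageview]] = []
    for view in sorted(views, key=lambda v: v[1]):
        if sessions and view[1] - sessions[-1][-1][1] <= cutoff:
            sessions[-1].append(view)   # still within the current session
        else:
            sessions.append([view])     # gap too long (or first view): new session
    return sessions


def time_on_page(session: List[Pageview]) -> List[Tuple[str, Optional[timedelta]]]:
    """Upper bound per page: time until the next request in the same session.

    The last page of a session has no following request, so it gets None.
    A single-pageview session therefore yields no estimate at all.
    """
    bounds: List[Tuple[str, Optional[timedelta]]] = []
    for i, (title, ts) in enumerate(session):
        if i + 1 < len(session):
            bounds.append((title, session[i + 1][1] - ts))
        else:
            bounds.append((title, None))
    return bounds


if __name__ == "__main__":
    start = datetime(2016, 6, 30, 12, 0)
    example = [
        ("Page_A", start),
        ("Page_B", start + timedelta(minutes=5)),
        ("Page_C", start + timedelta(hours=3)),  # long gap: separate session
    ]
    for session in split_sessions(example):
        print(time_on_page(session))
```

Note that the estimate is only an upper bound (the reader may have left the
page long before the next request), and that the final page of every session,
including the only page of a single-pageview session, gets no estimate.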

On Fri, Jul 1, 2016 at 11:52 AM, Marc Miquel <[email protected]> wrote:

> I see that it is quite complicated to work with this data. It is a pity,
> considering that valuable insights could be derived from readers' behavior. I
> will think about what could be useful for the study.
>
> Thanks for the answers, Nuria and Marcel! :)
> Cheers,
>
> Marc
>
> On Thu, Jun 30, 2016 at 14:16, Marcel Ruiz Forns (<[email protected]>)
> wrote:
>
>> Marc, I agree with what Nuria says. Also, please consider that the majority
>> of Wikipedia sessions have only one pageview. So in the majority of
>> sessions it would not be possible to put bounds on the time spent on page
>> using Joseph's alternative.
>>
>> On Thu, Jun 30, 2016 at 2:02 PM, Nuria Ruiz <[email protected]> wrote:
>>
>>> > Aye, as Joseph says, the time-on-page or time-leaving is not
>>> > collected, except as an extension of session reconstruction work. If you
>>> > want a concrete time, you're not gonna get it.
>>>
>>> I was about to make the same point. The data set that will most closely
>>> answer your questions is the one Oliver mentioned. Otherwise, we do not keep
>>> any information related to time on site and page requests, so there is no
>>> "approximation" possible that will work on the overall data. Even if you
>>> calculate signatures with IP hash + user agent to approximate users (a
>>> method with known issues), there is no way for you to distinguish someone
>>> reading a page for an hour from someone who came to Wikipedia twice in the
>>> same hour and spent a minute each time. Hopefully my example makes things
>>> clearer.
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>> On Wed, Jun 29, 2016 at 4:58 AM, Oliver Keyes <[email protected]>
>>> wrote:
>>>
>>>> Aye, as Joseph says, the time-on-page or time-leaving is not collected,
>>>> except as an extension of session reconstruction work. If you want a
>>>> concrete time, you're not gonna get it.
>>>>
>>>> While PC-based data is more reliable than mobile, that does not
>>>> necessarily mean "reliable". I'm sort of confused, I guess, as to why the
>>>> datasets I linked (unless I'm misremembering them?) don't help: you would
>>>> have to do the calculation yourself but they should contain all the data
>>>> necessary to make that calculation (unless you want to have the pageID or
>>>> title associated with the time-on-page, in which case...yeah, that's an
>>>> issue).
>>>>
>>>> On Wed, Jun 29, 2016 at 3:16 AM, Marc Miquel <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the answer, Oliver. But I am not sure it answers my
>>>>> questions. I'd like to study aspects like how much time is spent on
>>>>> certain pages, as a proxy for how content is approached/read/understood.
>>>>> I'd be happy with the time of entering the page and the time of leaving.
>>>>> This is not entirely centered on 'user activity', but I said that because
>>>>> I imagined the data would be stored in a similar way to editor sessions,
>>>>> or in a database, and I would need to do the time calculations.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Marc
>>>>>
>>>>>
>>>>> On Wed, Jun 29, 2016 at 03:11, Oliver Keyes <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> If historic data is okay, there's already a dataset released (
>>>>>> https://figshare.com/articles/Activity_Sessions_datasets/1291033)
>>>>>> that was designed specifically to answer questions around how to best
>>>>>> calculate session length with regards to Wikipedia (
>>>>>> http://arxiv.org/abs/1411.2878)
>>>>>>
>>>>>> On Tue, Jun 28, 2016 at 3:42 PM, Marc Miquel <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> I was thinking about user sessions, yes, so this would mean
>>>>>>> aggregating the pageviews visited by a user during a short amount of
>>>>>>> time (I should check the cutoff, but it could be around an hour or less).
>>>>>>>
>>>>>>> I am particularly interested in understanding the order in which
>>>>>>> pages are seen (start, end), duration, etc.
>>>>>>> I wouldn't need data from a long period either, but I think data
>>>>>>> from multiple languages would be helpful.
>>>>>>>
>>>>>>> I imagined reader data could be privacy-sensitive, but would an
>>>>>>> NDA with my university and some sort of data encoding help with this?
>>>>>>> As I said, it is for a scientific purpose.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Marc
>>>>>>>
>>>>>>> On Tue, Jun 28, 2016 at 21:09, Nuria Ruiz (<[email protected]>)
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> > I am considering studying reader engagement for different article
>>>>>>>> > topics in different languages. Because of this, I would like to know
>>>>>>>> > if there is any plan to make available pageview dumps detailing the
>>>>>>>> > activity log at session level per user - in a similar way to editor
>>>>>>>> > sessions.
>>>>>>>>
>>>>>>>> Are you thinking of "all-pageviews-visited-by-a-certain-user"? If
>>>>>>>> so, no, we do not have any projects to provide that data: due to
>>>>>>>> privacy concerns, we neither have nor keep that information.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Nuria
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 28, 2016 at 6:55 PM, Leila Zia <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> + Analytics
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 28, 2016 at 6:36 AM, Marc Miquel <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I have a question for you regarding pageview data dumps.
>>>>>>>>>>
>>>>>>>>>> I am considering studying reader engagement for different article
>>>>>>>>>> topics in different languages. Because of this, I would like to know
>>>>>>>>>> if there is any plan to make available pageview dumps detailing the
>>>>>>>>>> activity log at session level per user - in a similar way to editor
>>>>>>>>>> sessions.
>>>>>>>>>>
>>>>>>>>>> Since this would be for a research project for which I might request
>>>>>>>>>> funding, I would like to know whether I could count on that, what the
>>>>>>>>>> nature of the available data is, what the procedure to obtain this
>>>>>>>>>> data would be, and whether there would be any implications because of
>>>>>>>>>> privacy concerns.
>>>>>>>>>>
>>>>>>>>>> Thank you very much!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Marc Miquel
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Wiki-research-l mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Analytics mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>>>
>>>>>>>>>
>>
>>
>> --
>> *Marcel Ruiz Forns*
>> Analytics Developer
>> Wikimedia Foundation


-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation