Hello all.

@Tim: By "feature" I mean having values for column user.user_registration 
filled for DB replicas accessible from Tool-Labs, if possible. As Oliver has 
suggested, I don't see any reason for this info not being available, as it is 
already public from Special:ListUsers.

@Aaron: Thanks a lot. I belive that is a fairly decent approximation. In fact, 
I suspect that daily or weekly aggregates would be enough for time-series 
characterization. My actual goal is comparing trends between different 
languages, and eventually correlation with other known activity metrics.

Best regards,
Felipe.





El Viernes 14 de febrero de 2014 16:00, Aaron Halfaker 
<[email protected]> escribió:
 
I have a dataset containing estimated registration dates for editors who 
registered before Dec. 2005.  My method assumes that user_id is monotonically 
increasing and sets the lowest upper-bound available.  
>
>
>For example.  Let's assume the following rows:
>
>
>    user_id    first_edit
>    12345      20040102030405  
>    12344      NULL
>    12343      20040102050102
>
>
>Since an editor couldn't have saved a revision before registering their 
>account, we can assume that user 12345 registered there account on or before 
>20040102030405.  If user_id is monotonically increasing, we also know that 
>user 12344 must have registered on or before 20040102030405, which lets us 
>fill in a NULL.  Similarly, we have a first_edit timestamp for user 12343, but 
>that edit happened pretty late.  We can actually just continue to propagate 
>the 20040102030405timestamp to this user too.
>
>
>After performing this approximation, we'd have the following rows:
>
>
>    user_id    first_edit        user_registration_approx
>    12345      20040102030405    20040102030405
>    12344      NULL              20040102030405
>    12343      20040102050102    20040102030405
>
>
>In effect, this is similar to the approximation discussed in 
>https://bugzilla.wikimedia.org/show_bug.cgi?id=18638, but I'm not trying to 
>interpolate probable registration timings on users.  In practice we're talking 
>about a difference of seconds, so I haven't bothered with the extra work.  
>
>
>I'm generating a datafile for English now that I should be able to share the 
>the end of the day:
>       * user_id
>       * registration_type  (see 
> https://meta.wikimedia.org/wiki/Research:Attached_user and 
> https://meta.wikimedia.org/wiki/Research:Newly_registered_user)
>       * user_registration (from user table)
>       * first_edit (lowest timestamp from "revision" and "archive" for 
> user_id)
>       * registration_approx (my approximation based on the method described 
> above)
>-Aaron
>
>
>
>On Fri, Feb 14, 2014 at 6:06 AM, Federico Leva (Nemo) <[email protected]> 
>wrote:
>
>Felipe Ortega, 14/02/2014 12:05:
>>
>>
>>Thanks a lot. Then, I look forward to the confirmation and
>>>implementation of this feature. In case it's better to open a new issue
>>>on bugzilla or any other action on my side (lend a hand with value
>>>reviewing/testing) just let me know.
>>>
>>
You could help assess the correctness of and/or code the guesstimate method 
proposed in https://bugzilla.wikimedia.org/show_bug.cgi?id=18638 , for the 
script to fill further blanks.
>>
>>
>>Nemo
>>
>>_______________________________________________
>>Labs-l mailing list
>>[email protected]
>>https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>
>
>
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l

Reply via email to