Felipe, for some context on the team's work on standardizing user 
class definitions and the supporting analysis, see: 
https://meta.wikimedia.org/wiki/Research:Newly_registered_user

On Feb 14, 2014, at 9:27 AM, Felipe Ortega <[email protected]> wrote:

> Hello all.
> 
> @Tim: By "feature" I mean having values for column user.user_registration 
> filled for DB replicas accessible from Tool-Labs, if possible. As Oliver has 
> suggested, I don't see any reason for this info not being available, as it is 
> already public from Special:ListUsers.
> 
> @Aaron: Thanks a lot. I believe that is a fairly decent approximation. In 
> fact, I suspect that daily or weekly aggregates would be enough for 
> time-series characterization. My actual goal is comparing trends between 
> different languages, and eventually correlation with other known activity 
> metrics.
> 
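As a rough illustration of the daily or weekly aggregates mentioned above, here is a minimal sketch that counts registrations per ISO week. It assumes timestamps are 14-digit MediaWiki strings (YYYYMMDDHHMMSS), as in the examples further down the thread. The function name weekly_counts is hypothetical and not part of any existing tool.

```python
from collections import Counter
from datetime import datetime


def weekly_counts(timestamps):
    """Count timestamps per (ISO year, ISO week).

    Assumes each timestamp is a 14-digit MediaWiki string
    (YYYYMMDDHHMMSS) or None; None values are skipped.
    """
    counts = Counter()
    for ts in timestamps:
        if ts is None:
            continue
        year, week, _weekday = datetime.strptime(ts, "%Y%m%d%H%M%S").isocalendar()
        counts[(year, week)] += 1
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))
```

Running the same function once per language edition would give one weekly series per wiki, which could then be compared or correlated with other activity metrics.
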
> Best regards,
> Felipe.
> 
> 
> 
> On Friday, February 14, 2014 at 16:00, Aaron Halfaker 
> <[email protected]> wrote:
> I have a dataset containing estimated registration dates for editors who 
> registered before Dec. 2005.  My method assumes that user_id is monotonically 
> increasing and assigns each user the lowest upper bound available on their 
> registration time.  
> 
> For example, let's assume the following rows:
> 
>     user_id    first_edit
>     12345      20040102030405  
>     12344      NULL
>     12343      20040102050102
> 
> Since an editor couldn't have saved a revision before registering their 
> account, we can assume that user 12345 registered their account on or before 
> 20040102030405.  If user_id is monotonically increasing, we also know that 
> user 12344 must have registered on or before 20040102030405, which lets us 
> fill in a NULL.  Similarly, we have a first_edit timestamp for user 12343, 
> but that edit happened later than user 12345's first edit, so it is the 
> weaker bound.  We can just continue to propagate the 20040102030405 
> timestamp to this user too.
> 
> After performing this approximation, we'd have the following rows:
> 
>     user_id    first_edit        user_registration_approx
>     12345      20040102030405    20040102030405
>     12344      NULL              20040102030405
>     12343      20040102050102    20040102030405
> 
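The propagation described above can be sketched in a few lines. This is only a minimal illustration, assuming rows are (user_id, first_edit) pairs and that first_edit is a 14-digit MediaWiki timestamp string or None. The function name approximate_registration is hypothetical and this is not the script actually used to build the dataset.

```python
def approximate_registration(rows):
    """Return {user_id: lowest upper bound on registration time}.

    Assumes user_id increases monotonically with registration time and
    that first_edit is a 14-digit timestamp string (YYYYMMDDHHMMSS) or
    None. Fixed-width digit strings compare correctly as plain strings.
    """
    approx = {}
    bound = None
    # Walk from the highest user_id down, carrying the earliest
    # first_edit seen so far among users with an equal or higher id.
    for user_id, first_edit in sorted(rows, key=lambda r: r[0], reverse=True):
        if first_edit is not None and (bound is None or first_edit < bound):
            bound = first_edit
        approx[user_id] = bound
    return approx


rows = [
    (12345, "20040102030405"),
    (12344, None),
    (12343, "20040102050102"),
]
approx = approximate_registration(rows)
```

With the three example rows, every user ends up with 20040102030405. Under this sketch, users with the very highest ids who never edited get no bound at all (None), since there is no later edit to propagate down to them.
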
> In effect, this is similar to the approximation discussed in 
> https://bugzilla.wikimedia.org/show_bug.cgi?id=18638, but I'm not trying to 
> interpolate probable registration timings on users.  In practice we're 
> talking about a difference of seconds, so I haven't bothered with the extra 
> work.  
> 
> I'm generating a datafile for English now that I should be able to share by 
> the end of the day.  It will contain the following columns:
> 
>     user_id
>     registration_type  (see 
>         https://meta.wikimedia.org/wiki/Research:Attached_user and 
>         https://meta.wikimedia.org/wiki/Research:Newly_registered_user)
>     user_registration  (from the user table)
>     first_edit  (lowest timestamp from "revision" and "archive" for user_id)
>     registration_approx  (my approximation based on the method described above)
> 
> -Aaron
> 
> 
> On Fri, Feb 14, 2014 at 6:06 AM, Federico Leva (Nemo) <[email protected]> 
> wrote:
> Felipe Ortega, 14/02/2014 12:05:
> 
> > Thanks a lot. Then, I look forward to the confirmation and
> > implementation of this feature. In case it's better to open a new issue
> > on bugzilla or any other action on my side (lend a hand with value
> > reviewing/testing) just let me know.
> 
> You could help assess the correctness of the guesstimate method proposed in 
> https://bugzilla.wikimedia.org/show_bug.cgi?id=18638, and/or code it, so 
> that the script can fill in further blanks.
> 
> 
> Nemo
> 
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> 
> 
> 
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
