What can numbers tell us about CZ after the launch?
Here are some selected categories, as of 14th April (the first time I did the
'accountancy') and 8th May.
Mainspace text pages = 2071 -> 2616
Number of redirects = 1173 -> 1504
Disambig pages = 22 -> 30
Articles (=textpages-disambig) = 2049 -> 2586
----
CZ_Live = 1387 -> 1757
Checklisted_Articles = 1238 -> 1900
Internal_Articles = 876 -> 1398
External_Articles = 361 -> 503
Stub_Articles = 223 -> 305
Developing_Articles = 383 -> 624
Developed_Articles = 262 -> 457
Approved_Articles = 11 -> 17
Remarks
* Judging by the number of Checklisted Articles, the Big Cleanup has been a big
success. For example, the gap between the Internal Articles and CZ_Live
categories is shrinking (they have more or less the same scope). Still, there
is some cleanup left to do. And, since we are growing, there always will be :-)
We now have quite a lot of Internal Articles, more Developed Articles than
stubs(!), and not so many External ones. IMHO, ideally, the absolute number of
External Articles should stay more or less constant or even decrease over time
(it is still growing, although the External/Checklisted proportion has
decreased slightly).
* For a relatively long time there were 11 Approved Articles. In the last three
weeks this has changed significantly (there are now 17).
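The first remark rests on two derived quantities: the gap between CZ_Live and Internal Articles, and the External/Checklisted proportion. A minimal sketch that recomputes both from the figures in the list above (the numbers are copied from it; nothing else is assumed):

```python
# Category counts from the 14 April and 8 May snapshots listed above.
snapshots = {
    "14 Apr": {"CZ_Live": 1387, "Checklisted": 1238, "Internal": 876, "External": 361},
    "8 May": {"CZ_Live": 1757, "Checklisted": 1900, "Internal": 1398, "External": 503},
}


def summarize(counts):
    """Return (CZ_Live - Internal gap, External/Checklisted proportion)."""
    gap = counts["CZ_Live"] - counts["Internal"]
    external_share = counts["External"] / counts["Checklisted"]
    return gap, external_share


for date, counts in snapshots.items():
    gap, share = summarize(counts)
    print(f"{date}: CZ_Live - Internal = {gap}, External/Checklisted = {share:.1%}")
```

The gap shrinks from 511 to 359 articles, and the External share falls from about 29.2% to about 26.5%, even though the absolute number of External Articles grew.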
What about the human resources we have?
Here is a table (a fixed-width font is needed to display it correctly):
year and | this month |   users    |   new   | authors | backward  | longterm
month    |   users    | >20 : >100 | authors |  daily  | 2 months  | 6 months
-----------------------------------------------------------------------------
2006-10  |      52    |   8 :    2 |     52  |    10   |  0   0.0% |  0  0.0%
2006-11  |     183    |  41 :   12 |    143  |    24   |  0   0.0% |  0  0.0%
2006-12  |     120    |  28 :   12 |     49  |    17   | 29  24.2% |  0  0.0%
2007-01  |     350    |  43 :   15 |    280  |    29   | 43  12.3% |  0  0.0%
2007-02  |     801    |  85 :   25 |    678  |    62   | 47   5.9% |  0  0.0%
2007-03  |     348    |  76 :   23 |    208  |    38   | 62  17.8% |  0  0.0%
2007-04  |     395    | 111 :   45 |    180  |    63   | 77  19.5% | 19  4.8%
Description:
* this month users: number of users with at least 1 edit in the month; the
next column gives the 'active' ('very active') users, i.e. those with more
than 20 (100) edits in the month
* authors daily: average number of interacting authors (authors editing on
the same day)
* new authors: a new user is detected when he or she makes a first edit (so
the number may differ from the number of new user pages)
* backward: how many of the users editing in month n had also edited in months
n-1 and n-2 (the percentage is relative to the users of month n)
* longterm: like backward, but over the previous 6 months, without a break
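The definitions above can be made concrete with a small script. This is a minimal sketch, not the scripts actually used for these numbers: it assumes a hypothetical edit log given as (user, date) pairs, reads "authors daily" as the number of distinct editors per calendar day averaged over all days of the month, and reads "longterm" as having edited in each of the previous 6 months. All function names are illustrative.

```python
import calendar
from collections import defaultdict
from datetime import date


def month_key(d):
    """(year, month) key for a date."""
    return (d.year, d.month)


def shift_month(key, k):
    """Return the (year, month) key k months before `key`."""
    year, month = key
    total = year * 12 + (month - 1) - k
    return (total // 12, total % 12 + 1)


def monthly_users(edits):
    """Map (year, month) -> set of users with at least one edit that month."""
    users = defaultdict(set)
    for user, day in edits:
        users[month_key(day)].add(user)
    return users


def retained(edits, key, months_back):
    """Users editing in month `key` who also edited in each of the previous
    `months_back` months (2 for "backward", 6 for "longterm")."""
    users = monthly_users(edits)
    current = set(users.get(key, set()))
    for k in range(1, months_back + 1):
        current &= users.get(shift_month(key, k), set())
    return len(current)


def authors_daily(edits, key):
    """Distinct editors per day, averaged over all calendar days of the month
    (assumption: days without edits count as zero)."""
    per_day = defaultdict(set)
    for user, day in edits:
        if month_key(day) == key:
            per_day[day].add(user)
    days_in_month = calendar.monthrange(*key)[1]
    return sum(len(s) for s in per_day.values()) / days_in_month


# Hypothetical toy log, for illustration only.
edits = [
    ("alice", date(2007, 2, 1)), ("alice", date(2007, 3, 5)), ("alice", date(2007, 4, 2)),
    ("bob", date(2007, 4, 2)), ("bob", date(2007, 4, 3)),
    ("carol", date(2007, 3, 9)), ("carol", date(2007, 4, 3)),
]
print(retained(edits, (2007, 4), 2))  # only alice edited in Feb, Mar and Apr
print(authors_daily(edits, (2007, 4)))
```

In the toy log, April has 3 distinct editors, so the naive "monthly users / days" would give 3/30, while `authors_daily` gives 4/30, because bob is met on two different days. The percentages in the table correspond to `retained(...)` divided by the number of users of month n.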
Remarks:
* February was unusual in terms of the number of users. This may be related
to the Slashdot reports and the automatic registration. Clearly, many of the
automatically registered users were just watching CZ, as the proportion of
active users in February is about 10%, while in April it is 28%.
* The "authors daily" column seems interesting, as it measures, to some
extent, the 'human resources' and how 'vibrant' the community is. Note that
this is not the number of all users in the month divided by the number of
days; it is the actual (average) number of users editing on a given day. For
example, as of April, you could meet about 63 authors each day. We can also
observe that after the launch you could meet as many CZ authors in one day as
in February (i.e. in the self-registration period).
* The "backward" and "longterm" columns are not very meaningful for CZ at this
stage, but maybe at some point they will be. They are meant to measure the
'human resources rotation' and the 'stability' of the wiki.
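As a quick check of the "about 10%" and "28%" figures in the first remark, here is the share of 'active' users (more than 20 edits) per month. A minimal sketch; the two inputs per month are copied from the "this month users" and "users > 20" columns of the table.

```python
# (this month users, users with > 20 edits), copied from the table above.
table = {
    "2006-10": (52, 8),
    "2006-11": (183, 41),
    "2006-12": (120, 28),
    "2007-01": (350, 43),
    "2007-02": (801, 85),
    "2007-03": (348, 76),
    "2007-04": (395, 111),
}


def active_share(users, active):
    """Fraction of the month's editing users who made more than 20 edits."""
    return active / users


for month, (users, active) in table.items():
    print(f"{month}: {active_share(users, active):.1%} active")
```

February comes out at 10.6% and April at 28.1%, the lowest and highest values in the table.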
In terms of the number of edits, CZ was significantly more active after the
launch (see e.g. the doubling of mainspace edits from March to April):
month | total edits | main* | act* |new pages
-----------------------------------------------------
2006-10 | 1218 | 948 | 23.4 | 131
2006-11 | 5576 | 2968 | 30.5 | 1183
2006-12 | 8291 | 5929 | 69.1 | 1726
2007-01 | 8819 | 4428 | 25.2 | 1545
2007-02 | 16560 | 5654 | 20.7 | 4276
2007-03 | 15526 | 6369 | 44.6 | 3159
2007-04 | 26914 | 13333 | 68.1 | 3846
* main = number of edits in the mainspace
* act = mean 'activity', i.e. edits per user (total edits divided by the
number of users editing in the month)
Remarks
* In April there were nearly 900 edits per day, on average (all namespaces).
* New pages include redirects and pages in all namespaces. This partly
explains why there were so many "new pages" in February (think of the new user
pages and talk pages alone).
* A few timestamps in the database predate 2006-10 (they were counted as
2006-10).
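The "act" column and the "nearly 900 edits per day" remark can be reproduced from the two tables. A minimal sketch: it divides total edits by "this month users" from the human-resources table and compares the result with the act column; the April day count comes from Python's `calendar.monthrange`.

```python
import calendar

# (total edits, this month users, act) per month, copied from the two tables above.
months = {
    "2006-10": (1218, 52, 23.4),
    "2006-11": (5576, 183, 30.5),
    "2006-12": (8291, 120, 69.1),
    "2007-01": (8819, 350, 25.2),
    "2007-02": (16560, 801, 20.7),
    "2007-03": (15526, 348, 44.6),
    "2007-04": (26914, 395, 68.1),
}


def activity(total_edits, users):
    """Mean activity: total edits (all namespaces) per user editing in the month."""
    return total_edits / users


for month, (total, users, act) in months.items():
    print(f"{month}: computed {activity(total, users):.1f}, table {act}")

# Average edits per day in April 2007 (30 days, all namespaces).
april_days = calendar.monthrange(2007, 4)[1]
print(f"April 2007: {26914 / april_days:.0f} edits per day")
```

Every computed value matches the act column to one decimal place, and April works out to about 897 edits per day.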
Can we reasonably compare our 'human resources' and editing activity to the
English Wikipedia? Of course not. But it may be interesting to see that CZ is
comparable to some smaller, but still big and active, wikis.
To this end, suppose that we count only registered users. This seems
reasonable, since anonymous (IP) editors, while numerous compared to the
registered users, do not make many edits (globally 8-15%, depending on the
wiki). Consider also that the same IP rarely makes more than a few edits. So
'anons' do not add that much to the community; they usually just bring
unreliable/unsourced information, if not vandalism, which has to be verified
by regular users (e.g. adding the anons who are "more-than-20-edits-active"
wouldn't really change the picture).
Let us then compare CZ to lt.wikipedia (Lithuanian), since the latter is
listed on the English WP Main Page as one of the more significant ones, i.e.
in the category "more than 25K entries" (about 44K, in fact).
As of April 2007, CZ had slightly more editing users (395 vs. 338), active
users (111 vs. 93), very active users (45 vs. 43, "bots" included), new users
(180 vs. 142) and users daily (63 vs. 53). Interestingly, ltwiki users are
exceptionally active (systematically more than 100 edits per user per month,
far above the average for the wikis I analyzed, where typical activity is
50-60 edits per user). In April there were 5378 new pages on ltwiki (vs. 3846
on CZ).
If we look at wikis from the category "more than 50K entries" (11 members),
CZ is still of the same "order of magnitude", but generally much smaller. For
example, on huwiki (Hungarian, probably the smallest in the category, with 58K
entries) there were 1078 users editing in April, 291 active users, 143 very
active users, 444 new users, 173 users daily (on average) and about 9800 new
pages.
And here are the numbers for nowiki (Norwegian), from the "more than 100K
entries" category. Unfortunately, they are from March 2007, since no dump file
for April was available. There were 2439 editing users, 343 active and 152
very active users, 1259 new users, 249 users daily and 12626 new pages.
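To put the quoted figures side by side, here is a minimal sketch that expresses each CZ figure as a fraction of the corresponding ltwiki, huwiki and nowiki figure. All numbers are copied from the paragraphs above; note that the huwiki new-page count is approximate ("about 9800") and the nowiki figures are for March 2007, not April.

```python
METRICS = ("editing users", "active", "very active", "new users", "users daily", "new pages")

# Figures quoted above: CZ, ltwiki and huwiki for April 2007, nowiki for March 2007.
wikis = {
    "CZ": (395, 111, 45, 180, 63, 3846),
    "ltwiki": (338, 93, 43, 142, 53, 5378),
    "huwiki": (1078, 291, 143, 444, 173, 9800),  # new pages: "about 9800"
    "nowiki": (2439, 343, 152, 1259, 249, 12626),  # March 2007
}


def relative_to(base, other):
    """Each figure of `base` as a fraction of the matching figure of `other`."""
    return {m: b / o for m, b, o in zip(METRICS, base, other)}


for name in ("ltwiki", "huwiki", "nowiki"):
    ratios = relative_to(wikis["CZ"], wikis[name])
    print(name + ": " + ", ".join(f"{m} {r:.2f}" for m, r in ratios.items()))
```

Against ltwiki, CZ is above 1 on every user metric but below 1 on new pages; against huwiki and nowiki, every ratio is below 1, which is the "same order of magnitude, but generally much smaller" picture.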
That said, I do not think that a quantitative CZ-WP comparison is very
relevant, at least at the present stage. Clearly, CZ encourages a different
working style and different priorities that do not easily translate into
stats, or that make the "counters" turn more slowly (e.g. the emphasis on
quality/reliability, the narrative/introductory style, not creating stubs
without a clear intent to develop them, etc.). I guess CZ will always be
different. I made a quantitative comparison just to test the hypothesis (or
the not-so-rare suggestion) that CZ is likely to 're-create the failure of
Nupedia'. Clearly, the numbers tell us that CZ will (probably) succeed.
Alex.
PS. Any technical details (data sources, scripts, more detailed discussion
of methods and results etc.) on request.
_______________________________________________
Citizendium-l mailing list
https://lists.purdue.edu/mailman/listinfo/citizendium-l