Re: [Analytics] [Wiki-research-l] We need overview quality-minded metrics for different language versions of Wikipedia.

Pine W Sun, 06 Jul 2014 10:57:27 -0700

Forwarding to Analytics in case anyone there is interested. Please discuss
on the Research list.


Thanks,

Pine


On Sun, Jul 6, 2014 at 6:21 AM, Anders Wennersten <[email protected]>
wrote:

>  A standard for measuring article quality levels would be excellent
> and would enable much better comparisons between language versions.
>
> I give some ideas for quality levels below, but I also want to stress that
> I believe quality is also related to coverage. enwp has the most
> 100%-quality articles in many subject areas, such as films and albums. But
> it has low coverage of poets whose work is not available in English, worse
> than dewp for example. How do we evaluate something like that?
>
> My intuitive quality levels for articles are
> -1 - Unacceptable quality
>   Machine-translated articles, vandal-infested articles, severe POV
> content, shorter than 300 characters with no sources, etc. No bot should be
> allowed to generate such lousy articles. They all ought to be deleted,
> and I would expect there to be no articles at all of this inferior quality
> on the bigger versions.
> 0 - Missing articles that ought to exist
> 1 - Rudimentary articles
>    Articles with proper sources, categories and infoboxes but short in
> substance. Articles with proper substance but missing appropriate
> sources. Most proper bot-generated articles fall in this level.
> 2 - OK articles
>    Have both proper substance and sources, but are not complete and do not
> cover all aspects of the subject. A few bot-generated articles fall in this
> level.
> 3 - Good articles
>   Cover the subject
>
> For each of these levels it should be possible to develop detailed
> criteria that would enable us to machine-read articles and classify them
> by the quality levels above.
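
As a rough illustration of what machine classification into the levels above might look like, here is a minimal Python sketch. Only the 300-character limit comes from the description of level -1; the `ArticleStats` fields, the `SUBSTANCE_CHARS` cut-off and the `covers_subject` flag are hypothetical stand-ins for the detailed criteria that would still need to be agreed.

```python
from dataclasses import dataclass

MIN_CHARS = 300          # from the level -1 description above
SUBSTANCE_CHARS = 2000   # hypothetical cut-off for "proper substance"


@dataclass
class ArticleStats:
    """Hypothetical per-article measurements; not an existing data format."""
    exists: bool = True
    char_count: int = 0
    source_count: int = 0
    flagged_unacceptable: bool = False  # machine-translated, vandalised, severe POV
    covers_subject: bool = False        # assumed completeness signal


def quality_level(a: ArticleStats) -> int:
    """Map article measurements to the -1..3 levels proposed above."""
    if not a.exists:
        return 0
    if a.flagged_unacceptable or (a.char_count < MIN_CHARS and a.source_count == 0):
        return -1
    has_substance = a.char_count >= SUBSTANCE_CHARS
    has_sources = a.source_count > 0
    # Level 1: missing either substance or sources.
    if not (has_substance and has_sources):
        return 1
    # Levels 2 and 3 differ only in whether the subject is fully covered.
    return 3 if a.covers_subject else 2
```

The hard part is the inputs rather than the rule itself: signals such as "covers the subject" would probably need community assessments or a proxy, which this sketch simply takes as given.
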
>
> Anders
>
> Han-Teng Liao (OII) wrote on 2014-07-06 13:29:
>
> We need overview quality-minded metrics for the different language versions
> of Wikipedia. The current "number games" played by bots across
> certain language versions have distorted the direction and focus of
> editorial development. I therefore propose an altmetric of
> "do-not-spread-oneself-too-thin" to counterbalance the situation.
>
>  (Sorry for joining late the conversation "[Wiki-research-l] Quality
> on different language version
> <http://www.mail-archive.com/[email protected]/msg03168.html>".
> This is a follow-up reply and a suggestion for that discussion thread.)
>
>  For example, the Chinese Wikipedia community is currently discussing
> the ranking of Chinese Wikipedia by number of articles, and how the
> *neighboring* versions (those with similar numbers of articles) use bots
> to generate new articles.
>
>  # The stats report generated and used by the Chinese community to
> compare itself against neighboring language versions:
>  #* Link
> <http://zh.wikipedia.org/wiki/Wikipedia:%E7%BB%9F%E8%AE%A1/%E4%B8%8E%E9%82%BB%E8%BF%91%E8%AF%AD%E8%A8%80%E7%89%88%E6%9C%AC%E6%AF%94%E8%BE%83>
>
>  #* Google translated
> <https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=http%3A%2F%2Fzh.wikipedia.org%2Fwiki%2FWikipedia%3A%25E7%25BB%259F%25E8%25AE%25A1%2F%25E4%25B8%258E%25E9%2582%25BB%25E8%25BF%2591%25E8%25AF%25AD%25E8%25A8%2580%25E7%2589%2588%25E6%259C%25AC%25E6%25AF%2594%25E8%25BE%2583>
>
>  # One current discussion:
>  #* Link
> <http://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B6%88%E6%81%AF#80.E4.B8.87.E6.9D.A1.E7.9B.AE.E6.89.80.E7.94.A8.E6.A0.87.E5.BF.97>
> #* Google translated
> <https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fzh.wikipedia.org%2Fwiki%2FWikipedia%3A%25E4%25BA%2592%25E5%258A%25A9%25E5%25AE%25A2%25E6%25A0%2588%2F%25E6%25B6%2588%25E6%2581%25AF&edit-text=>
> # One recently archived discussion:
> #* Link
> <http://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%B6%88%E6%81%AF/%E5%AD%98%E6%A1%A3/2014%E5%B9%B46%E6%9C%88#.E8.B6.8A.E5.8D.97.E8.AF.AD.E7.89.88.E6.9D.A1.E7.9B.AE.E6.95.B0.E8.B6.85.E8.BF.87.E6.97.A5.E8.AF.AD.E7.89.88>
> #* Google translated
> <https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=http%3A%2F%2Fzh.wikipedia.org%2Fwiki%2FWikipedia%3A%25E4%25BA%2592%25E5%258A%25A9%25E5%25AE%25A2%25E6%25A0%2588%2F%25E6%25B6%2588%25E6%2581%25AF%2F%25E5%25AD%2598%25E6%25A1%25A3%2F2014%25E5%25B9%25B46%25E6%259C%2588%23.E8.B6.8A.E5.8D.97.E8.AF.AD.E7.89.88.E6.9D.A1.E7.9B.AE.E6.95.B0.E8.B6.85.E8.BF.87.E6.97.A5.E8.AF.AD.E7.89.88>
>
>  To counterbalance the situation of such nonsensical comparison and
> competition, I personally think it is better to have an altmetric in place
> of the crude (and often distorting) measure of the number of articles.
>
>  One would expect a better encyclopedia to contain a set of core articles
> of human knowledge.
>
>  Indeed, Meta-Wiki has a list of 1000 articles that "every Wikipedia should
> have".
> http://meta.wikimedia.org/wiki/List_of_articles_every_Wikipedia_should_have
>
>  We can use this to generate a quantifiable metric of the development of
> the core articles in each language version, perhaps using the following
> numbers:
>
>  * number of references (total and per article)
> * number of footnotes (total and per article)
> * number of citations (total and per article)
> * number of distinct wiki internal links to other articles
> * number of good and featured articles (as judged by each language
> version's community)
>
>  Based on the above numbers, it is conceivable to come up with a metric
> that measures both the depth and breadth of the quality of the core
> articles. I admit that other measurements can and should be applied, but
> the above numbers still have the following merits:
>
>  * they reflect the nature of Wikipedia as dependent on other reliable
> secondary and primary information sources.
> * they can be applied across languages automatically without the need to
> analyze texts, which requires more tools and engenders issues of
> comparability.
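
To make the "no text analysis needed" point concrete, here is a minimal Python sketch that counts two of the numbers listed above directly from an article's wikitext. It assumes references appear as `<ref>` tags and internal links as `[[...]]`; the function name `count_signals` and the rule of skipping any link target containing a colon (to drop namespaces such as File: or Category:) are simplifications of my own, not an agreed method.

```python
import re

# Opening <ref> tags, including <ref name="x"> and self-closing <ref name="x" />.
# Does not match </ref> or <references />.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)
# [[Target]] and [[Target|label]]; captures the target up to "|", "#" or "]".
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def count_signals(wikitext: str) -> dict:
    """Count reference tags and distinct internal link targets in wikitext."""
    refs = len(REF_TAG.findall(wikitext))
    targets = set()
    for raw in WIKILINK.findall(wikitext):
        target = raw.strip()
        # Simplification: treat any "prefix:" link as non-article and skip it.
        if not target or ":" in target:
            continue
        # Treat the first letter as case-insensitive, as is common for titles.
        targets.add(target[0].upper() + target[1:])
    return {"references": refs, "distinct_internal_links": len(targets)}
```

Because it only looks at markup, the same function could be run unchanged over the core articles of any language version, although templates that generate citations or links would be missed.
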
>
>  For the sake of simplicity, let us say that one language version
> (possibly English or German) has the highest scores on these numbers; that
> language version can then serve as the baseline for comparison. Say this
> benchmark language version has:
>
>  # the quality-metric number of QUAL (from the vital 1000)
> # the quantity number of total articles QUAN (from the existing metric)
>
>  Then the "do-not-spread-oneself-too-thin" quality metric can be
> calculated as:
>
>  QUAL/QUAN
>
>  (It can be further discussed whether logarithmic scales should be
> applied here.)
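
One possible reading of the QUAL/QUAN proposal, sketched in Python: compute the ratio for every language version and express it relative to the benchmark version. The function names, the `log_scale` option and the `+ 1` inside the logarithm are my own assumptions, and how QUAL itself is aggregated from the core-article counts is left open.

```python
import math


def spread_ratio(qual: float, quan: int, log_scale: bool = False) -> float:
    """QUAL/QUAN, optionally dividing by log10(QUAN + 1) instead of QUAN."""
    if quan < 1:
        raise ValueError("quan must be at least 1")
    denominator = math.log10(quan + 1) if log_scale else float(quan)
    return qual / denominator


def relative_to_baseline(versions: dict, baseline: str, log_scale: bool = False) -> dict:
    """Scale each version's ratio so that the baseline version scores 1.0.

    `versions` maps a version name to a (QUAL, QUAN) pair.
    """
    base_qual, base_quan = versions[baseline]
    base = spread_ratio(base_qual, base_quan, log_scale)
    return {
        name: spread_ratio(qual, quan, log_scale) / base
        for name, (qual, quan) in versions.items()
    }
```

With the plain ratio, a version that doubles its article count through bot-generated stubs without improving the core 1000 halves its score; with the logarithmic option the penalty is much milder, which is the trade-off to settle.
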
>
>  The gist of this "quality metric" is to redirect the obsession with the
> number of articles towards the important core articles, in the hope of
> getting more references, footnotes, citations, internal links and
> good/featured articles for the core 1000. It will hopefully indicate which
> language version is too "watery", or simply spread too thin with
> inconsequential short articles.
>
>  Let us have a discussion here [Wiki-research-l], before we extend the
> conversation to [Wikimedia-l].
>
>  Best,
> han-teng liao
>
>
>  --
> han-teng liao
>
>  "[O]nce the Imperial Institute of France and the Royal Society of London
> begin to work together on a new encyclopaedia, it will take less than a
> year to achieve a lasting peace between France and England." - Henri
> Saint-Simon (1810)
>
>  "A common ideology based on this Permanent World Encyclopaedia is a
> possible means, to some it seems the only means, of dissolving human
> conflict into unity." - H.G. Wells (1937)
>
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
