Kevin Scannell gave a very good presentation at the Celtic Knot Conference 2024 
on how the quality of Wikipedia's content affects the quality of the language models 
it is used to train. The low-quality language of Wikiprojects created by people 
who claim or claimed to be native speakers of these languages will inevitably 
be used to train language models since often this is not only the easiest text 
to harvest, it could also be the only available text online. This then puts any 
actual language community at a major disadvantage, since they then have to fix 
texts à la Scots Wikipedia. This also places an undue burden on any existing 
language community that would not be there if people were honest about their 
language skills in the first place (for example, we have had multiple people 
who have claimed to be native speakers of both Nauruan and Kamassian at the 
same time*). If the language community does not exist or there are too few 
people to fix it, this low-quality material lives on, further distorting the 
language.

So I think we need some process in place for when it has become apparent 
that the content of a Wikiproject is akin to what was in the Scots Wikipedia, 
but there is little to no language community to "save" the project, because in 
my opinion the options we have now are no longer suitable. For example, I do 
not think that moving the text to the incubator is the right solution in this 
case, and as I have said on the closing proposal, the whole thing should just 
be nuked. If the language community exists and wants to later on 
start a Wikiproject of their own, I don't think they should be burdened with 
the low-to-no quality text produced by people who don't know the language. I 
know of one case where the language community has refused to touch a Wikipedia 
project in the incubator in their language because all the content was created 
by someone who had not the slightest clue about the language or how it works 
and they were not willing to fix his mess.

t. Kimberli
* They always pick small to non-existent, widely different language communities 
to pretend they belong to for some reason.

________________________________
From: Sotiale Wiki <[email protected]>
Sent: Monday, October 7, 2024 8:49 AM
To: Wikimedia Foundation Language Committee <[email protected]>
Subject: [Langcom] Re: Closing proposal Norfolk and Pitcairn Wikipedia

In conclusion, I agree to accept this proposal.

In principle, the fact that a Wikipedia is inactive does not matter as long as 
it already has valid content.
However, this project currently has about 400 pages, and most of them are very 
short; this means that its content cannot be rated highly as valid content.
Also, it is unlikely that there will be any activity in the near future, so it 
should be closed considering these points.

If contributors were likely to show up in the near future, it might be 
evaluated differently,
but that is not the case for this project, and since it is already on its third 
proposal, closure should be seriously considered.

Sotiale

On Mon, Oct 7, 2024 at 1:20 AM, MF-Warburg <[email protected]> wrote:
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Pitkern_%26_Norfuk_Wikipedia_3

I suggest accepting the proposal.
_______________________________________________
Langcom mailing list -- [email protected]
To unsubscribe send an email to [email protected]