What’s fawiki’s edit rate? Processing a diff shouldn’t take more than 1-2
seconds, especially if you optimize the logic. I’m just spitballing ideas at
this point, but the logic should be easy.
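
To illustrate how little work each diff needs, here is a rough sketch using
only the Python standard library. It is an illustration under assumptions, not
a tested bot: the function names (added_text, cited_values, tally) are made up
for this example, the pattern matches the English "cite book" name rather than
the Persian equivalent, and the parameter to extract is assumed to be "title".
It takes (user, old text, new text) triples, keeps only the lines each edit
added, and counts user-source pairs.

```python
import difflib
import re
from collections import Counter

# Assumption: English template name; fawiki would need its Persian equivalent.
# Non-greedy match up to the first "}}", so nested templates are NOT handled.
CITE_BOOK = re.compile(r"\{\{\s*cite book\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def added_text(old, new):
    """Return only the lines that are marked as added when going from old to new."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return "\n".join(line[2:] for line in diff if line.startswith("+ "))


def cited_values(wikitext, param="title"):
    """Yield the value of `param` for each simple cite book template found."""
    param_re = re.compile(r"^\s*" + re.escape(param) + r"\s*=\s*(.*?)\s*$", re.DOTALL)
    for match in CITE_BOOK.finditer(wikitext):
        # Naive split on "|": values containing piped links will be cut short.
        for part in match.group(1).split("|"):
            found = param_re.match(part)
            if found:
                yield found.group(1)
                break


def tally(edits, param="title"):
    """Count (user, value) pairs over an iterable of (user, old, new) triples."""
    counts = Counter()
    for user, old, new in edits:
        for value in cited_values(added_text(old, new), param):
            counts[(user, value)] += 1
    return counts
```

Calling tally(edits).most_common() gives the frequency-sorted list of
user-source pairs. Note that a line which was merely modified also shows up as
"added", so an existing citation on an edited line would be counted again; a
real parser such as mwparserfromhell would also cope with nested templates,
which this regex does not.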

On Thu, Dec 27, 2018 at 12:37 PM Huji Lee <[email protected]> wrote:

> We will never know who "owns" which book. We only know that they have used
> it as a source a number of times. It could very well be that they just can
> easily borrow that book from a library (as is my case, with a lot of books
> and journals I have used as sources on Wikipedia).
>
> The profiling issue is beyond this discussion, and I will make sure to
> mention that on fawiki, but one can already "profile" users using their
> edits (it is quite easy for people to look at my edits on fawiki and
> realize that I read and write about Persian music;
> knowing that I also use some of the books on this topic as my source
> wouldn't add much to the picture; of note, my real world life and identity
> is unrelated to Persian music or music in general, so profiles are not
> always as revealing anyway).
>
> @John: I had not heard of mwparserfromhell and it is really cool! But how
> exactly does it come into play? The issue is less of being able to parse
> wikicode (what we really need is pretty much a regex search for the Persian
> equivalent of {{cite book}} template, and a second regex pattern that looks
> for the "name" parameter inside matches for the first one). Frankly, I am
> less worried about the steps *after* we found a "cite book" instance, and
> more about the steps leading to it (running many many diffs).
>
> Perhaps I am not fully understanding your thoughts, so please elaborate.
>
> Thank you both!
>
> On Thu, Dec 27, 2018 at 1:24 PM T Paris <[email protected]> wrote:
>
>> Could I ask that you guys make this an “opt in” feature? That would both
>> speed up the bot and avoid a problem: once you start identifying which
>> books people own, you start to develop a profile on people.
>>
>>
>>
>> v/r,
>>
>> TP
>>
>>
>>
>>
>>
>>
>> *From: *Huji Lee <[email protected]>
>> *Sent: *Thursday, December 27, 2018 11:42 AM
>> *To: *Labs <[email protected]>
>> *Subject: *[Cloud] List of users who have access to certain references
>>
>>
>>
>> This is an idea that came up on fawiki, and there is some merit to it. I
>> just want to figure out the best approach to implement it and would love
>> your input.
>>
>>
>>
>> *TL;DR: *We want to sweep through the recent edits in articles, look at
>> each diff, see if it contains the addition of a "{{cite book}}" template,
>> and if so, set it aside for future processing by a separate script.
>>
>>
>>
>> I wonder if there are already scripts in pywikibot that would help
>> initiate this. If not, I wonder what the best strategy is to implement this
>> using the MW API.
>>
>>
>>
>> Thanks,
>>
>> Huji
>>
>>
>>
>> ------------
>>
>>
>>
>> Long version:
>>
>>
>>
>> The idea is to identify users who probably have access to certain offline
>> sources, so that if another user needs something to be checked in that
>> source and they don't have access to it, they know who to ask. For
>> instance, if I have access to a physical copy of Encyclopedia Britannica
>> (let's say it is a book and is not available digitally), and you want me to
>> check if it has an entry for Sir Isaac Newton, it would be great if
>> instead of or in addition to asking on the village pump (which I might not
>> follow), you would ask me directly.
>>
>>
>>
>> The assumption is that if the same user keeps adding the same "{{cite
>> book}}" template in many articles (e.g. if I add the {{cite book | title =
>> Encyclopedia Britannica | ... }} in several edits across several articles),
>> then that user most likely has access to that source. And if these edits
>> are relatively recent and the user is still active, then chances are the
>> user can still access that source if another user asks them to.
>>
>>
>>
>> So if we find all such edits, we probably can aggregate them into a table
>> that shows "Huji" added a {{cite book}} for a book titled "Encyclopedia
>> Britannica" 17 times, and so on and so forth. Sorting it by the frequency
>> column, we might have a good list of user-source pairs.
>>
>>
>>
>>
>> _______________________________________________
>> Wikimedia Cloud Services mailing list
>> [email protected] (formerly [email protected])
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>
