What’s fawiki’s edit rate? Processing a diff shouldn’t take more than 1-2 seconds, especially if you optimize the logic. I’m just spitballing ideas at this point, but the logic should be easy.
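To make the per-diff logic concrete, here is a rough sketch, not a tested bot. It assumes you already have the wikitext before and after each edit (fetching revisions through pywikibot or the MW API is left out), and every function name in it is made up for illustration. It uses only the standard library with a small brace scanner; swapping in mwparserfromhell for the template parsing would likely be more robust. The template name defaults to the English "cite book", so on fawiki you would pass the Persian equivalent instead.

```python
from collections import Counter


def find_templates(text, name="cite book"):
    """Return the inner text of every {{name ...}} template in text.

    Tracks brace depth so templates nested inside a citation do not
    cut the match short. Hypothetical helper, not part of any library.
    """
    found = []
    i = 0
    while True:
        start = text.find("{{", i)
        if start == -1:
            break
        depth, j = 0, start
        while j < len(text) - 1:
            pair = text[j:j + 2]
            if pair == "{{":
                depth += 1
                j += 2
            elif pair == "}}":
                depth -= 1
                j += 2
                if depth == 0:
                    break
            else:
                j += 1
        if depth != 0:
            break  # unbalanced braces: stop scanning
        body = text[start + 2:j - 2]
        if body.split("|", 1)[0].strip().lower() == name.lower():
            found.append(body)
        i = start + 2  # step inside so nested templates are scanned too
    return found


def get_param(body, key="title"):
    """Return the value of a named parameter, or None if it is absent.

    Limitation: splits naively on "|", so a value containing a piped
    wikilink or a nested template is truncated at the first pipe.
    """
    for part in body.split("|")[1:]:
        if "=" in part:
            k, v = part.split("=", 1)
            if k.strip().lower() == key:
                return v.strip()
    return None


def added_titles(old_text, new_text, name="cite book", key="title"):
    """Count the citation titles present in new_text but not in old_text."""
    def titles(text):
        values = (get_param(b, key) for b in find_templates(text, name))
        return Counter(v for v in values if v)
    # Counter subtraction keeps only positive counts, i.e. additions.
    return titles(new_text) - titles(old_text)


def tally_sources(revisions):
    """Aggregate (user, old_text, new_text) triples into user-title counts."""
    tally = Counter()
    for user, old_text, new_text in revisions:
        for title, n in added_titles(old_text, new_text).items():
            tally[(user, title)] += n
    return tally
```

Calling `tally_sources(...).most_common()` on the collected edits gives the user-source pairs sorted by frequency. Comparing template counts before and after an edit, rather than parsing the rendered diff, also means an edit that merely moves an existing citation is not counted as a new one.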
On Thu, Dec 27, 2018 at 12:37 PM Huji Lee <[email protected]> wrote:

> We will never know who "owns" which book. We only know that they have used
> it as a source a number of times. It could very well be that they just can
> easily borrow that book from a library (as is my case, with a lot of books
> and journals I have used as sources on Wikipedia).
>
> The profiling issue is beyond this discussion, and I will make sure to
> mention that on fawiki, but one can already "profile" users using their
> edits (it is quite easy for people to look at my edits on fawiki and
> realize that I read and write about Persian music based on my fawiki edits;
> knowing that I also use some of the books on this topic as my source
> wouldn't add much to the picture; of note, my real world life and identity
> is unrelated to Persian music or music in general, so profiles are not
> always as revealing anyway).
>
> @John: I had not heard of mwparserfromhell and it is really cool! But how
> exactly does it come into play? The issue is less of being able to parse
> wikicode (what we really need is pretty much a regex search for the Persian
> equivalent of {{cite book}} template, and a second regex pattern that looks
> for the "name" parameter inside matches for the first one). Frankly, I am
> less worried about the steps *after* we found a "cite book" instance, and
> more about the steps leading to it (running many many diffs).
>
> Perhaps I am not fully understanding your thoughts, so please elaborate.
>
> Thank you both!
>
> On Thu, Dec 27, 2018 at 1:24 PM T Paris <[email protected]> wrote:
>
>> Could I ask that you guys make this an “opt in” feature. Both because
>> it’ll speed up the bot and also because once you start identifying which
>> books people own, you start to develop a profile on people.
>>
>> v/r,
>>
>> TP
>>
>> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
>> Windows 10
>>
>> *From: *Huji Lee <[email protected]>
>> *Sent: *Thursday, December 27, 2018 11:42 AM
>> *To: *Labs <[email protected]>
>> *Subject: *[Cloud] List of users who have access to certain references
>>
>> This is an idea that came up on fawiki, and there is some merit to it. I
>> just want to figure out the best approach to implement it and would love
>> your input.
>>
>> *TL;DR: *We want to sweep through the recent edits in articles, look at
>> each diff, see if it contains the addition of a "{{cite book}}" template,
>> and if so, set it aside for future processing by another code.
>>
>> I wonder if there are already scripts in pywikibot that would help
>> initiate this. If not, I wonder what is the best strategy to implement this
>> using MW API.
>>
>> Thanks,
>>
>> Huji
>>
>> ------------
>>
>> Long version:
>>
>> The idea is to identify users who probably have access to certain offline
>> sources, so that if another user needs something to be checked in that
>> source and they don't have access to it, they know who to ask. For
>> instance, if I have access to a physical copy of Encyclopedia Britannica
>> (let's say it is a book and is not available digitally), and you want me to
>> check if it has an entry for Sir Isaac Newton, it would be great if
>> instead of or in addition to asking on the village pump (which I might not
>> follow), you would ask me directly.
>>
>> The assumption is that if the same user keeps adding the same "{{cite
>> book}}" template in many articles (e.g. if I add the {{cite book | title =
>> Encyclopedia Britannica | ... }} in several edits across several articles),
>> then that user most likely has access to that source. And if these edits
>> are relatively recent and the user is still active, then chances are the
>> user can still access that source if another user asks them to.
>>
>> So if we find all such edits, we probably can aggregate them into a table
>> that shows "Huji" added a {{cite book}} for a book titled "Encyclopedia
>> Britannica" 17 times, and so on and so forth. Sorting it by the frequency
>> column, we might have a good list of user-source pairs.
_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud
