Re: [Wikitech-l] Seemingly proprietary Javascript
On 06/03/13 16:28, Jay Ashworth wrote: To “convey” a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. As javascript is executed in the client, it probably is. Perhaps. But HTML is also executed in the client, and some legal decisions have gone each way on whether the mere viewing of a page constitutes copying in violation of copyright (the trend is towards no, thankfully. :-) Cheers, -- jra Interesting. Although HTML is presentational, while JS is executable. I wouldn't consider most of our JavaScript as significant (even though we have plenty of usages considered non-trivial by [1]), since it is highly based on MediaWiki classes and ids. However, we also have some big JavaScript programs (WikiEditor, VisualEditor...). @Alexander: I would consider something like <script src="//bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.triggerQueueCallback%2CloadingSpinner%2CmwEmbedUtil%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130304T183632Z" license="//bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.triggerQueueCallback%2CloadingSpinner%2CmwEmbedUtil%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130304T183632Z&mode=license"></script> with the license attribute pointing to a JavaScript License Web Labels page for that script (yes, that would have to go up to the WHATWG). Another, easier, option would be for LibreJS to detect the debug=false in the URL and change it to debug=true, expecting to find the license information there. It's also a natural change for people intending to reuse such JavaScript, even if they were unaware of such a convention. @Chad: We use free licenses since we care about the freedom of our code to be reused, but if the license is not appropriate to what we really intend, or even worse, is placing such a burden that even we aren't properly presenting them, it's something very much worth discussing. Up to the point where we could end up relicensing the code to better reflect our intention, as was done from GFDL to CC-BY-SA with Wikipedia content. 1- http://www.gnu.org/philosophy/javascript-trap.html ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How do MS SQL users install MediaWiki?
Mark A. Hershberger m...@everybody.org wrote: On 03/04/2013 01:34 AM, Chad wrote: However, we do have people who want/use MSSQL, so I think taking the effort to keep it working is worthwhile--if someone's willing to commit. Since Danny Bauch has been using MSSQL and modifying MW for his needs, I'll work with him to get the necessary changes committed. Danny, if you could commit your changes into Gerrit, I'd be happy to test them. I'll be happy to come back to my PostgreSQL work and I'd be happy to talk to other RDBMS people to coordinate some stuff (like getting unit tests to work or getting some abstractions right - transactions, schema management etc.). //Saper ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] IRC office hour on Tue March 19th, 1700 UTC, about Bug management
Hi everybody, on Tuesday, March 19th, at 17:00 UTC[1], there will be an IRC office hour in #wikimedia-office about Wikimedia's issue tracker[2] and bug management[3]. Add it to your calendar and come to ask how to better find the information in Bugzilla that interests you, and to share ideas and criticism on how to make Bugzilla better. andre [1] https://meta.wikimedia.org/wiki/IRC_office_hours [2] https://bugzilla.wikimedia.org [3] https://www.mediawiki.org/wiki/Bug_management -- Andre Klapper | Wikimedia Bugwrangler http://blogs.gnome.org/aklapper/ ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Indexing structures for Wikidata
As you probably know, the search in Wikidata sucks big time. Until we have created a proper Solr-based search and deployed it on that infrastructure, we would like to implement and set up a reasonable stopgap solution. The simplest and most obvious signal for sorting the items would be to 1) do a prefix search and 2) weight all results by the number of Wikipedias each item links to. This should usually provide the item you are looking for. Currently, the search order is random. Good luck with finding items like California, Wellington, or Berlin. Now, what I want to ask is: what would be the appropriate index structure for that table? The data is saved in the wb_terms table, which would need to be extended by a weight field. There is already a suggestion (based on discussions between Tim and Daniel K, if I understood correctly) to change the wb_terms table index structure (see https://bugzilla.wikimedia.org/show_bug.cgi?id=45529 ), but since we are changing the index structure anyway it would be great to get it right this time. Anyone who can jump in? (Looking especially at Asher and Tim.) Any help would be appreciated. Cheers, Denny -- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
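[A rough illustration of the stopgap Denny describes, written against MediaWiki's database wrapper. The weight column name (term_weight) and the helper function are assumptions made up for this sketch, not the actual Wikibase schema or API; the real query would also depend on the index layout discussed in bug 45529.]

<?php
// Sketch only: prefix search over wb_terms, ordered by a hypothetical
// term_weight column (number of Wikipedias linking to the item).
function searchEntitiesByPrefix( $prefix, $lang, $limit = 10 ) {
    $dbr = wfGetDB( DB_SLAVE );

    return $dbr->select(
        'wb_terms',
        array( 'term_entity_id', 'term_text', 'term_weight' ),
        array(
            'term_language' => $lang,
            'term_type' => array( 'label', 'alias' ),
            // Anchored prefix match, so an index starting with term_text can be used.
            'term_text' . $dbr->buildLike( $prefix, $dbr->anyString() ),
        ),
        __METHOD__,
        array(
            // Heavily linked items ("Berlin" the city) sort above obscure ones.
            'ORDER BY' => 'term_weight DESC',
            'LIMIT' => $limit,
        )
    );
}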
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
I found EXPLAIN (http://dev.mysql.com/doc/refman/5.0/en/using-explain.html) pretty useful during my project; rather than theories, it shows exactly how the query is being resolved and whether the indexes are being used correctly. On Thu, Mar 7, 2013 at 6:06 AM, Sumana Harihareswara suma...@wikimedia.org wrote: If you want your code merged, you need to keep your database queries efficient. How can you tell if a query is inefficient? How do you write efficient queries, and avoid inefficient ones? We have some resources around: Roan Kattouw's https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial -- slides at https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf Asher Feldman's https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv -- slides at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf More hints: http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005075.html When you need to ask for a performance review, you can check out https://www.mediawiki.org/wiki/Developers/Maintainers#Other_Areas_of_Focus which suggests Tim Starling, Asher Feldman, and Ori Livneh. I also BOLDly suggest Nischay Nahata, who worked on Semantic MediaWiki's performance for his GSoC project in 2012. -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation -- Cheers, Nischay Nahata nischayn22.in ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
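[A small, hypothetical example of putting EXPLAIN to work on MediaWiki-generated queries: DatabaseBase::selectSQLText() returns the SQL that select() would run, which you can prepend with EXPLAIN to see which index, if any, is used. The page id and the query itself are made up for illustration.]

<?php
$dbr = wfGetDB( DB_SLAVE );

// Build (but don't run) the query, so we can inspect its plan.
$sql = $dbr->selectSQLText(
    'revision',
    array( 'rev_id', 'rev_timestamp' ),
    array( 'rev_page' => 12345 ),  // hypothetical page id
    __METHOD__,
    array( 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 50 )
);

// The "key" and "rows" columns of the plan tell you whether an index
// (here, ideally page_timestamp) is used and how many rows get scanned.
foreach ( $dbr->query( 'EXPLAIN ' . $sql, __METHOD__ ) as $row ) {
    print_r( $row );
}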
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
The advice on https://wikitech.wikimedia.org/wiki/Query_profiling_for_features_developers sounds good. Is there more detail somewhere on how to do this part: "Test your query against production slaves prior to full deployment"? Luke On Wed, Mar 6, 2013 at 8:14 PM, Matthew Flaschen mflasc...@wikimedia.org wrote: On 03/06/2013 04:36 PM, Sumana Harihareswara wrote: If you want your code merged, you need to keep your database queries efficient. How can you tell if a query is inefficient? How do you write efficient queries, and avoid inefficient ones? We have some resources around: Roan Kattouw's https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial -- slides at https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf Asher Feldman's https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv -- slides at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf And https://wikitech.wikimedia.org/wiki/Query_profiling_for_features_developers Matt Flaschen ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes
Hey Quim, hey Maria, thank you for your replies! I actually knew where to find the XML-dumps but that pointer about the new XML-import tools is really helpful. So eventually, I was able to acquire a Xeon 8 core, 32GB RAM, 6TB SAS to start my experiments on :) Let's see what this baby can do * http://i.imgur.com/J47GJ.gif * Thanks again Andreas On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva mariya.mit...@gmail.com wrote: Hi, You might also try the following mailing list: * XML Data Dumps mailing list https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l * Here is some info on importing XML dumps (not sure what tools work well but probably the mailing list can help with that) http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing Also, Ariel Glenn recently announced two new tools for importing dumps on the XML list: http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html Mariya On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote: On 03/05/2013 02:54 AM, Andreas Nüßlein wrote: Hi list, so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D Just in case: http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps Also, you might want to ask / discuss at https://lists.wikimedia.org/mailman/listinfo/offline-l Good luck with this interesting project! -- Quim Gil Technical Contributor Coordinator @ Wikimedia Foundation http://www.mediawiki.org/wiki/User:Qgil ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Editing wikipedia using google, openID or facebook
Hi, we discussed OAuth many times... but - what's the current status? Do we have working extensions which support using OpenID in order to log in to MediaWiki, or OAuth? So that you can log in using your Google account or such? I believe that WMF is working on this, so can we have some update? I know that the English Wikipedia community hates Facebook and basically anything new :P but if not Wikipedia, at least many small wikis could use it. Thanks ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID why we don't have it on production? :) On Thu, Mar 7, 2013 at 8:30 PM, Petr Bena benap...@gmail.com wrote: Hi, we discussed OAuth many times... but - what's the current status? Do we have working extensions which support using OpenID in order to log in to MediaWiki, or OAuth? So that you can log in using your Google account or such? I believe that WMF is working on this, so can we have some update? I know that the English Wikipedia community hates Facebook and basically anything new :P but if not Wikipedia, at least many small wikis could use it. Thanks ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena benap...@gmail.com wrote: I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID why we don't have it on production? :) Just last week there was a thread about this. Extension:OpenID is under active development, but I think it could be ready for deployment in the near future (if not right now). *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes
Andreas Nüßlein wrote: so I need to set up a local instance of the dewiki- and enwiki-DB with all revisions.. :-D I know it's rather a mammoth project so I was wondering if somebody could give me some pointers? First of all, I would need to know what kind of hardware I should get. Is it possible/smart to have it all in two ginormous MySQL instances (one for each of the languages) or will I need to do sharding? I don't need it to run smoothly. I only need to be able to query the database (and I know some of these queries can run for days). I will probably have access to some rather powerful machines here at the university and I have also quite a few workstation machines on which I could theoretically do the sharding. Ryan L. or Marc P.: I routed Andreas to this list (from #wikimedia-toolserver), as I figured these questions related to the work that you all have been doing for Wikimedia Labs. Or at least I figured you all probably had some kind of formula for hardware provisioning that might be reusable here. Any pointers would be great. :-) MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Indexing non-text content in LuceneSearch
Hi all! I would like to ask for your input on the question of how non-wikitext content can be indexed by LuceneSearch. Background is the fact that full text search (Special:Search) is nearly useless on wikidata.org at the moment, see https://bugzilla.wikimedia.org/show_bug.cgi?id=42234. The reason for the problem appears to be that when rebuilding a Lucene index from scratch, using an XML dump of wikidata.org, the raw JSON structure used by Wikibase gets indexed. The indexer is blind; it just takes whatever text it finds in the dump. Indexing JSON does not work at all for fulltext search, especially not when non-ASCII characters are represented as unicode escape sequences. Inside MediaWiki, in PHP, this works like this: * wikidata.org (or rather, the Wikibase extension) stores non-text content in wiki pages, using a ContentHandler that manages a JSON structure. * Wikibase's EntityContent class implements Content::getTextForSearchIndex() so it returns the labels and aliases of an entity. Data items thus get indexed by their labels and aliases. * getTextForSearchIndex() is used by the default MySQL search to build an index. It's also (ab)used by things that can only operate on flat text, like the AbuseFilter extension. * The LuceneSearch index gets updated live using the OAI extension, which in turn knows to use getTextForSearchIndex() to get the text for indexing. So, for anything indexed live, this works, but for rebuilding the search index from a dump, it doesn't - because the Java indexer knows nothing about content types, and has no interface for an extension to register additional content types. To improve this, I can think of a few options: 1) create a specialized XML dump that contains the text generated by getTextForSearchIndex() instead of actual page content. However, that only works if the dump is created using the PHP dumper. How are the regular dumps currently generated on WMF infrastructure? Also, would it be feasible to make an extra dump just for LuceneSearch (at least for wikidata.org)? 2) We could re-implement the ContentHandler facility in Java, and require extensions that define their own content types to provide a Java based handler in addition to the PHP one. That seems like a pretty massive undertaking of dubious value. But it would allow maximum control over what is indexed how. 3) The indexer code (without plugins) should not know about Wikibase, but it may have hard coded knowledge about JSON. It could have a special indexing mode for JSON, in which the structure is deserialized and traversed, and any values are added to the index (while the keys used in the structure would be ignored). We may still be indexing useless internals from the JSON, but at least there would be a lot fewer false negatives. I personally would prefer 1) if dumps are created with PHP, and 3) otherwise. 2) looks nice, but it would be hard to keep the Java and the PHP versions from diverging. So, how would you fix this? thanks daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
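[To make option 3 concrete, here is a sketch of the idea, shown in PHP for brevity even though the actual code would have to live in the Java indexer: decode the JSON blob, walk the structure, and collect only the values while dropping the keys. This is an illustration of the approach, not existing code.]

<?php
// Sketch: flatten a Wikibase-style JSON blob into indexable text by keeping
// the string values (labels, descriptions, aliases, ...) and ignoring keys.
function collectJsonValues( $json ) {
    $data = json_decode( $json, true );
    $values = array();

    $walk = function ( $node ) use ( &$walk, &$values ) {
        if ( is_array( $node ) ) {
            foreach ( $node as $child ) {
                $walk( $child ); // keys are structural, so they are skipped
            }
        } elseif ( is_string( $node ) ) {
            $values[] = $node;
        }
    };
    $walk( $data );

    return implode( ' ', $values );
}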
Re: [Wikitech-l] Indexing non-text content in LuceneSearch
On Thu, Mar 7, 2013 at 11:45 AM, Daniel Kinzler dan...@brightbyte.de wrote: 1) create a specialized XML dump that contains the text generated by getTextForSearchIndex() instead of actual page content. That probably makes the most sense; alternately, make a dump that includes both raw data and text for search. This also allows for indexing extra stuff for files -- such as extracted text from a PDF or DjVu, or metadata from a JPEG -- if the dump process etc. can produce appropriate indexable data. However, that only works if the dump is created using the PHP dumper. How are the regular dumps currently generated on WMF infrastructure? Also, would it be feasible to make an extra dump just for LuceneSearch (at least for wikidata.org)? The dumps are indeed created via MediaWiki. I think Ariel or someone can comment with more detail on how it currently runs; it's been a while since I was in the thick of it. 2) We could re-implement the ContentHandler facility in Java, and require extensions that define their own content types to provide a Java based handler in addition to the PHP one. That seems like a pretty massive undertaking of dubious value. But it would allow maximum control over what is indexed how. No don't do it :) 3) The indexer code (without plugins) should not know about Wikibase, but it may have hard coded knowledge about JSON. It could have a special indexing mode for JSON, in which the structure is deserialized and traversed, and any values are added to the index (while the keys used in the structure would be ignored). We may still be indexing useless internals from the JSON, but at least there would be a lot fewer false negatives. Indexing structured data could be awesome -- again I think of file metadata as well as wikidata-style stuff. But I'm not sure how easy that'll be. Should probably be in addition to the text indexing, rather than replacing. -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Identifying pages that are slow to render
On 06/03/13 23:58, Federico Leva (Nemo) wrote: There's slow-parse.log, but it's private unless a solution is found for https://gerrit.wikimedia.org/r/#/c/49678/ https://wikitech.wikimedia.org/wiki/Logs And slow-parse.log is probably going to be kept private unless proven it is not harmful =) -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits
Hey Chris, I was exploring the SpamBlacklist extension. I have some doubts; I hope you could clear them up. Is there any place I can get documentation of the SpamBlacklist class in the file SpamBlacklist_body.php? In the function filter, what do the following variables represent? $title $text $section $editpage $out I have understood the following things from the code, please correct me if I am wrong. It extracts the edited text, and parses it to find the links. It then replaces the links which match the whitelist regex, and then checks if there are some links that match the blacklist regex. If the check is greater than zero, the matched content is returned. It already writes to the debug log if it finds a match. I guess the bug aims at creating an SQL table. I was thinking of the following fields to log: Title, Text, User, URLs, IP. I don't understand why you denied it. On Tue, Feb 26, 2013 at 1:25 AM, Chris Steipp cste...@wikimedia.org wrote: That's an ambitious first bug, Anubhav! Since this is an extension, it plugs into MediaWiki core using hooks. So periodically, the core code will run all of the functions registered for a particular hook, so the extensions can interact with the logic. In this case, SpamBlacklist has registered SpamBlacklistHooks::filterMerged to run whenever an editor attempts to save a page, or SpamBlacklistHooks::filterAPIEditBeforeSave if the edit came in through the api. So that is where you will want to log. Although MediaWiki has a logging feature, it sounds like you may want to add your own logging table (like the AbuseFilter extension). If you do that, make sure that you're only storing data that you really need, and is ok with our privacy policy (so no ip addresses!). Feel free to add me as a reviewer when you submit your code to gerrit. Chris On Mon, Feb 25, 2013 at 11:21 AM, Tyler Romeo tylerro...@gmail.com wrote: Hey, I don't know much about that, or how much you know, but at the very least I can tell you that the bug is in Extension:SpamBlacklist, which can be found at http://www.mediawiki.org/wiki/Extension:SpamBlacklist. From what I can see from the code, it seems to just use various Hooks in MediaWiki in order to stop editing, e-mailing, etc. if the request matches a parsed blacklist it has. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com On Mon, Feb 25, 2013 at 2:17 PM, anubhav agarwal anubhav...@gmail.com wrote: Hi Guys, I was trying to fix this bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=1542. I am a newbie to MediaWiki and it's the first bug I'm trying to solve, so I don't know much. I want to know about the spam blacklist, how it works, how it triggers the action, and its logging mechanism. It would be great if someone could help me fix this bug. Cheers, Anubhav Anubhav Agarwal | 4th Year | Computer Science Engineering | IIT Roorkee ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Cheers, Anubhav Anubhav Agarwal | 4th Year | Computer Science Engineering | IIT Roorkee ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Identifying pages that are slow to render
On 06/03/13 22:05, Robert Rohde wrote: On enwiki we've already made Lua conversions with most of the string templates, several formatting templates (e.g. {{rnd}}, {{precision}}), {{coord}}, and a number of others. And there is work underway on a number of the more complex overhauls (e.g. {{cite}}, {{convert}}). However, it would be nice to identify problematic templates that may be less obvious. You can get in touch with Brad Jorsch and Tim Starling. They most probably have a list of templates that should quickly be converted to Lua modules. If we got {{cite}} out, that would already be a nice improvement :-] -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
On 07/03/13 11:32, Petr Bena wrote: I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID why we don't have it on production? :) As far as I know, that extension is pending a full review before it lands on the Wikimedia cluster. Ryan Lane wrote about it: http://lists.wikimedia.org/pipermail/wikitech-l/2013-March/067124.html There is a draft document at: https://www.mediawiki.org/wiki/OpenID_Provider We still have to figure out which account will be used, the URL, whether we want a dedicated wiki etc... -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Identifying pages that are slow to render
On 03/07/2013 12:00 PM, Antoine Musso wrote: On 06/03/13 23:58, Federico Leva (Nemo) wrote: There's slow-parse.log, but it's private unless a solution is found for https://gerrit.wikimedia.org/r/#/c/49678/ https://wikitech.wikimedia.org/wiki/Logs And slow-parse.log is probably going to be kept private unless proven it is not harmful =) Why would it be harmful for public wikis? Anyone can do this on an article-by-article basis by copying the source to their own MediaWiki instances. But it ends up being repeated work. Matt ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
ah ok I was confused by it being flagged stable On Thu, Mar 7, 2013 at 8:35 PM, Tyler Romeo tylerro...@gmail.com wrote: On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena benap...@gmail.com wrote: I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID why we don't have it on production? :) Just last week there was a thread about this. Extension:OpenID is under active development, but I think it could be ready for deployment in the near future (if not right now). *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
On Thu, Mar 7, 2013 at 3:05 PM, Antoine Musso hashar+...@free.fr wrote: We still have to figure out which account will be used, the URL, whether we want a dedicated wiki etc... Those discussions are unrelated to using OpenID as a client, though. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Identifying pages that are slow to render
On Thu, Mar 7, 2013 at 8:06 PM, Matthew Flaschen mflasc...@wikimedia.org wrote: Why would it be harmful for public wikis? Anyone can do this on an article-by-article basis by copying the source their own MediaWiki instances. That user would have to pick which articles to copy and test (or test them all). The log doesn't contain (I guess?) all articles. Only slow articles. -Jeremy ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
Those tags are arbitrary :( -Chad On Mar 7, 2013 12:09 PM, Petr Bena benap...@gmail.com wrote: ah ok I was confused by it being flagged stable On Thu, Mar 7, 2013 at 8:35 PM, Tyler Romeo tylerro...@gmail.com wrote: On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena benap...@gmail.com wrote: I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID why we don't have it on production? :) Just last week there was a thread about this. Extension:OpenID is under active development, but I think it could be ready for deployment in the near future (if not right now). *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
On Thu, Mar 7, 2013 at 12:10 PM, Tyler Romeo tylerro...@gmail.com wrote: On Thu, Mar 7, 2013 at 3:05 PM, Antoine Musso hashar+...@free.fr wrote: We still have to figure out which account will be used, the URL, whether we want a dedicated wiki etc... Those discussions are unrelated to using OpenID as a client, though. As I've mentioned before, I'm the one championing OpenID support on the sites, and I have no current plans on enabling OpenID as a consumer. Making authentication changes is difficult. We're focusing on OpenID as a provider and OAuth support right now, and that's way more than enough to try to do this quarter. - Ryan ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Indexing non-text content in LuceneSearch
On 07.03.2013 20:58, Brion Vibber wrote: 3) The indexer code (without plugins) should not know about Wikibase, but it may have hard coded knowledge about JSON. It could have a special indexing mode for JSON, in which the structure is deserialized and traversed, and any values are added to the index (while the keys used in the structure would be ignored). We may still be indexing useless internals from the JSON, but at least there would be a lot fewer false negatives. Indexing structured data could be awesome -- again I think of file metadata as well as wikidata-style stuff. But I'm not sure how easy that'll be. Should probably be in addition to the text indexing, rather than replacing. Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want indexed structured data; the question is just how to get that into the LSearch infrastructure. -- daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits
On 07/03/13 21:03, anubhav agarwal wrote: Hey Chris, I was exploring the SpamBlacklist extension. I have some doubts; I hope you could clear them up. Is there any place I can get documentation of the SpamBlacklist class in the file SpamBlacklist_body.php? In the function filter, what do the following variables represent? $title Title object (includes/Title.php) This is the page where it tried to save. $text Text being saved in the page/section $section Name of the section or '' $editpage EditPage object if EditFilterMerged was called, null otherwise $out A ParserOutput class (actually, this variable name was a bad choice, it looks like an OutputPage), see includes/parser/ParserOutput.php I have understood the following things from the code, please correct me if I am wrong. It extracts the edited text, and parses it to find the links. Actually, it uses the fact that the parser will have processed the links, so in most cases it just obtains that information. It then replaces the links which match the whitelist regex, This doesn't make sense as you explain it. It builds a list of links, and replaces whitelisted ones with '', i.e. removes whitelisted links from the list. and then checks if there are some links that match the blacklist regex. Yes. If the check is greater than zero, the matched content is returned. Right, $check will be non-0 if the links matched the blacklist. It already writes to the debug log if it finds a match. Yes, but that is a private log. Bug 1542 talks about making that accessible in the wiki. I guess the bug aims at creating an SQL table. I was thinking of the following fields to log: Title, Text, User, URLs, IP. I don't understand why you denied it. Because we don't like to publish the IPs *in the wiki*. I think the approach should be to log matches using the AbuseFilter extension if that one is loaded. I concur that it seems too hard to begin with. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Editing wikipedia using google, openID or facebook
On 07.03.2013 21:09, Petr Bena wrote: ah ok I was confused by it being flagged stable Yes. It *is* stable, at least since I took over the maintenance a long time ago. This does not mean that it cannot be further improved. Currently I am very busy adding new necessary features to the user interface (preferences), which can already be seen at http://openid-wiki.instance-proxy.wmflabs.org/wiki/ . Some new patches are in the pipe and will be published in the next days. The manual page fully reflects the current status. I am always looking for developers who install the extension in their wikis and send us their feedback - and file bug reports if needed. Tom Maintainer of E:OpenID ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Indexing non-text content in LuceneSearch
(1) seems like the right way to go to me too. There may be other ways but puppet/files/lucene/lucene.jobs.sh has a function called import-db() which creates a dump like this: php $MWinstall/common/multiversion/MWScript.php dumpBackup.php $dbname --current $dumpfile Ram On Thu, Mar 7, 2013 at 1:05 PM, Daniel Kinzler dan...@brightbyte.de wrote: On 07.03.2013 20:58, Brion Vibber wrote: 3) The indexer code (without plugins) should not know about Wikibase, but it may have hard coded knowledge about JSON. It could have a special indexing mode for JSON, in which the structure is deserialized and traversed, and any values are added to the index (while the keys used in the structure would be ignored). We may still be indexing useless interna from the JSON, but at least there would be a lot fewer false negatives. Indexing structured data could be awesome -- again I think of file metadata as well as wikidata-style stuff. But I'm not sure how easy that'll be. Should probably be in addition to the text indexing, rather than replacing. Indeed, but option 3 is about *blindly* indexing *JSON*. We definitly want indexed structured data, the question is just how to get that into the LSearch infrastructure. -- daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Redis with SSDs
Interesting article I found about Redis and its poor performance with SSDs as a swap medium. For whoever might be interested. http://antirez.com/news/52 *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits
On Thu, Mar 7, 2013 at 1:34 PM, Platonides platoni...@gmail.com wrote: On 07/03/13 21:03, anubhav agarwal wrote: Hey Chris, I was exploring the SpamBlacklist extension. I have some doubts; I hope you could clear them up. Is there any place I can get documentation of the SpamBlacklist class in the file SpamBlacklist_body.php? There really isn't any documentation besides the code, but a couple more things you should look at. Notice that in SpamBlacklist.php, there is the line $wgHooks['EditFilterMerged'][] = 'SpamBlacklistHooks::filterMerged';, which is the way that SpamBlacklist registers itself with MediaWiki core to filter edits. So when MediaWiki core runs the EditFilterMerged hooks (which it does in includes/EditPage.php, line 1287), all of the extensions that have registered a function for that hook are run with the passed in arguments, so SpamBlacklistHooks::filterMerged is run. And SpamBlacklistHooks::filterMerged then just sets up and calls SpamBlacklist::filter. So that is where you can start tracing what is actually in the variables, in case Platonides' summary wasn't enough. In the function filter, what do the following variables represent? $title Title object (includes/Title.php) This is the page where it tried to save. $text Text being saved in the page/section $section Name of the section or '' $editpage EditPage object if EditFilterMerged was called, null otherwise $out A ParserOutput class (actually, this variable name was a bad choice, it looks like an OutputPage), see includes/parser/ParserOutput.php I have understood the following things from the code, please correct me if I am wrong. It extracts the edited text, and parses it to find the links. Actually, it uses the fact that the parser will have processed the links, so in most cases it just obtains that information. It then replaces the links which match the whitelist regex, This doesn't make sense as you explain it. It builds a list of links, and replaces whitelisted ones with '', i.e. removes whitelisted links from the list. and then checks if there are some links that match the blacklist regex. Yes. If the check is greater than zero, the matched content is returned. Right, $check will be non-0 if the links matched the blacklist. It already writes to the debug log if it finds a match. Yes, but that is a private log. Bug 1542 talks about making that accessible in the wiki. Yep. For example, see * https://en.wikipedia.org/wiki/Special:Log * https://en.wikipedia.org/wiki/Special:AbuseLog I guess the bug aims at creating an SQL table. I was thinking of the following fields to log: Title, Text, User, URLs, IP. I don't understand why you denied it. Because we don't like to publish the IPs *in the wiki*. The WMF privacy policy also discourages us from keeping IP addresses longer than 90 days, so if you do keep IPs, then you need a way to hide / purge them, and if they allow someone to see what IP address a particular username was using, then only users with checkuser permissions are allowed to see that. So it would be easier for you not to include it, but if it's desired, then you'll just have to build those protections out too. I think the approach should be to log matches using the AbuseFilter extension if that one is loaded. The abusefilter log format has a lot of data in it specific to AbuseFilter, and is used to re-test abuse filters, so adding these hits into that log might cause some issues. I think either the general log, or using a separate, new log table would be best. 
Just for some numbers, in the first 7 days of this month, we've had an average of 27,000 hits each day. So if this goes into an existing log, it's going to generate a significant amount of data. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
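[For what the "separate, new log table" option could look like, a minimal sketch follows. The table and column names are invented for illustration and are not an agreed-on schema; per the thread, IP addresses are deliberately not stored.]

<?php
// Hypothetical helper, called from SpamBlacklist::filter() once $check has
// matched. Table "spam_blacklist_log" and its sbl_* fields are made up here.
function logBlacklistHit( Title $title, User $user, array $matchedUrls ) {
    $dbw = wfGetDB( DB_MASTER );

    $dbw->insert(
        'spam_blacklist_log',
        array(
            'sbl_timestamp' => $dbw->timestamp(),
            'sbl_user'      => $user->getId(),               // no IPs, per privacy policy
            'sbl_title'     => $title->getPrefixedDBkey(),
            'sbl_urls'      => implode( "\n", $matchedUrls ), // blacklisted URLs that matched
        ),
        __METHOD__
    );
}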
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
On 07/03/13 12:12, Asher Feldman wrote: Ori - I think this has been discussed but automated xhprof configuration as part of the vagrant dev env setup would be amazing :) I don't think xhprof is the best technology for PHP profiling. I reported a bug a month ago which causes the times it reports to be incorrect by a random factor, often 4 or so. No response so far. And its web interface is packed full of XSS vulnerabilities. XDebug + KCacheGrind is quite nice. -- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
On Thu, Mar 7, 2013 at 3:57 PM, Tim Starling tstarl...@wikimedia.org wrote: On 07/03/13 12:12, Asher Feldman wrote: Ori - I think this has been discussed but automated xhprof configuration as part of the vagrant dev env setup would be amazing :) I don't think xhprof is the best technology for PHP profiling. I reported a bug a month ago which causes the times it reports to be incorrect by a random factor, often 4 or so. No response so far. And its web interface is packed full of XSS vulnerabilities. XDebug + KCacheGrind is quite nice. That's disappointing; I wonder if xhprof has become abandonware since Facebook moved away from Zend. Have you looked at Webgrind (http://code.google.com/p/webgrind/)? If not, I'd love to see it at least get a security review. KCacheGrind is indeed super powerful and nice, and well suited to a dev VM. I'm still interested in this sort of profiling for a very small percentage of production requests though, such as 0.1% of requests hitting a single server. Copying around cachegrind files and using KCacheGrind wouldn't be very practical. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Identifying pages that are slow to render
On 2013-03-07 4:06 PM, Matthew Flaschen mflasc...@wikimedia.org wrote: On 03/07/2013 12:00 PM, Antoine Musso wrote: On 06/03/13 23:58, Federico Leva (Nemo) wrote: There's slow-parse.log, but it's private unless a solution is found for https://gerrit.wikimedia.org/r/#/c/49678/ https://wikitech.wikimedia.org/wiki/Logs And slow-parse.log is probably going to be kept private unless proven it is not harmful =) Why would it be harmful for public wikis? Anyone can do this on an article-by-article basis by copying the source to their own MediaWiki instances. But it ends up being repeated work. Matt ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l +1. I have trouble imagining how making this public could be harmful. There are plenty of well-known slow-to-parse pages already. There are also more than a couple of ways to convince MW to make slow queries (longer than the PHP time limit), we publicly release detailed profiling data, etc. While that sort of thing isn't exactly proclaimed to the world, it's also not a secret. If someone wanted to find slow points in MediaWiki, there are a lot worse things just floating around the internet than a slow-to-parse page list. -bawolff ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Extension:OpenID 3.00 - Security Release
Marc-Andre Pelletier discovered a vulnerability in the MediaWiki OpenID extension for the case that MediaWiki is used as a “provider” and the wiki allows renaming of users. All previous versions of the OpenID extension used user-page URLs as identity URLs. On wikis that use the OpenID extension as “provider” and allow user renames, an attacker with rename privileges could rename a user and could then create an account with the same name as the victim. This would have allowed the attacker to steal the victim’s OpenID identity. Version 3.00 fixes the vulnerability by using Special:OpenIDIdentifier/id as the user’s identity URL, id being the immutable MediaWiki-internal userid of the user. The user’s old identity URL, based on the user’s user-page URL, will no longer be valid. The user’s user page can still be used as OpenID identity URL, but will delegate to the special page. This is a breaking change, as it changes all user identity URLs. Providers are urged to upgrade and notify users, or to disable user renaming. Respectfully, Ryan Lane https://gerrit.wikimedia.org/r/#/c/52722 Commit: f4abe8649c6c37074b5091748d9e2d6e9ed452f2 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Redis with SSDs
On Thu, Mar 7, 2013 at 2:16 PM, Tyler Romeo tylerro...@gmail.com wrote: Interesting article I found about Redis and its poor performance with SSDs as a swap medium. For whoever might be interested. http://antirez.com/news/52 This was not particularly insightful or useful; Redis swapping is known to be poor, and swapping to SSD is only slightly less bad than swapping to spinning hard drives. Also, it is horrible for SSD longevity. Lots of small random writes throughout the disk? This would be the wrong tool to test. -- -george william herbert george.herb...@gmail.com ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Extension:OpenID 3.00 - Security Release
This is indeed a problem, but given that rename permissions are granted by default to bureaucrats, who are the most trusted users and, on small wikis, are typically the sysadmins with shell access, this shouldn't be very dangerous. A sysadmin with shell access would be able to steal your identity anyway. It's a problem in the case of large wikis like those run by the WMF. On Fri, Mar 8, 2013 at 2:19 AM, Ryan Lane rlan...@gmail.com wrote: Marc-Andre Pelletier discovered a vulnerability in the MediaWiki OpenID extension for the case that MediaWiki is used as a “provider” and the wiki allows renaming of users. All previous versions of the OpenID extension used user-page URLs as identity URLs. On wikis that use the OpenID extension as “provider” and allow user renames, an attacker with rename privileges could rename a user and could then create an account with the same name as the victim. This would have allowed the attacker to steal the victim’s OpenID identity. Version 3.00 fixes the vulnerability by using Special:OpenIDIdentifier/id as the user’s identity URL, id being the immutable MediaWiki-internal userid of the user. The user’s old identity URL, based on the user’s user-page URL, will no longer be valid. The user’s user page can still be used as OpenID identity URL, but will delegate to the special page. This is a breaking change, as it changes all user identity URLs. Providers are urged to upgrade and notify users, or to disable user renaming. Respectfully, Ryan Lane https://gerrit.wikimedia.org/r/#/c/52722 Commit: f4abe8649c6c37074b5091748d9e2d6e9ed452f2 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l