Re: [Wikitech-l] Seemingly proprietary Javascript

2013-03-07 Thread Platonides
On 06/03/13 16:28, Jay Ashworth wrote:
 To “convey” a work means any kind of propagation that enables other
 parties to make or receive copies. Mere interaction with a user
 through a computer network, with no transfer of a copy, is not
 conveying.

 As javascript is executed in the client, it probably is.
 
 Perhaps.  But HTML is also executed in the client, and some legal
 decisions have gone each way on whether the mere viewing of a page 
 constitutes copying in violation of copyright (the trend is towards
 no, thankfully. :-)
 
 Cheers,
 -- jra

Interesting, although HTML is presentational while JS is executable.

I wouldn't consider most of our JavaScript significant (even though we
have plenty of usages considered non-trivial by [1]), since it is
highly based on MediaWiki classes and IDs. However, we also have some
big JavaScript programs (WikiEditor, VisualEditor...).

@Alexander: I would consider something like

<script
 src="//bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.triggerQueueCallback%2CloadingSpinner%2CmwEmbedUtil%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130304T183632Z"
 license="//bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.triggerQueueCallback%2CloadingSpinner%2CmwEmbedUtil%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130304T183632Z&mode=license"></script>

with the license attribute pointing to a JavaScript License Web Labels page
for that script (yes, that attribute would have to be proposed to the WHATWG).

Another, easier option would be for LibreJS to detect the debug=false
in the URL and change it to debug=true, expecting to find the license
information there.
That is also a natural change for people intending to reuse such
JavaScript, even if they were unaware of the convention.

@Chad: We use free licenses because we care about the freedom of our code
to be reused, but if the license is not appropriate to what we really
intend, or, even worse, places such a burden that even we aren't
presenting it properly, that is something very much worth discussing.
We could even end up relicensing the code to better reflect our
intention, as was done with the move from GFDL to CC-BY-SA for
Wikipedia content.


[1] http://www.gnu.org/philosophy/javascript-trap.html


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] How do MS SQL users install MediaWiki?

2013-03-07 Thread Marcin Cieslak
 Mark A. Hershberger m...@everybody.org wrote:
 On 03/04/2013 01:34 AM, Chad wrote:
 However, we do
 have people who want/use MSSQL, so I think taking the effort to
 keep it working is worthwhile--if someone's willing to commit.

 Since Danny Bauch has been using MSSQL and modifying MW for his needs,
 I'll work with him to get the necessary changes committed.

 Danny, if you could commit your changes into Gerrit, I'd be happy to
 test them.

I'll be happy to come back to my PostgreSQL work, and I'd be happy to
talk to other RDBMS people to coordinate some things (like getting
unit tests to work or getting some abstractions right: transactions,
schema management, etc.).

//Saper


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] IRC office hour on Tue March 19th, 1700 UTC, about Bug management

2013-03-07 Thread Andre Klapper
Hi everybody,

on Tuesday, March 19th, at 17:00 UTC [1], there will be an IRC office hour in
#wikimedia-office about Wikimedia's issue tracker [2] and bug
management [3].

Add it to your calendar and come ask how to better find the information
in Bugzilla that interests you, and to share ideas and criticism on how to
make Bugzilla better.

andre

[1] https://meta.wikimedia.org/wiki/IRC_office_hours
[2] https://bugzilla.wikimedia.org
[3] https://www.mediawiki.org/wiki/Bug_management
-- 
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Indexing structures for Wikidata

2013-03-07 Thread Denny Vrandečić
As you probably know, the search in Wikidata sucks big time.

Until we have created a proper Solr-based search and deployed it on that
infrastructure, we would like to implement and set up a reasonable stopgap
solution.

The simplest and most obvious signal for sorting the items would be to
1) do a prefix search, and
2) weight all results by the number of Wikipedias each item links to.

This should usually provide the item you are looking for. Currently, the
search order is random. Good luck with finding items like California,
Wellington, or Berlin.
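
To make that concrete, here is a rough sketch (using MediaWiki's database
abstraction) of the kind of query I have in mind; the term_weight column is
the proposed new weight field and doesn't exist yet, so its name is just a
placeholder:

<?php
// Rough sketch only: assumes wb_terms gains a weight column
// (called term_weight here; the name and exact schema are open).
$dbr = wfGetDB( DB_SLAVE );
$prefix = 'Berl'; // whatever the user has typed so far

$res = $dbr->select(
	'wb_terms',
	array( 'term_entity_id', 'term_text', 'term_weight' ),
	array(
		'term_entity_type' => 'item',
		'term_language'    => 'en',
		'term_type'        => 'label',
		// Left-anchored prefix match, so an index starting with
		// (term_language, term_type, term_text) could serve it.
		'term_text' . $dbr->buildLike( $prefix, $dbr->anyString() ),
	),
	__METHOD__,
	array(
		// term_weight would hold e.g. the number of sitelinks,
		// precomputed when the item is saved.
		'ORDER BY' => 'term_weight DESC',
		'LIMIT'    => 10,
	)
);

foreach ( $res as $row ) {
	echo "Q{$row->term_entity_id}: {$row->term_text}\n";
}

Whether the ordering by weight can be served by an index at all, or whether we
just sort a reasonably small candidate set in memory, is part of the question.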

Now, what I want to ask is: what would be the appropriate index structure
for that table? The data is saved in the wb_terms table, which would need
to be extended by a weight field. There is already a suggestion (based on
discussions between Tim and Daniel K, if I understood correctly) to change
the wb_terms table index structure (see
https://bugzilla.wikimedia.org/show_bug.cgi?id=45529 ), but since we are
changing the index structure anyway, it would be great to get it right this
time.

Anyone who can jump in? (Looking especially at Asher and Tim)

Any help would be appreciated.

Cheers,
Denny

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Nischay Nahata
I found EXPLAIN (http://dev.mysql.com/doc/refman/5.0/en/using-explain.html)
pretty useful during my project; rather than theories, it shows
exactly how the query is being resolved and whether the indexes are being
used correctly.
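
For example (just an illustrative sketch, not from my project): from eval.php
or a maintenance script you can prepend EXPLAIN to the SQL that MediaWiki's
Database class would run; the table and conditions below are only placeholders.

<?php
// Sketch: build a query with the MediaWiki Database abstraction,
// then ask MySQL how it would execute it.
$dbr = wfGetDB( DB_SLAVE );

$sql = $dbr->selectSQLText(
	'page',
	array( 'page_id', 'page_title' ),
	array(
		'page_namespace' => NS_MAIN,
		'page_title' . $dbr->buildLike( 'Berlin', $dbr->anyString() ),
	),
	__METHOD__
);

$res = $dbr->query( "EXPLAIN $sql", __METHOD__ );
foreach ( $res as $row ) {
	// The 'key' and 'rows' columns are the interesting ones: no key,
	// or a huge row estimate, usually means a missing or unusable index.
	print_r( (array)$row );
}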

On Thu, Mar 7, 2013 at 6:06 AM, Sumana Harihareswara
suma...@wikimedia.org wrote:

 If you want your code merged, you need to keep your database queries
 efficient.  How can you tell if a query is inefficient? How do you write
 efficient queries, and avoid inefficient ones?  We have some resources
 around:

 Roan Kattouw's

 https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial
 -- slides at
 https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf

 Asher Feldman's
 https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv
 -- slides at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf

 More hints:
 http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005075.html

 When you need to ask for a performance review, you can check out
 https://www.mediawiki.org/wiki/Developers/Maintainers#Other_Areas_of_Focus
 which suggests Tim Starling, Asher Feldman, and Ori Livneh.  I also
 BOLDly suggest Nischay Nahata, who worked on Semantic MediaWiki's
 performance for his GSoC project in 2012.

 --
 Sumana Harihareswara
 Engineering Community Manager
 Wikimedia Foundation




-- 
Cheers,

Nischay Nahata
nischayn22.in
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Luke Welling WMF
The advice on
https://wikitech.wikimedia.org/wiki/Query_profiling_for_features_developers
sounds
good.

Is there more detail somewhere on how to do this part: "Test your query
against production slaves prior to full deployment"?

Luke


On Wed, Mar 6, 2013 at 8:14 PM, Matthew Flaschen mflasc...@wikimedia.org wrote:

 On 03/06/2013 04:36 PM, Sumana Harihareswara wrote:
  If you want your code merged, you need to keep your database queries
  efficient.  How can you tell if a query is inefficient? How do you write
  efficient queries, and avoid inefficient ones?  We have some resources
  around:
 
  Roan Kattouw's
 
 https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial
  -- slides at
 
 https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf
 
  Asher Feldman's
  https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv
  -- slides at
 https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf

 And
 https://wikitech.wikimedia.org/wiki/Query_profiling_for_features_developers

 Matt Flaschen

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread Andreas Nüßlein
Hey Quim, hey Maria,

thank you for your replies!
I actually knew where to find the XML-dumps but that pointer about the new
XML-import tools is really helpful.


So eventually, I was able to acquire a machine with an 8-core Xeon, 32 GB RAM
and 6 TB of SAS storage to start my experiments on :)
Let's see what this baby can do: http://i.imgur.com/J47GJ.gif

Thanks again
Andreas



On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva mariya.mit...@gmail.com wrote:

 Hi,

 You might also try the following mailing list:
 * XML Data Dumps mailing list:
   https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

 Here is some info on importing XML dumps (not sure which tools work well,
 but the mailing list can probably help with that):
 http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing

 Also, Ariel Glenn recently announced two new tools for importing dumps on
 the XML list:

 http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html

 Mariya



 On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil q...@wikimedia.org wrote:

  On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
 
  Hi list,
 
  so I need to set up a local instance of the dewiki- and enwiki-DB with
 all
  revisions.. :-D
 
 
  Just in case:
  http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
 
  Also, you might want to ask / discuss at
 
  https://lists.wikimedia.org/mailman/listinfo/offline-l
 
  Good luck with this interesting project!
 
  --
  Quim Gil
  Technical Contributor Coordinator @ Wikimedia Foundation
  http://www.mediawiki.org/wiki/User:Qgil
 
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Petr Bena
Hi,

we discussed OAuth many times... but - what's the current status?

Do we have working extensions which support using OpenID or OAuth in order
to log in to MediaWiki, so that you can log in using your Google account or
such? I believe that the WMF is working on this, so can we have some update?

I know that the English Wikipedia community hates Facebook and basically
anything new :P but if not Wikipedia, at least many small wikis could
use it.

Thanks

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Petr Bena
I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
why we don't have it on production? :)

On Thu, Mar 7, 2013 at 8:30 PM, Petr Bena benap...@gmail.com wrote:
 Hi,

 we discussed OAuth many times... but - what's the current status?

 Do we have working extensions which support using OpenID or OAuth in order
 to log in to MediaWiki, so that you can log in using your Google account or
 such? I believe that the WMF is working on this, so can we have some update?

 I know that the English Wikipedia community hates Facebook and basically
 anything new :P but if not Wikipedia, at least many small wikis could
 use it.

 Thanks

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Tyler Romeo
On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena benap...@gmail.com wrote:

 I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID


 why we don't have it on production? :)


Just last week there was a thread about this. Extension:OpenID is under
active development, but I think it could be ready for deployment in the
near future (if not right now).

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread MZMcBride
Andreas Nüßlein wrote:
so I need to set up a local instance of the dewiki- and enwiki-DB with all
revisions.. :-D

I know it's rather a mammoth project so I was wondering if somebody could
give me some pointers?

First of all, I would need to know what kind of hardware I should get. Is
it possible/smart to have it all in two ginormous MySQL instances (one for
each of the languages), or will I need to do sharding?

I don't need it to run smoothly. I only need to be able to query the
database (and I know some of these queries can run for days)

I will probably have access to some rather powerful machines here at the
university, and I also have quite a few workstation machines on which I
could theoretically do the sharding.

Ryan L. or Marc P.: I routed Andreas to this list (from
#wikimedia-toolserver), as I figured these questions related to the work
that you all have been doing for Wikimedia Labs. Or at least I figured you
all probably had some kind of formula for hardware provisioning that might
be reusable here. Any pointers would be great. :-)

MZMcBride



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Daniel Kinzler
Hi all!

I would like to ask for your input on the question of how non-wikitext content
can be indexed by LuceneSearch.

Background is the fact that full text search (Special:Search) is nearly useless
on wikidata.org at the moment, see
https://bugzilla.wikimedia.org/show_bug.cgi?id=42234.

The reason for the problem appears to be that when rebuilding a Lucene index
from scratch, using an XML dump of wikidata.org, the raw JSON structure used by
Wikibase gets indexed. The indexer is blind: it just takes whatever text it
finds in the dump. Indexing raw JSON does not work at all for full-text search,
especially not when non-ASCII characters are represented as Unicode escape
sequences.

Inside MediaWiki, in PHP, this works as follows:

* wikidata.org (or rather, the Wikibase extension) stores non-text content in
wiki pages, using a ContentHandler that manages a JSON structure.
* Wikibase's EntityContent class implements Content::getTextForSearchIndex() so
it returns the labels and aliases of an entity. Data items thus get indexed by
their labels and aliases.
* getTextForSearchIndex() is used by the default MySQL search to build an index.
It's also (ab)used by things that can only operate on flat text, like the
AbuseFilter extension.
* The LuceneSearch index gets updated live using the OAI extension, which in
turn knows to use getTextForSearchIndex() to get the text for indexing.
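
To illustrate what that flat text looks like, here is a small standalone
sketch (not the actual Wikibase code, and the entity array below is heavily
simplified):

<?php
// Standalone illustration of the kind of flat text
// getTextForSearchIndex() returns for an entity: labels and aliases
// only, none of the JSON structure.
function entitySearchText( array $entity ) {
	$parts = array();
	if ( isset( $entity['label'] ) ) {
		$parts = array_merge( $parts, array_values( $entity['label'] ) );
	}
	if ( isset( $entity['aliases'] ) ) {
		foreach ( $entity['aliases'] as $aliasesInLanguage ) {
			$parts = array_merge( $parts, $aliasesInLanguage );
		}
	}
	return implode( "\n", array_unique( $parts ) );
}

$item = array(
	'label'   => array( 'en' => 'Berlin', 'de' => 'Berlin' ),
	'aliases' => array( 'en' => array( 'Berlin, Germany' ) ),
);
echo entitySearchText( $item );
// Berlin
// Berlin, Germany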

So, for anything indexed live, this works, but for rebuilding the search index
from a dump, it doesn't - because the Java indexer knows nothing about content
types, and has no interface for an extension to register additional content 
types.


To improve this, I can think of a few options:

1) create a specialized XML dump that contains the text generated by
getTextForSearchIndex() instead of the actual page content. However, that only
works if the dump is created using the PHP dumper. How are the regular dumps
currently generated on WMF infrastructure? Also, would it be feasible to make
an extra dump just for LuceneSearch (at least for wikidata.org)?

2) We could re-implement the ContentHandler facility in Java, and require
extensions that define their own content types to provide a Java based handler
in addition to the PHP one. That seems like a pretty massive undertaking of
dubious value. But it would allow maximum control over what is indexed how.

3) The indexer code (without plugins) should not know about Wikibase, but it may
have hard coded knowledge about JSON. It could have a special indexing mode for
JSON, in which the structure is deserialized and traversed, and any values are
added to the index (while the keys used in the structure would be ignored). We
may still be indexing useless interna from the JSON, but at least there would be
a lot fewer false negatives.
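
In PHP terms, the kind of blind extraction meant in 3) would look roughly like
this sketch (the real indexer is Java, so this only illustrates the idea):

<?php
// Sketch of blind JSON indexing: deserialize, walk the structure,
// keep the values, ignore the keys.
function collectJsonValues( $json ) {
	$data = json_decode( $json, true );
	$values = array();

	$walk = function ( $node ) use ( &$walk, &$values ) {
		if ( is_array( $node ) ) {
			foreach ( $node as $child ) {
				$walk( $child );
			}
		} elseif ( is_string( $node ) || is_numeric( $node ) ) {
			$values[] = (string)$node;
		}
	};
	$walk( $data );

	return implode( ' ', $values );
}

// Labels and aliases survive; structural keys like "labels" do not.
echo collectJsonValues( '{"labels":{"en":"Berlin"},"sitelinks":["dewiki","enwiki"]}' );
// Berlin dewiki enwiki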


I personally would prefer 1) if dumps are created with PHP, and 3) otherwise. 2)
looks nice, but it would be hard to keep the Java and the PHP versions from
diverging.

So, how would you fix this?

thanks
daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Brion Vibber
On Thu, Mar 7, 2013 at 11:45 AM, Daniel Kinzler dan...@brightbyte.de wrote:
 1) create a specialized XML dump that contains the text generated by
 getTextForSearchIndex() instead of actual page content.

That probably makes the most sense; alternately, make a dump that
includes both raw data and text for search. This also allows for
indexing extra stuff for files -- such as extracted text from a PDF or
DjVu, or metadata from a JPEG -- if the dump process etc. can produce
appropriate indexable data.

 However, that only works
 if the dump is created using the PHP dumper. How are the regular dumps 
 currently
 generated on WMF infrastructure? Also, would it be feasible to make an extra
 dump just for LuceneSearch (at least for wikidata.org)?

The dumps are indeed created via MediaWiki. I think Ariel or someone
can comment with more detail on how it currently runs; it's been a
while since I was in the thick of it.

 2) We could re-implement the ContentHandler facility in Java, and require
 extensions that define their own content types to provide a Java based handler
 in addition to the PHP one. That seems like a pretty massive undertaking of
 dubious value. But it would allow maximum control over what is indexed how.

No don't do it :)

 3) The indexer code (without plugins) should not know about Wikibase, but it 
 may
 have hard coded knowledge about JSON. It could have a special indexing mode 
 for
 JSON, in which the structure is deserialized and traversed, and any values are
 added to the index (while the keys used in the structure would be ignored). We
 may still be indexing useless interna from the JSON, but at least there would 
 be
 a lot fewer false negatives.

Indexing structured data could be awesome -- again I think of file
metadata as well as wikidata-style stuff. But I'm not sure how easy
that'll be. Should probably be in addition to the text indexing,
rather than replacing.


-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Antoine Musso
Le 06/03/13 23:58, Federico Leva (Nemo) a écrit :
 There's slow-parse.log, but it's private unless a solution is found for
 https://gerrit.wikimedia.org/r/#/c/49678/
 https://wikitech.wikimedia.org/wiki/Logs

And slow-parse.log is probably going to be kept private unless proven it
is not harmful =)

-- 
Antoine hashar Musso

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits

2013-03-07 Thread anubhav agarwal
Hey Chris

I was exploring the SpamBlacklist extension. I have some doubts that I hope
you could clear up.

Is there any place I can get documentation for the SpamBlacklist class in the
file SpamBlacklist_body.php?

In the filter function, what do the following variables represent?

$title
$text
$section
$editpage
$out

I have understood the following things from the code; please correct me if
I am wrong. It extracts the edited text and parses it to find the links. It
then replaces the links which match the whitelist regex, and then checks if
there are some links that match the blacklist regex.
If the check is greater, you return the content matched. It already enters
it in the debug log if it finds a match.

I guess the bug aims at creating an SQL table.
I was thinking of logging the following fields:
Title, Text, User, URLs, IP. I don't understand why you denied it.


On Tue, Feb 26, 2013 at 1:25 AM, Chris Steipp cste...@wikimedia.org wrote:

 That's an ambitious first bug, Anubhav!

 Since this is an extension, it plugs into MediaWiki core using hooks.
 So periodically, the core code will run all of the functions
 registered for a particular hook, so the extensions can interact with
 the logic. In this case, SpamBlacklist has registered
 SpamBlacklistHooks::filterMerged to run whenever an editor attempts to
 save a page, or SpamBlacklistHooks::filterAPIEditBeforeSave if the
 edit came in through the api. So that is where you will want to log.

 Although MediaWiki has a logging feature, it sounds like you may want
 to add your own logging table (like the AbuseFilter extension). If you
 do that, make sure that you're only storing data that you really need,
 and is ok with our privacy policy (so no ip addresses!).

 Feel free to add me as a reviewer when you submit your code to gerrit.

 Chris

 On Mon, Feb 25, 2013 at 11:21 AM, Tyler Romeo tylerro...@gmail.com
 wrote:
  Hey,
 
  I don't know much about that, or how much you know, but at the very
 least I
  can tell you that the bug is in Extension:SpamBlacklist, which can be
 found
  at http://www.mediawiki.org/wiki/Extension:SpamBlacklist. From what I
 can
  see from the code, it seems to just use various Hooks in MediaWiki in
 order
  to stop editing, e-mailing, etc. if the request matches a parsed
 blacklist
  it has.
 
  *--*
  *Tyler Romeo*
  Stevens Institute of Technology, Class of 2015
  Major in Computer Science
  www.whizkidztech.com | tylerro...@gmail.com
 
 
  On Mon, Feb 25, 2013 at 2:17 PM, anubhav agarwal anubhav...@gmail.com
 wrote:
 
  Hi Guys,
 
  I was trying to fix this bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=1542
  I am a newbie to MediaWiki and it's the first bug I'm trying to solve,
  so I don't know much.
  I want to know about the spam blacklist: how it works, how it triggers
  the action, and what its logging mechanism is.
  It would be great if some one could help me fix this bug.
 
  Cheers,
  Anubhav
 
 
  Anubhav Agarwal | 4th Year | Computer Science & Engineering | IIT Roorkee
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Cheers,
Anubhav


Anubhav Agarwal | 4th Year | Computer Science & Engineering | IIT Roorkee
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Antoine Musso
Le 06/03/13 22:05, Robert Rohde a écrit :
 On enwiki we've already made Lua conversions with most of the string
 templates, several formatting templates (e.g. {{rnd}}, {{precision}}),
 {{coord}}, and a number of others.  And there is work underway on a
 number of the more complex overhauls (e.g. {{cite}}, {{convert}}).
 However, it would be nice to identify problematic templates that may
 be less obvious.

You can get in touch with Brad Jorsch and Tim Starling. They most
probably have a list of templates that should quickly be converted to Lua
modules.

If we get {{cite}} converted, that will already be a nice improvement :-]

-- 
Antoine hashar Musso


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Antoine Musso
Le 07/03/13 11:32, Petr Bena wrote:
 I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
 why we don't have it on production? :)

As far as I know, that extension is pending a full review before it
lands on the Wikimedia cluster.

Ryan Lane wrote about it:
 http://lists.wikimedia.org/pipermail/wikitech-l/2013-March/067124.html

There is a draft document at:
 https://www.mediawiki.org/wiki/OpenID_Provider

We still have to figure out which account will be used, the URL, whether
we want a dedicated wiki etc...


-- 
Antoine hashar Musso

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Matthew Flaschen
On 03/07/2013 12:00 PM, Antoine Musso wrote:
 Le 06/03/13 23:58, Federico Leva (Nemo) a écrit :
 There's slow-parse.log, but it's private unless a solution is found for
 https://gerrit.wikimedia.org/r/#/c/49678/
 https://wikitech.wikimedia.org/wiki/Logs
 
 And slow-parse.log is probably going to be kept private unless proven it
 is not harmful =)

Why would it be harmful for public wikis?  Anyone can do this on an
article-by-article basis by copying the source to their own MediaWiki
instances.

But it ends up being repeated work.

Matt

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Petr Bena
ah ok I was confused by it being flagged stable

On Thu, Mar 7, 2013 at 8:35 PM, Tyler Romeo tylerro...@gmail.com wrote:
 On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena benap...@gmail.com wrote:

 I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID


 why we don't have it on production? :)


 Just last week there was a thread about this. Extension:OpenID is under
 active development, but I think it could be ready for deployment in the
 near future (if not right now).

 *--*
 *Tyler Romeo*
 Stevens Institute of Technology, Class of 2015
 Major in Computer Science
 www.whizkidztech.com | tylerro...@gmail.com
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Tyler Romeo
On Thu, Mar 7, 2013 at 3:05 PM, Antoine Musso hashar+...@free.fr wrote:

 We still have to figure out which account will be used, the URL, whether
 we want a dedicated wiki etc...


Those discussions are unrelated to using OpenID as a client, though.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Jeremy Baron
On Thu, Mar 7, 2013 at 8:06 PM, Matthew Flaschen
mflasc...@wikimedia.org wrote:
 Why would it be harmful for public wikis?  Anyone can do this on an
 article-by-article basis by copying the source to their own MediaWiki
 instances.

That user would have to pick which articles to copy and test (or test them all).

The log doesn't contain (I guess?) all articles. Only slow articles.

-Jeremy

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Chad
Those tags are arbitrary :(

-Chad
On Mar 7, 2013 12:09 PM, Petr Bena benap...@gmail.com wrote:

 ah ok I was confused by it being flagged stable

 On Thu, Mar 7, 2013 at 8:35 PM, Tyler Romeo tylerro...@gmail.com wrote:
  On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena benap...@gmail.com wrote:
 
  I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
 
 
  why we don't have it on production? :)
 
 
  Just last week there was a thread about this. Extension:OpenID is under
  active development, but I think it could be ready for deployment in the
  near future (if not right now).
 
  *--*
  *Tyler Romeo*
  Stevens Institute of Technology, Class of 2015
  Major in Computer Science
  www.whizkidztech.com | tylerro...@gmail.com
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Ryan Lane
On Thu, Mar 7, 2013 at 12:10 PM, Tyler Romeo tylerro...@gmail.com wrote:

 On Thu, Mar 7, 2013 at 3:05 PM, Antoine Musso hashar+...@free.fr wrote:

  We still have to figure out which account will be used, the URL, whether
  we want a dedicated wiki etc...
 

 Those discussions are unrelated to using OpenID as a client, though.


As I've mentioned before, I'm the one championing OpenID support on the
sites, and I have no current plans to enable OpenID as a consumer. Making
authentication changes is difficult. We're focusing on OpenID as a provider
and OAuth support right now, and that's way more than enough to try to do
this quarter.

- Ryan
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Daniel Kinzler
On 07.03.2013 20:58, Brion Vibber wrote:
 3) The indexer code (without plugins) should not know about Wikibase, but it 
 may
 have hard coded knowledge about JSON. It could have a special indexing mode 
 for
 JSON, in which the structure is deserialized and traversed, and any values 
 are
 added to the index (while the keys used in the structure would be ignored). 
 We
 may still be indexing useless interna from the JSON, but at least there 
 would be
 a lot fewer false negatives.
 
 Indexing structured data could be awesome -- again I think of file
 metadata as well as wikidata-style stuff. But I'm not sure how easy
 that'll be. Should probably be in addition to the text indexing,
 rather than replacing.

Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want
indexed structured data, the question is just how to get that into the LSearch
infrastructure.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits

2013-03-07 Thread Platonides
On 07/03/13 21:03, anubhav agarwal wrote:
 Hey Chris
 
 I was exploring SpamBlaklist Extension. I have some doubts hope you could
 clear them.
 
 Is there any place I can get documentation of
 Class SpamBlacklist in the file SpamBlacklist_body.php. ?
 
 In function filter what does the following variables represent ?
 
 $title
Title object (includes/Title.php) This is the page where it tried to save.

 $text
Text being saved in the page/section

 $section
Name of the section or ''

 $editpage
EditPage object if EditFilterMerged was called, null otherwise

 $out

A ParserOutput class (actually, this variable name was a bad choice, it
looks like a OutputPage), see includes/parser/ParserOutput.php


 I have understood the following things from the code, please correct me if
 I am wrong. It extracts the edited text, and parse it to find the links.

Actually, it uses the fact that the parser will have processed the
links, so in most cases just obtains that information.


 It then replaces the links which match the whitelist regex, 
This doesn't make sense as you explain it. It builds a list of links,
and replaces whitelisted ones with '', ie. removes whitelisted links
from the list.

 and then checks if there are some links that match the blacklist regex.
Yes

 If the check is greater you return the content matched. 

Right, $check will be non-0 if the links matched the blacklist.

 it already enters in the debuglog if it finds a match

Yes, but that is a private log.
Bug 1542 talks about making that accessible in the wiki.


 I guess the bug aims at creating a sql table.
 I was thinking of the following fields to log.
 Title, Text, User, URLs, IP. I don't understand why you denied it.

Because we don't like to publish the IPs *in the wiki*.

I think the approach should be to log matches using abusefilter
extension if that one is loaded.
I concur that it seems too hard to begin with.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Thomas Gries
Am 07.03.2013 21:09, schrieb Petr Bena:
 ah ok I was confused by it being flagged stable


Yes. It *is* stable, at least since I took over the maintenance a long
time ago.

This does not say, that it cannot be further improved.

Currently I am very busy adding new necessary features to the user
interface (preferences), which can already be seen at
http://openid-wiki.instance-proxy.wmflabs.org/wiki/ .
Some new patches are in the pipeline and will be published in the next few days.

The manual page fully reflects the current status.
I am always looking for developers who install the extension in their
wikis and send us their feedback - and file bug reports if needed.

Tom
Maintainer of E:OpenID



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Munagala Ramanath
(1) seems like the right way to go to me too.

There may be other ways but puppet/files/lucene/lucene.jobs.sh has a
function called
import-db() which creates a dump like this:

   php $MWinstall/common/multiversion/MWScript.php dumpBackup.php $dbname --current > $dumpfile

Ram


On Thu, Mar 7, 2013 at 1:05 PM, Daniel Kinzler dan...@brightbyte.de wrote:

 On 07.03.2013 20:58, Brion Vibber wrote:
  3) The indexer code (without plugins) should not know about Wikibase,
 but it may
  have hard coded knowledge about JSON. It could have a special indexing
 mode for
  JSON, in which the structure is deserialized and traversed, and any
 values are
  added to the index (while the keys used in the structure would be
 ignored). We
  may still be indexing useless interna from the JSON, but at least there
 would be
  a lot fewer false negatives.
 
  Indexing structured data could be awesome -- again I think of file
  metadata as well as wikidata-style stuff. But I'm not sure how easy
  that'll be. Should probably be in addition to the text indexing,
  rather than replacing.

 Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want
 indexed structured data, the question is just how to get that into the
 LSearch
 infrastructure.

 -- daniel

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Redis with SSDs

2013-03-07 Thread Tyler Romeo
Interesting article I found about Redis and its poor performance with SSDs
as a swap medium. For whoever might be interested.

http://antirez.com/news/52
*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits

2013-03-07 Thread Chris Steipp
On Thu, Mar 7, 2013 at 1:34 PM, Platonides platoni...@gmail.com wrote:
 On 07/03/13 21:03, anubhav agarwal wrote:
 Hey Chris

 I was exploring SpamBlaklist Extension. I have some doubts hope you could
 clear them.

 Is there any place I can get documentation of
 Class SpamBlacklist in the file SpamBlacklist_body.php. ?

There really isn't any documentation besides the code, but a couple
more things you should look at. Notice that in SpamBlacklist.php,
there is the line $wgHooks['EditFilterMerged'][] =
'SpamBlacklistHooks::filterMerged';, which is the way that
SpamBlacklist registers itself with MediaWiki core to filter edits. So
when MediaWiki core runs the EditFilterMerged hooks (which it does in
includes/EditPage.php, line 1287), all of the extensions that have
registered a function for that hook are run with the passed in
arguments, so SpamBlacklistHooks::filterMerged is run. And
SpamBlacklistHooks::filterMerged then just sets up and calls
SpamBlacklist::filter. So that is where you can start tracing what is
actually in the variables, in case Platonides' summary wasn't enough.
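
In case it helps to see the shape of it, here is a much simplified sketch of
that wiring plus the sort of logging bug 1542 asks for. The hook signature is
the real one, but the class, the spam_hit_log table and the
getBlacklistMatches() helper are made up for illustration; the actual
extension code is structured differently:

<?php
// Much simplified sketch, not the real SpamBlacklist code.

$wgHooks['EditFilterMerged'][] = 'ExampleSpamHooks::onEditFilterMerged';

class ExampleSpamHooks {
	/**
	 * Run by core from EditPage.php when an editor tries to save.
	 * @param EditPage $editPage
	 * @param string $text text being saved
	 * @param string &$hookError error HTML to show, if any
	 * @param string $summary edit summary
	 * @return bool returning false aborts the save
	 */
	public static function onEditFilterMerged( $editPage, $text, &$hookError, $summary ) {
		$matches = self::getBlacklistMatches( $text );

		if ( $matches ) {
			// Roughly where bug 1542's logging could go: write the hit
			// to a dedicated table, similar to AbuseFilter's log.
			$dbw = wfGetDB( DB_MASTER );
			$dbw->insert( 'spam_hit_log', array(
				'shl_title'     => $editPage->getTitle()->getPrefixedDBkey(),
				'shl_matches'   => implode( "\n", $matches ),
				'shl_timestamp' => $dbw->timestamp(),
			), __METHOD__ );

			$hookError = '<div class="error">Spam blacklist hit</div>';
			return false;
		}
		return true;
	}

	private static function getBlacklistMatches( $text ) {
		// Placeholder: the real extension builds regexes from the
		// blacklist pages and runs them against the parsed links.
		return array();
	}
}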



 In function filter what does the following variables represent ?

 $title
 Title object (includes/Title.php) This is the page where it tried to save.

 $text
 Text being saved in the page/section

 $section
 Name of the section or ''

 $editpage
 EditPage object if EditFilterMerged was called, null otherwise

 $out

 A ParserOutput class (actually, this variable name was a bad choice, it
 looks like a OutputPage), see includes/parser/ParserOutput.php


 I have understood the following things from the code, please correct me if
 I am wrong. It extracts the edited text, and parse it to find the links.

 Actually, it uses the fact that the parser will have processed the
 links, so in most cases just obtains that information.


 It then replaces the links which match the whitelist regex,
 This doesn't make sense as you explain it. It builds a list of links,
 and replaces whitelisted ones with '', ie. removes whitelisted links
 from the list.

 and then checks if there are some links that match the blacklist regex.
 Yes

 If the check is greater you return the content matched.

 Right, $check will be non-0 if the links matched the blacklist.

 it already enters in the debuglog if it finds a match

 Yes, but that is a private log.
 Bug 1542 talks about making that accessible in the wiki.

Yep. For example, see
* https://en.wikipedia.org/wiki/Special:Log
* https://en.wikipedia.org/wiki/Special:AbuseLog



 I guess the bug aims at creating a sql table.
 I was thinking of the following fields to log.
 Title, Text, User, URLs, IP. I don't understand why you denied it.

 Because we don't like to publish the IPs *in the wiki*.

The WMF privacy policy also discourages us from keeping IP addresses
longer than 90 days, so if you do keep IPs, then you need a way to
hide / purge them, and if they allow someone to see what IP address a
particular username was using, then only users with checkuser
permissions are allowed to see that. So it would be easier for you not
to include it, but if it's desired, then you'll just have to build
those protections out too.


 I think the approach should be to log matches using abusefilter
 extension if that one is loaded.

The abusefilter log format has a lot of data in it specific to
AbuseFilter, and is used to re-test abuse filters, so adding these
hits into that log might cause some issues. I think either the general
log, or using a separate, new log table would be best. Just for some
numbers, in the first 7 days of this month, we've had an average of
27,000 hits each day. So if this goes into an existing log, it's going
to generate a significant amount of data.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Tim Starling
On 07/03/13 12:12, Asher Feldman wrote:
 Ori - I think this has been discussed but automated xhprof configuration as
 part of the vagrant dev env setup would be amazing :)

I don't think xhprof is the best technology for PHP profiling. I
reported a bug a month ago which causes the times it reports to be
incorrect by a random factor, often 4 or so. No response so far. And
its web interface is packed full of XSS vulnerabilities. XDebug +
KCacheGrind is quite nice.

-- Tim Starling



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Asher Feldman
On Thu, Mar 7, 2013 at 3:57 PM, Tim Starling tstarl...@wikimedia.org wrote:

 On 07/03/13 12:12, Asher Feldman wrote:
  Ori - I think this has been discussed but automated xhprof configuration
 as
  part of the vagrant dev env setup would be amazing :)

 I don't think xhprof is the best technology for PHP profiling. I
 reported a bug a month ago which causes the times it reports to be
 incorrect by a random factor, often 4 or so. No response so far. And
 its web interface is packed full of XSS vulnerabilities. XDebug +
 KCacheGrind is quite nice.


That's disappointing; I wonder if xhprof has become abandonware since
Facebook moved away from Zend.  Have you looked at Webgrind (
http://code.google.com/p/webgrind/)?  If not, I'd love to see it at least
get a security review.  KCacheGrind is indeed super powerful and nice, and
well suited to a dev vm.  I'm still interested in this sort of profiling
for a very small percentage of production requests though, such as 0.1% of
requests hitting a single server.  Copying around cachegrind files and
using KCacheGrind wouldn't be very practical.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread bawolff
On 2013-03-07 4:06 PM, Matthew Flaschen mflasc...@wikimedia.org wrote:

 On 03/07/2013 12:00 PM, Antoine Musso wrote:
  Le 06/03/13 23:58, Federico Leva (Nemo) a écrit :
  There's slow-parse.log, but it's private unless a solution is found for
  https://gerrit.wikimedia.org/r/#/c/49678/
  https://wikitech.wikimedia.org/wiki/Logs
 
  And slow-parse.log is probably going to be kept private unless proven it
  is not harmful =)

 Why would it be harmful for public wikis?  Anyone can do this on an
  article-by-article basis by copying the source to their own MediaWiki
 instances.

 But it ends up being repeated work.

 Matt

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

+1. I have trouble imagining how making this public could be harmful.
There are plenty of well-known slow-to-parse pages already. There are also
more than a couple of ways to convince MW to make slow queries (longer than
the PHP time limit), we publicly release detailed profiling data, etc.
While that sort of thing isn't exactly proclaimed to the world, it's also not
a secret. If someone wanted to find slow points in MediaWiki, there's a lot
worse floating around the internet than a list of slow-to-parse pages.

-bawolff
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Extension:OpenID 3.00 - Security Release

2013-03-07 Thread Ryan Lane
Marc-Andre Pelletier discovered a vulnerability in the MediaWiki OpenID
extension for the case that MediaWiki is used as a “provider” and the wiki
allows renaming of users.

All previous versions of the OpenID extension used user-page URLs as
identity URLs. On wikis that use the OpenID extension as “provider” and
allows user renames, an attacker with rename privileges could rename a user
and could then create an account with the same name as the victim. This
would have allowed the attacker to steal the victim’s OpenID identity.

Version 3.00 fixes the vulnerability by using Special:OpenIDIdentifier/id
as the user’s identity URL, id being the immutable MediaWiki-internal
userid of the user. The user’s old identity URL, based on the user’s
user-page URL, will no longer be valid.

The user’s user page can still be used as OpenID identity URL, but will
delegate to the special page.

This is a breaking change, as it changes all user identity URLs. Providers
are urged to upgrade and notify users, or to disable user renaming.

Respectfully,

Ryan Lane

https://gerrit.wikimedia.org/r/#/c/52722
Commit: f4abe8649c6c37074b5091748d9e2d6e9ed452f2
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Redis with SSDs

2013-03-07 Thread George Herbert
On Thu, Mar 7, 2013 at 2:16 PM, Tyler Romeo tylerro...@gmail.com wrote:
 Interesting article I found about Redis and its poor performance with SSDs
 as a swap medium. For whoever might be interested.

 http://antirez.com/news/52

This was not particularly insightful or useful; Redis swapping is
known to be poor, and swapping to SSD is only slightly less bad than
swapping to spinning hard drives.

Also, it is horrible for SSD longevity.  Lots of small random writes
throughout the disk?

This would be the wrong tool to test.


-- 
-george william herbert
george.herb...@gmail.com

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Extension:OpenID 3.00 - Security Release

2013-03-07 Thread Petr Bena
This is indeed a problem, but given that rename permissions are granted
by default to bureaucrats, who are the most trusted users, and on small
wikis typically to sysadmins with shell access, this shouldn't be very
dangerous. A sysadmin with shell access will be able to steal your
identity anyway.

It's a problem in the case of large wikis like those run by the WMF.

On Fri, Mar 8, 2013 at 2:19 AM, Ryan Lane rlan...@gmail.com wrote:
 Marc-Andre Pelletier discovered a vulnerability in the MediaWiki OpenID
 extension for the case that MediaWiki is used as a “provider” and the wiki
 allows renaming of users.

 All previous versions of the OpenID extension used user-page URLs as
 identity URLs. On wikis that use the OpenID extension as “provider” and
 allows user renames, an attacker with rename privileges could rename a user
 and could then create an account with the same name as the victim. This
 would have allowed the attacker to steal the victim’s OpenID identity.

 Version 3.00 fixes the vulnerability by using Special:OpenIDIdentifier/id
 as the user’s identity URL, id being the immutable MediaWiki-internal
 userid of the user. The user’s old identity URL, based on the user’s
 user-page URL, will no longer be valid.

 The user’s user page can still be used as OpenID identity URL, but will
 delegate to the special page.

 This is a breaking change, as it changes all user identity URLs. Providers
 are urged to upgrade and notify users, or to disable user renaming.

 Respectfully,

 Ryan Lane

 https://gerrit.wikimedia.org/r/#/c/52722
 Commit: f4abe8649c6c37074b5091748d9e2d6e9ed452f2
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l