[Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread H. Langos
Seems like (at least) the API of #pos in ParserFunctions is
different from the one in StringFunctions.

{{#pos: haystack|needle|offset}}

While the StringFunctions #pos in MediaWiki 1.14 returned an
empty string when the needle was not found, the ParserFunctions
implementation of #pos in svn now returns -1.

This is most unfortunate since current usage depends on the old behaviour.
Example:

{{#if: {{#pos: abcd|b}} | found | not found }}

{{#if: {{#pos: abcd|x}} | found | not found }}

Now both of these examples will return "found"!
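
(For illustration - this is simply how #if evaluates its first argument,
nothing specific to the new code: any non-empty string counts as true,
so the returned -1 takes the "found" branch.)

{{#if: -1 | found | not found }}     renders "found"
{{#if: | found | not found }}        renders "not found"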


Usage scenario:

I am trying to use #pos in template calls to implement a sort-of-database
functionality in a MediaWiki installation.

I have a big template that contains data in named parameters.
Those parameters get passed along to a template that can select columns
by rendering some of those named parameters and ignoring others.

Now I want to implement row selection by passing along a parameter name
and a substring that should be in the value of that parameter in order
for the data to be rendered.

something like this:

{{#if: {{#pos: {{{ {{{selectionattribute}}} }}} | {{{selectionvalue}}} }} | 
render_row | render_nothing }}
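
To make that concrete (purely hypothetical template and parameter
names, just to sketch the idea): a row call like

{{DataRow | name=Foo | country=Germany | selectionattribute=country | selectionvalue=Ger }}

would feed the value of country into the #if/#pos test above; Ger is
found in Germany, so the row is rendered, while selectionvalue=France
would render nothing.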

If I want this to work in different MediaWiki installations I need
to rely on the API of #pos.

Currently there seems to be no way to use #pos in a way that works
both on 1.14 and on 1.15-svn.

cheers
-henrik




Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Daniel Kinzler
David Gerard schrieb:
 Keeping well-meaning admins from putting Google web bugs in the
 JavaScript is a game of whack-a-mole.
 
 Are there any technical workarounds feasible? If not blocking the
 loading of external sites entirely (I understand hu:wp uses a web bug
 that isn't Google), perhaps at least listing the sites somewhere
 centrally viewable?

Perhaps the solution would be to simply set up our own JS based usage tracker?
There are a few options available
http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for starters,
the backend could run on the toolserver.

Note that anything processing IP addresses will need special approval on the TS.

-- daniel



Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread H. Langos
On Thu, Jun 04, 2009 at 03:55:38PM +0200, H. Langos wrote:
 Seems like (at least) the API of #pos in ParserFunctions is
 different from the one in StringFunctions.
 
 {{#pos: haystack|needle|offset}}
 
 While the StringFunctions #pos in MediaWiki 1.14 returned an
 empty string when the needle was not found, the ParserFunctions
 implementation of #pos in svn now returns -1.
 

I forgot to ask THE question. Is it a bug or is there some good reason 
to break backward compatibility?

And no, programming language cosmetics is not a good reason. :-)

If something has the same interface, it should have the same behaviour.
If the old semantics was too awful to bear, the new one should have been
called #strpos or #fpos (for "forward #pos"; #rpos always had the
-1 return-on-not-found behaviour).
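
In other words, restating the 1.14 StringFunctions behaviour described
above:

{{#pos: abcd | x }}     gives an empty string
{{#rpos: abcd | x }}    gives -1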


cheers
-henrik




Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Michael Rosenthal
I suggest keeping the bug on Wikimedia's servers and using a tool which
relies on SQL databases. These could be shared with the toolserver
where the official version of the analysis tool runs and users are
enabled to run their own queries (so taking a tool with a good
database structure would be nice). With that the toolserver users
could set up their own cool tools on that data.

On Thu, Jun 4, 2009 at 4:34 PM, David Gerard dger...@gmail.com wrote:
 2009/6/4 Daniel Kinzler dan...@brightbyte.de:
 David Gerard schrieb:

 Keeping well-meaning admins from putting Google web bugs in the
 JavaScript is a game of whack-a-mole.
 Are there any technical workarounds feasible? If not blocking the

 Perhaps the solution would be to simply set up our own JS based usage 
 tracker?
 There are a few options available
 http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for 
 starters,
 the backend could run on the toolserver.
 Note that anything processing IP addresses will need special approval on the 
 TS.


 If putting that on the toolserver passes privacy policy muster, that'd
 be an excellent solution. Then external site loading can be blocked.

 (And if the toolservers won't melt in the process.)


 - d.





Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Gregory Maxwell
On Thu, Jun 4, 2009 at 10:19 AM, David Gerard dger...@gmail.com wrote:
 Keeping well-meaning admins from putting Google web bugs in the
 JavaScript is a game of whack-a-mole.

 Are there any technical workarounds feasible? If not blocking the
 loading of external sites entirely (I understand hu:wp uses a web bug
 that isn't Google), perhaps at least listing the sites somewhere
 centrally viewable?

Restrict site-wide JS and raw HTML injection to a smaller subset of
users who have been specifically schooled in these issues.


This approach is also compatible with other approaches. It has the
advantage of being simple to implement and should produce a
considerable reduction in problems regardless of the underlying cause.


Just be glad no one has yet turned English Wikipedia's readers into
their own personal DDOS drone network.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Neil Harris
Michael Rosenthal wrote:
 I suggest keeping the bug on Wikimedia's servers and using a tool which
 relies on SQL databases. These could be shared with the toolserver
 where the official version of the analysis tool runs and users are
 enabled to run their own queries (so taking a tool with a good
 database structure would be nice). With that the toolserver users
 could set up their own cool tools on that data.
   

If Javascript was used to serve the bug, it would be quite easy to only 
load the bug some small fraction of the time, allowing a fair 
statistical sample of JS-enabled readers (who should, I hope, be fairly 
representative of the whole population) to be taken without melting down 
the servers.

I suspect the fact that most bots and spiders do not interpret 
Javascript, and would thus be excluded from participating in the traffic 
survey, could be regarded as an added bonus.

-- Neil




Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread David Gerard
2009/6/4 Mike.lifeguard mikelifegu...@fastmail.fm:
 On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:

 Then external site loading can be blocked.

 Why do we need to block loading from all external sites? If there are
 specific  problematic ones (like google analytics) then why not block
 those?


Because having the data go outside Wikimedia at all is a privacy
policy violation, as I understand it (please correct me if I'm wrong).


- d.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Gregory Maxwell
On Thu, Jun 4, 2009 at 10:53 AM, David Gerard dger...@gmail.com wrote:
 I understand the problem with stats before was that the stats server
 would melt under the load. Leon's old wikistats page sampled 1:1000.
 The current stats (on dammit.lt and served up nicely on
 http://stats.grok.se) are every hit, but I understand (Domas?) that it
 was quite a bit of work to get the firehose of data in such a form as
 not to melt the receiving server trying to process it.

 OK, then the problem becomes: how to set up something like
 stats.grok.se feasibly internally for all the other data gathered from
 a hit? (Modulo stuff that needs to be blanked per privacy policy.)

What exactly are people looking for that isn't available from
stats.grok.se that isn't a privacy concern?

I had assumed that people kept installing these bugs because they
wanted source-network breakdowns per-article and other clear privacy
violations.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Gregory Maxwell
On Thu, Jun 4, 2009 at 11:01 AM, Mike.lifeguard
mikelifegu...@fastmail.fm wrote:
 On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:

 Then external site loading can be blocked.


 Why do we need to block loading from all external sites? If there are
 specific  problematic ones (like google analytics) then why not block
 those?

Because:

(1) External loading results in an uncontrolled leak of private reader
and editor information to third parties, in contravention of the
privacy policy as well as basic ethical operating principles.

(1a) Most external loading script usage will also defeat users' choice
of SSL and leak more information about their browsing to their local
network. It may also bypass any Wikipedia-specific anonymization
proxies they are using to keep their reading habits private.

(2) External loading produces a runtime dependency on third party
sites. Some other site goes down and our users experience some kind of
loss of service.

(3) The availability of external loading makes Wikimedia a potential
source of very significant DDOS attacks, intentional or otherwise.

That's not to say that there aren't reasons to use remote loading, but
the potential harms mean that it should probably be a default-deny,
permit-by-exception process rather than the other way around.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Daniel Kinzler
David Gerard schrieb:
 2009/6/4 Gregory Maxwell gmaxw...@gmail.com:
 
 Restrict site-wide JS and raw HTML injection to a smaller subset of
 users who have been specifically schooled in these issues.
 
 
 Is it feasible to allow admins to use raw HTML as appropriate but not
 raw JS? Being able to fix MediaWiki: space messages with raw HTML is
 way too useful on the occasions where it's useful.
 

Possible yes, sensible no. Because if you can edit raw html, you can inject
javascript.

-- daniel



Re: [Wikitech-l] [Foundation-l] Wikipedia tracks user behaviour via third party companies

2009-06-04 Thread Neil Harris
David Gerard wrote:
 Web bugs for statistical data are a legitimate want but potentially a
 horrible privacy violation.

 So I asked on wikitech-l, and the obvious answer appears to be to do
 it internally. Something like http://stats.grok.se/ only more so.

 So - if you want web bug data in a way that fits the privacy policy,
 please pop over to the wikitech-l thread with technical suggestions
 and solutions :-)


 - d.
Yes, modifying the http://stats.grok.se/ systems looks like the way to go.

What do people actually want to see from the traffic data? Do they want 
referrers, anonymized user trails, or what?

-- Neil




Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Daniel Kinzler
David Gerard schrieb:
 2009/6/4 Mike.lifeguard mikelifegu...@fastmail.fm:
 On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
 
 Then external site loading can be blocked.
 
 Why do we need to block loading from all external sites? If there are
 specific  problematic ones (like google analytics) then why not block
 those?
 
 
 Because having the data go outside Wikimedia at all is a privacy
 policy violation, as I understand it (please correct me if I'm wrong).

I agree with that, *especially* if it's for the purpose of aggregating data
about users.

-- daniel





Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Finne Boonen
On Thu, Jun 4, 2009 at 17:00, Gregory Maxwell gmaxw...@gmail.com wrote:
 On Thu, Jun 4, 2009 at 10:53 AM, David Gerard dger...@gmail.com wrote:
 I understand the problem with stats before was that the stats server
 would melt under the load. Leon's old wikistats page sampled 1:1000.
 The current stats (on dammit.lt and served up nicely on
 http://stats.grok.se) are every hit, but I understand (Domas?) that it
 was quite a bit of work to get the firehose of data in such a form as
 not to melt the receiving server trying to process it.

 OK, then the problem becomes: how to set up something like
 stats.grok.se feasibly internally for all the other data gathered from
 a hit? (Modulo stuff that needs to be blanked per privacy policy.)

 What exactly are people looking for that isn't available from
 stats.grok.se that isn't a privacy concern?

 I had assumed that people kept installing these bugs because they
 wanted source-network breakdowns per-article and other clear privacy
 violations.

On top of views/page,
I'd be interested in keywords used, entry/exit points, path analysis
when people are editing (do they save/leave/try to find help/...),
#edit starts, and #submitted edits that don't get saved.

henna

-- 
Maybe you knew early on that your track went from point A to B, but
unlike you I wasn't given a map at birth! Alyssa, Chasing Amy



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Neil Harris
Neil Harris wrote:
 Daniel Kinzler wrote:
   
 David Gerard schrieb:
   
 
 2009/6/4 Gregory Maxwell gmaxw...@gmail.com:

 
   
 Restrict site-wide JS and raw HTML injection to a smaller subset of
 users who have been specifically schooled in these issues.
   
 
 Is it feasible to allow admins to use raw HTML as appropriate but not
 raw JS? Being able to fix MediaWiki: space messages with raw HTML is
 way too useful on the occasions where it's useful.

 
   
 Possible yes, sensible no. Because if you can edit raw html, you can inject
 javascript.

 -- daniel

   
 
 Not if you sanitize the HTML after the fact: just cleaning out script 
 tags and elements from the HTML stream should do the job.

 After this has been done to the user-generated content, the desired 
 locked-down script code can then be inserted at the final stages of page 
 generation.

 -- Neil

   

Come to think of it, you could also allow the carefully vetted loading 
of scripts from a very limited whitelist of Wikimedia-hosted and 
controlled domains and paths, when performing that sanitization.

Inline scripts remain a bad idea: there are just too many ways to 
obfuscate them and/or inject data into them to have any practical 
prospect of limiting them to safe features without heroic efforts.

However, writing a JavaScript sanitizer that restricted the user to a 
safe subset of the language, by first parsing and then resynthesizing 
the code using formal methods for validation, in a way similar to the 
current solution for TeX, would be an interesting project!

-- Neil




Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Andrew Garrett

On 04/06/2009, at 4:08 PM, Daniel Kinzler wrote:

 David Gerard schrieb:
 2009/6/4 Gregory Maxwell gmaxw...@gmail.com:

 Restrict site-wide JS and raw HTML injection to a smaller subset of
 users who have been specifically schooled in these issues.


 Is it feasible to allow admins to use raw HTML as appropriate but not
 raw JS? Being able to fix MediaWiki: space messages with raw HTML is
 way too useful on the occasions where it's useful.


 Possible yes, sensible no. Because if you can edit raw html, you can  
 inject
 javascript.


When did we start treating our administrators as potentially malicious  
attackers? Any administrator could, in theory, add a cookie-stealing  
script to my user JS, steal my account, and grant themselves any  
rights they please.

We trust our administrators. If we don't, we should move the  
editinterface right further up the chain.

--
Andrew Garrett
Contract Developer, Wikimedia Foundation
agarr...@wikimedia.org
http://werdn.us






Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Mike.lifeguard
Thanks, that clarifies matters for me. I wasn't aware of #1, though I
guess upon reflection that makes sense.

-Mike

On Thu, 2009-06-04 at 11:07 -0400, Gregory Maxwell wrote:

 On Thu, Jun 4, 2009 at 11:01 AM, Mike.lifeguard
 mikelifegu...@fastmail.fm wrote:
  On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
 
  Then external site loading can be blocked.
 
 
  Why do we need to block loading from all external sites? If there are
  specific  problematic ones (like google analytics) then why not block
  those?
 
 Because:
 
 (1) External loading results in an uncontrolled leak of private reader
 and editor information to third parties, in contravention of the
 privacy policy as well as basic ethical operating principles.
 
 (1a) Most external loading script usage will also defeat users' choice
 of SSL and leak more information about their browsing to their local
 network. It may also bypass any Wikipedia-specific anonymization
 proxies they are using to keep their reading habits private.
 
 (2) External loading produces a runtime dependency on third party
 sites. Some other site goes down and our users experience some kind of
 loss of service.
 
 (3) The availability of external loading makes Wikimedia a potential
 source of very significant DDOS attacks, intentional or otherwise.
 
 That's not to say that there aren't reasons to use remote loading, but
 the potential harms mean that it should probably be a default-deny,
 permit-by-exception process rather than the other way around.
 
 


Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Robert Rohde
On Thu, Jun 4, 2009 at 6:55 AM, H. Langos henrik...@prak.org wrote:
 Seems like (at least) the API of #pos in ParserFunctions is
 different from the one in StringFunctions.

 {{#pos: haystack|needle|offset}}

 While the StringFunctions #pos in MediaWiki 1.14 returned an
 empty string when the needle was not found, the ParserFunctions
 implementation of #pos in svn now returns -1.
snip

Prior to the merge, 100% of the StringFunctions functions were
reimplemented, principally for performance and security reasons.

The short but uninspired answer to your question is that in doing that
I didn't notice that #pos and #rpos had different default behavior.
Given the way that #if works, returning empty string is a reasonable
response to a string-not-found condition, and I am happy to change
that back.  I'll also recheck to make sure there aren't any other
unexpected behavioral changes.

Though they don't have to have the same behavior, I'd be inclined to
argue that #pos and #rpos really ought to have the same default
behavior on usability grounds, i.e. either both giving -1 or both
giving empty string when a match is not found.  Though since that does
create compatibility issues with existing StringFunctions users, I'll
defer to others about whether consistency would be a good enough
motivation in this case.
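
To put the not-found behaviour side by side, as discussed in this
thread:

{{#pos: abcd | x }}     empty string with 1.14 StringFunctions, -1 with the current svn code
{{#rpos: abcd | x }}    -1 in both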


I should warn you though that there is an intentional behavioral
change regarding the handling of strip markers.  The pre-existing
StringFunctions codebase reacted to strip markers in a way that was
inefficient, hard for the end user to predict, and in specially
crafted cases created security issues.

The following example is illustrative of the change.

Consider the string ABC<nowiki>jkl</nowiki>DEF<nowiki>mno</nowiki>GHI

In the new implementation this is treated internally as ABCDEFGHI by
the string routines.  Hence its length is 9 and its first five
characters are ABCDE.

For complicated reasons the StringFunctions version says its length is
7 and the first five characters are ABCjklDEFmnoG.
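
So, to give one concrete call under the new behaviour just described,

{{#len: ABC<nowiki>jkl</nowiki>DEF<nowiki>mno</nowiki>GHI }}

would report 9, since the string routines never see the nowiki content.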

-Robert Rohde



Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread H. Langos
On Thu, Jun 04, 2009 at 05:05:50PM +0100, Andrew Garrett wrote:
 
 On 04/06/2009, at 3:46 PM, H. Langos wrote:
 
  On Thu, Jun 04, 2009 at 03:55:38PM +0200, H. Langos wrote:
  Seems like (at least) the API of #pos in ParserFunctions is
  different from the one in StringFunctions.
 
   {{#pos: haystack|needle|offset}}
 
  While the StringFunctions #pos in MediaWiki 1.14 returned an
  empty string when the needle was not found, the ParserFunctions
  implementation of #pos in svn now returns -1.
 
 
  I forgot to ask THE question. Is it a bug or is there some good reason
  to break backward compatibility?
 
  And no, programming language cosmetics is not a good reason. :-)
 
  If something has the same interface, it should have the same  
  behaviour.
   If the old semantics was too awful to bear, the new one should have  
  been
  called #strpos or #fpos (for forward-#pos. #rpos always had the
  -1 return on no-found behaviour).
 
 This should be left as a comment on the relevant revision in  
 CodeReview. Note that it's likely irrelevant anyway, as, in all  
 likelihood, the merge of String and Parser Functions will be reverted.

Sorry to bother you, but I am not a Wikimedia developer, so I wouldn't know
where to start looking.

Could you point me to the right place/list/article? The svn revision with 
the String and Parser Functions merge was 50997.

cheers
-henrik




Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Robert Rohde
On Thu, Jun 4, 2009 at 9:05 AM, Andrew Garrett agarr...@wikimedia.org wrote:

 On 04/06/2009, at 3:46 PM, H. Langos wrote:

 On Thu, Jun 04, 2009 at 03:55:38PM +0200, H. Langos wrote:
 Seems like (at least) the API of #pos in ParserFunctions is
 different from the one in StringFunctions.

 {{#pos: haystack|needle|offset}}

 While the StringFunctions #pos in MediaWiki 1.14 returned an
 empty string when the needle was not found, the ParserFunctions
 implementation of #pos in svn now returns -1.


 I forgot to ask THE question. Is it a bug or is there some good reason
 to break backward compatibility?

 And no, programming language cosmetics is not a good reason. :-)

 If something has the same interface, it should have the same
 behaviour.
  If the old semantics was too awful to bear, the new one should have
 been
 called #strpos or #fpos (for forward-#pos. #rpos always had the
 -1 return on no-found behaviour).

 This should be left as a comment on the relevant revision in
 CodeReview. Note that it's likely irrelevant anyway, as, in all
 likelihood, the merge of String and Parser Functions will be reverted.

Two devs, who shall remain nameless unless they choose to take credit
for it, explicitly encouraged the merge.  Personally, I've always
thought it made more sense to keep these as separate extensions but I
went along with what they encouraged me to do.

Regardless of whether it is one extension or two, I do strongly feel
that once a technically acceptable implementation of string functions
exists then it should be enabled on WMF sites.  (I agree though that
the previous StringFunctions was rightly excluded due to
implementation problems.)

-Robert Rohde



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Mike.lifeguard
On Thu, 2009-06-04 at 17:04 +0100, Andrew Garrett wrote:

 When did we start treating our administrators as potentially malicious  
 attackers? Any administrator could, in theory, add a cookie-stealing  
 script to my user JS, steal my account, and grant themselves any  
 rights they please.
 
 We trust our administrators. If we don't, we should move the  
 editinterface right further up the chain.


They are potentially malicious attackers, but we nevertheless trust them
not to do bad things. "We" in this case refers only to most of
Wikimedia, I guess, since there has been no shortage of paranoia both on
bugzilla and this list recently - a sad state of affairs to be sure.

-Mike


Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Brian
How does installing 3rd party analytics software help the WMF accomplish its
goals?

On Thu, Jun 4, 2009 at 8:31 AM, Daniel Kinzler dan...@brightbyte.de wrote:

 David Gerard schrieb:
  Keeping well-meaning admins from putting Google web bugs in the
  JavaScript is a game of whack-a-mole.
 
  Are there any technical workarounds feasible? If not blocking the
  loading of external sites entirely (I understand hu:wp uses a web bug
  that isn't Google), perhaps at least listing the sites somewhere
  centrally viewable?

 Perhaps the solution would be to simply set up our own JS based usage
 tracker?
 There are a few options available
 http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for
 starters,
 the backend could run on the toolserver.

 Note that anything processing IP addresses will need special approval on
 the TS.

 -- daniel




Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Aryeh Gregor
On Thu, Jun 4, 2009 at 12:05 PM, Andrew Garrett agarr...@wikimedia.org wrote:
 Note that it's likely irrelevant anyway, as, in all
 likelihood, the merge of String and Parser Functions will be reverted.

Have Tim or Brion said this?
https://bugzilla.wikimedia.org/show_bug.cgi?id=6455#c36 is the only
clear statement I've seen by either of them that I can recall.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread David Gerard
2009/6/4 Finne Boonen hen...@gmail.com:
 On Thu, Jun 4, 2009 at 17:00, Gregory Maxwell gmaxw...@gmail.com wrote:

 What exactly are people looking for that isn't available from
 stats.grok.se that isn't a privacy concern?
 I had assumed that people kept installing these bugs because they
 wanted source-network breakdowns per-article and other clear privacy
 violations.

 On top of views/page,
 I'd be interested in keywords used, entry/exit points, path analysis
 when people are editing (do they save/leave/try to find help/...),
 #edit starts, and #submitted edits that don't get saved.


Path analysis is a big one. All that other stuff, if it won't violate
privacy, would be fantastically useful to researchers, internal and
external, in ways we won't have even thought of yet, and help us
considerably to improve the projects.

(This would have to be given considerable thought from a
security/hacker mindset - e.g. even with IPs stripped, listing user
pages and user page edits would likely give away an identity. Talk
pages may do the same. Those are just off the top of my head, I'm sure
someone has already made a list of what they could work out even with
IPs anonymised or even stripped.)


- d.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread David Gerard
2009/6/4 Andrew Garrett agarr...@wikimedia.org:

 When did we start treating our administrators as potentially malicious
 attackers? Any administrator could, in theory, add a cookie-stealing
 script to my user JS, steal my account, and grant themselves any
 rights they please.


That's why I started this thread talking about things being done right
now by well-meaning admins :-)


- d.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Aryeh Gregor
On Thu, Jun 4, 2009 at 11:56 AM, Neil Harris use...@tonal.clara.co.uk wrote:
 However, writing a JavaScript sanitizer that restricted the user to a
 safe subset of the language, by first parsing and then resynthesizing
 the code using formal methods for validation, in a way similar to the
 current solution for TeX, would be an interesting project!

Interesting, but probably not very useful.  If we restricted
JavaScript the way we restricted TeX, we'd have to ban function
definitions, loops, conditionals, and most function calls.  I suspect
you'd have to make it pretty much unusable to make output of specific
strings impossible.

On Thu, Jun 4, 2009 at 12:45 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
 Regarding HTML sanitation: Raw HTML alone without JS is enough to
 violate users' privacy: Just add a hidden image tag to a remote site.
 Yes you could sanitize out various bad things, but then that's not raw
 HTML anymore, is it?

It might be good enough for the purposes at hand, though.  What are
the use-cases for wanting raw HTML in messages, instead of wikitext or
plaintext?



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread David Gerard
2009/6/4 Brian brian.min...@colorado.edu:

 How does installing 3rd party analytics software help the WMF accomplish its
 goals?


Detailed analysis of how users actually use the site would be vastly
useful in improving the sites' content and usability.


- d.



Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Chad
On Thu, Jun 4, 2009 at 2:32 PM, David Gerard dger...@gmail.com wrote:
 2009/6/4 Gregory Maxwell gmaxw...@gmail.com:

 I think the biggest problem to reducing accesses is that far more
 mediawiki messages are uncooked than is needed. Were it not for this I
 expect this access would have been curtailed somewhat a long time ago.


 I think you've hit the actual problem there. Someone with too much
 time on their hands who could go through all of the MediaWiki: space
 to see what really needs to be HTML rather than wikitext?


 - d.



See bug 212[1], which is (sort of) a tracker for the wikitext-ification
of the messages.

-Chad

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=212



Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Aryeh Gregor
On Thu, Jun 4, 2009 at 2:29 PM, Brian brian.min...@colorado.edu wrote:
 I was privy to a #mediawiki conversation between brion/tim where tim pointed
 out that at least one person plans to implement a Natural Language
 Processing parser for English using StringFunctions just as soon as they are
 enabled.

 It's pretty obvious that you can implement all sorts of crazy algorithms using
 StringFunctions. They need to be limited so that is not possible.

Note, though, that there are some that are already possible to some
extent.  You can use the core padright/padleft functions to emulate a
couple of the added functions.  E.g.:

http://en.wikipedia.org/w/index.php?title=Template:Str_len&action=edit
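
(To illustrate the kind of trick involved - this is the general idea,
not necessarily the exact implementation of that template:

{{padleft:|1|Example}}

pads an empty string out to length 1 using Example as the pad string,
and so yields E, the string's first character.)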

The most template-heavy pages already tend to run close to the
template limits, until they're cut down by users when they fail.  It's
not clear to me that allowing more functions would actually increase
overall load or template complexity significantly.  It might decrease
it by allowing simpler and more efficient implementations of things
that currently need to be worked around.  It can't really increase it
too much, theoretically -- that's what the template limits are for.

Werdna points out that Tim did say this morning in #mediawiki that
he'd probably revert the change.



Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Robert Rohde
On Thu, Jun 4, 2009 at 11:29 AM, Brian brian.min...@colorado.edu wrote:
 I was privy to a #mediawiki conversation between brion/tim where tim pointed
 out that at least one person plans to implement a Natural Language
 Processing parser for English using StringFunctions just as soon as they are
 enabled.

 It's pretty obvious that you can implement all sorts of crazy algorithms using
 StringFunctions. They need to be limited so that is not possible.

If you are referring to the conversation I think you are, then my
impression was that Tim was speaking hypothetically about the issue rather
than knowing someone who had this specific intent.

I'm fairly dubious about anyone actually trying natural language
processing to any serious degree.  Real natural language processing
needs huge lookup tables to identify part of speech and relationships
etc.  Technically possible I suppose, but not easy to do.

I'm even more dubious that full-fledged natural language processing --
in templates -- would find significant uses.  It is more efficient and
more practical to view templates as simple formatting macros rather
than as a system for real natural language interaction.  There are
very useful things that can be done with simple string algorithms,
such as detecting the "(bar)" when given a title like "Foo (bar)", but
I wouldn't expect anyone to be answering queries with them or anything
like that.
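
For instance, a rough sketch (this assumes the zero-based indexing that
the StringFunctions documentation gives for #pos and #sub, and a title
that actually contains the parenthesis):

{{#sub: Foo (bar) | {{#pos: Foo (bar) | ( }} }}

which would pick out (bar).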

When providing tools to content creators, flexibility is generally a
positive design feature.  We shouldn't go overboard with imposing
limits in the advance of actual problems.

The current implementation is artificially limited to 1000 characters
or less, which does prevent huge manipulations, however.

-Robert Rohde



Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Robert Rohde
On Thu, Jun 4, 2009 at 11:52 AM, Aryeh Gregor
simetrical+wikil...@gmail.com wrote:
 Note, though, that there are some that are already possible to some
 extent.  You can use the core padright/padleft functions to emulate a
 couple of the added functions.  E.g.:

 http://en.wikipedia.org/w/index.php?title=Template:Str_len&action=edit
snip

I would like to note for the record that Brion explicitly endorsed
the padleft hack to the degree that he re-enabled it after Werdna had
removed it. [1]

Maybe he'd change his mind after looking at how the string
manipulation templates are actually getting used (now in 20,000
enwiki pages and counting), but for the moment he seems to have
supported allowing some form of hacked-together string manipulation
system into MediaWiki.  To that end it makes more sense to have a real
string implementation rather than the ridiculous templates we have
now.

-Robert Rohde

[1] http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=47411



[Wikitech-l] Internal links and diacritics

2009-06-04 Thread Strainu
Hi,

I'm trying to format a link like this: [[musulman]]ă. On ro.wp, this
is equivalent to [[musulman|musulman]]ă (the special letter is not
included in the wiki link). While going through
http://www.mediawiki.org/wiki/Markup_spec I saw that:

internal-link ::= internal-link-start article-link  [ #
section-id ] [ pipe [link-description] ] internal-link-end
[extra-description]
extra-description ::= letter [extra-description]
letter::= ucase-letter | lcase-letter
ucase-letter  ::= A | B | ... | Y | Z
lcase-letter  ::= a | b | ... | y | z


This tells me that only ASCII letters are used for this type of
linking. However, on fr.wp I can write [[Ren]]é and this is equivalent
to [[Ren|René]].

How was this done? Is it something that can be set from a page, or
does some PHP need to be changed?

Thanks,
   Strainu



Re: [Wikitech-l] Internal links and diacritics

2009-06-04 Thread Tar Dániel
You have to use the MediaWiki:Linktrail page, for example:
http://hu.wikipedia.org/wiki/MediaWiki:Linktrail (or see the same page on
fr.wiki).

D.


Re: [Wikitech-l] Internal links and diacritics

2009-06-04 Thread Ahmad Sherif

 You have to use the MediaWiki:Linktrail page, for example:
 http://hu.wikipedia.org/wiki/MediaWiki:Linktrail (or see the same page on
 fr.wiki).


AFAIK, it has to be set in the language file through the $linkTrail variable,
because it looks like MediaWiki:Linktrail is no longer used.

On Thu, Jun 4, 2009 at 10:38 PM, Tar Dániel bdane...@gmail.com wrote:

 You have to use the MediaWiki:Linktrail page, for example:
 http://hu.wikipedia.org/wiki/MediaWiki:Linktrail (or see the same page on
 fr.wiki).

 D.



Re: [Wikitech-l] Internal links and diacritics

2009-06-04 Thread Aryeh Gregor
2009/6/4 Strainu strain...@gmail.com:
 While going through
 http://www.mediawiki.org/wiki/Markup_spec I saw that:

 internal-link         ::= internal-link-start article-link  [ #
 section-id ] [ pipe [link-description] ] internal-link-end
 [extra-description]
 extra-description     ::= letter [extra-description]
 letter                ::= ucase-letter | lcase-letter
 ucase-letter          ::= A | B | ... | Y | Z
 lcase-letter          ::= a | b | ... | y | z


 This tells me that only ASCII letters are used for this type of
 linking.

It's wrong.  Don't trust that page too much.  It was written after the
fact to try to document the parser, not something the parser was
designed to follow.  It's almost certainly wrong in a lot of corner
cases.  (Like non-English languages, apparently.)

On Thu, Jun 4, 2009 at 3:53 PM, Ahmad Sherif ahmad.m.she...@gmail.com wrote:
 AFAIK, it has to be set in the language file thru $linkTrail variable,
 because it looks like that MediaWiki:Linktrail is no longer used.

Correct.


Re: [Wikitech-l] Internal links and diacritics

2009-06-04 Thread Roan Kattouw
2009/6/4 Strainu strain...@gmail.com:
 Hi,

 I'm trying to format a link like this: [[musulman]]ă. On ro.wp, this
 is equivalent to [[musulman|musulman]]ă (the special letter is not
 included in the wiki link). While going through
 http://www.mediawiki.org/wiki/Markup_spec I saw that:

 internal-link         ::= internal-link-start article-link  [ #
 section-id ] [ pipe [link-description] ] internal-link-end
 [extra-description]
 extra-description     ::= letter [extra-description]
 letter                ::= ucase-letter | lcase-letter
 ucase-letter          ::= A | B | ... | Y | Z
 lcase-letter          ::= a | b | ... | y | z


 This tells me that only ASCII letters are used for this type of
 linking. However, on fr.wp I can write [[Ren]]é and this is equivalent
 to [[Ren|René]].

 How was this done? Is it something that can be set from a page, or
 does some PHP need to be changed?

The set of characters allowed in the so-called linktrail depends on
the language used, and is set in the individual LanguageXx.php files.
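
For example, once ă is part of the Romanian trail,

[[musulman]]ă

will render as a single link labelled musulmană, just like [[Ren]]é
does on fr.wp; until then, the explicit piped form
[[musulman|musulmană]] is the workaround.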

Roan Kattouw (Catrope)



Re: [Wikitech-l] Internal links and diacritics

2009-06-04 Thread Strainu
On Thu, Jun 4, 2009 at 10:53 PM, Ahmad Sherif ahmad.m.she...@gmail.com wrote:

 You have to use the MediaWiki:Linktrail page, for example:
 http://hu.wikipedia.org/wiki/MediaWiki:Linktrail (or see the same page on
 fr.wiki).


 AFAIK, it has to be set in the language file through the $linkTrail variable,
 because it looks like MediaWiki:Linktrail is no longer used.

 On Thu, Jun 4, 2009 at 10:38 PM, Tar Dániel bdane...@gmail.com wrote:

 You have to use the MediaWiki:Linktrail page, for example:
 http://hu.wikipedia.org/wiki/MediaWiki:Linktrail (or see the same page on
 fr.wiki).

 D.


Yep, I started from there and got to
http://meta.wikimedia.org/wiki/MediaWiki_talk:Linktrail - it suddenly
became all clear :)

Thank you all for your responses.

Strainu



Re: [Wikitech-l] StringFunctions/ParserFunctions #pos return value changed

2009-06-04 Thread Andrew Garrett

On 04/06/2009, at 8:03 PM, Robert Rohde wrote:
 I would like to note for the record that Brion explicitly endorsed
 the padleft hack to the degree that he re-enabled it after Werdna had
 removed it. [1]

 Maybe he'd change his mind after looking at how the string
 manipulation templates are actually getting used (now in 20,000
 enwiki pages and counting), but for the moment he seems to have
 supported allowing some form of hacked together string manipulation
 system into Mediawiki.  To that end it makes more sense to have a real
 string implementation rather than the ridiculous templates we have
 now.

I wouldn't read that into it. I think it's better characterised as  
reverting attempts to create an arms race over the hacks.

--
Andrew Garrett
Contract Developer, Wikimedia Foundation
agarr...@wikimedia.org
http://werdn.us






Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread Brian
That's why WMF now has a usability lab.

On Thu, Jun 4, 2009 at 12:34 PM, David Gerard dger...@gmail.com wrote:

 2009/6/4 Brian brian.min...@colorado.edu:

  How does installing 3rd party analytics software help the WMF accomplish
 its
  goals?


 Detailed analysis of how users actually use the site would be vastly
 useful in improving the sites' content and usability.


 - d.




Re: [Wikitech-l] Google web bugs in Mediawiki js from admins - technical workarounds?

2009-06-04 Thread David Gerard
2009/6/4 Brian brian.min...@colorado.edu:

 That's why WMF now has a usability lab.


Yep. They'd dive on this stuff with great glee if we can implement it
without breaking privacy or melting servers.


- d.



[Wikitech-l] firefogg local encode new-upload branch update.

2009-06-04 Thread Michael Dale
As you may know, I have been working on Firefogg integration with 
MediaWiki. As you may also know, the mwEmbed library is being designed to 
support embedding of these interfaces in arbitrary external contexts.  I 
wanted to quickly highlight a useful stand-alone usage example of the 
library:

http://www.firefogg.org/make/advanced.html

This "Make Ogg" link will be something you can send to a person so they 
can encode source footage to a local Ogg video file with the latest and 
greatest Ogg encoders (presently the Thusnelda Theora encoder & Vorbis 
audio). Updates to Thusnelda and other free codecs will be pushed out 
via Firefogg updates.

For Commons / Wikimedia usage we will directly integrate Firefogg (using 
that same codebase). You can see an example of how that works on the 
'new-upload' branch here: 
http://sandbox.kaltura.com/testwiki/index.php/Special:Upload ... 
hopefully we will start putting some of this on testing.wikipedia.org 
~soonish?~

The new-upload branch feature set is quite extensive, including the 
script-loader, the jQuery JavaScript refactoring, the new upload API, the 
new mv_embed video player, the add-media wizard, etc. Any feedback and 
specific bug reports will be super helpful in gearing up for 
merging this 'new-upload' branch.

For an overview see:
http://www.mediawiki.org/wiki/Media_Projects_Overview

peace,
--michael
