Re: [Wikitech-l] w...@home Extension

2009-08-02 Thread Marco Schuster
On Sun, Aug 2, 2009 at 2:32 AM, Platonides platoni...@gmail.com wrote:

  I'd actually be interested how YouTube and the other video hosters
  protect themselves against hacker threats - did they code totally new
  de/en-coders?

 That would be even more risky than using existing, tested (de|en)coders.

Really? If they simply don't publish the source (and the binaries), then the
only possible way for an attacker is fuzzing... and that can take a long time.

Marco

--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] w...@home Extension

2009-08-02 Thread David Gerard
2009/8/2 Marco Schuster ma...@harddisk.is-a-geek.org:

 Really? If they simply don't publish the source (and the binaries), then the
 only possible way for an attacker is fuzzing... and that can take a long time.


I believe they use ffmpeg, like everyone does. The ffmpeg code has had
people kicking it for quite a while. Transcoding as a Unix user with
very few privileges is reasonable isolation.


- d.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wiki at Home Extension

2009-08-02 Thread Steve Bennett
On Sun, Aug 2, 2009 at 12:16 AM, Tisza Gergő gti...@gmail.com wrote:
 Gregory Maxwell gmaxwell at gmail.com writes:

 I don't know how to figure out how much it would 'cost' to have human
 contributors spot embedded penises snuck into transcodes and then
 figure out which of several contributing transcoders are doing it and
 blocking them, only to have the bad user switch IPs and begin again.
 ... but it seems impossibly expensive even though it's not an actual
 dollar cost.

 Standard solution to that is to perform each operation multiple times on
 different machines and then compare results. Of course, that raises bandwidth
 costs even further.

Why are we suddenly concerned about someone sneaking obscenity onto a
wiki? As if no one has ever snuck a rude picture onto a main page...

Steve

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Wiki at Home Extension

2009-08-02 Thread Gerard Meijssen
Hoi,
Because it is no longer obvious that vandalism has taken place. You have to
look at the changes... all the time... to find what might be an issue.
Thanks,
 GerardM

2009/8/2 Steve Bennett stevag...@gmail.com

 On Sun, Aug 2, 2009 at 12:16 AM, Tisza Gergő gti...@gmail.com wrote:
  Gregory Maxwell gmaxwell at gmail.com writes:
 
  I don't know how to figure out how much it would 'cost' to have human
  contributors spot embedded penises snuck into transcodes and then
  figure out which of several contributing transcoders are doing it and
  blocking them, only to have the bad user switch IPs and begin again.
  ... but it seems impossibly expensive even though it's not an actual
  dollar cost.
 
  Standard solution to that is to perform each operation multiple times on
  different machines and then compare results. Of course, that raises
 bandwidth
  costs even further.

 Why are we suddenly concerned about someone sneaking obscenity onto a
 wiki? As if no one has ever snuck a rude picture onto a main page...

 Steve

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] GIF thumbnailing

2009-08-02 Thread Ilmari Karonen
Currently, server side GIF thumbnailing on Wikimedia sites is disabled 
entirely by setting $wgMediaHandlers['image/gif'] = 
'BitmapHandler_ClientOnly';

This causes all GIF files to be sent to the browser at their original size,
regardless of what size has been requested.

While most folks seem to have pretty much resigned themselves to the fact
that animated GIFs can't be thumbnailed -- it never worked very well to begin
with -- the fact that even static GIFs are sent at full resolution
remains somewhat annoying, especially since people have uploaded some
extremely large bitmaps in GIF format in the past.

It seems to me that delivering *static* thumbnails of GIF images, either 
in GIF or PNG format, would be a considerable improvement over the 
current situation.  And indeed, the code to do that seems to be already 
in place: just set $wgMaxAnimatedGifArea = 0;
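
For reference, a minimal LocalSettings.php sketch of the two settings in
question (illustrative only, not the actual Wikimedia configuration file):

  <?php
  // Current setup: no server-side GIF scaling at all; browsers always
  // receive the original file.
  // $wgMediaHandlers['image/gif'] = 'BitmapHandler_ClientOnly';

  // Proposed alternative: re-enable normal scaling, but treat every GIF
  // as exceeding the animation limit, so only a static thumbnail of the
  // first frame is generated.
  $wgMaxAnimatedGifArea = 0;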

So, my questions would be:

1. Is there a reason we don't do this already?

2. If yes, and the reason is that GIF encoding causes too much load,
would thumbnailing GIFs to PNG be better?  It should only take a few
lines of code to change the output format.

3. Alternatively, if the problem is ImageMagick taking too much time to
read animated GIFs even when extracting only the first frame, would some
other scaling program be better?  Indeed, it should even be possible to
write a bit of PHP code to pull out just the first frame of a GIF file
and hand it off to ImageMagick for scaling.
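
A rough, untested sketch of that last idea (file names, sizes and the
helper name are made up for illustration): walk the GIF block structure,
keep everything up to and including the first image block, terminate the
file there, and hand the single-frame result to ImageMagick.

  <?php
  // Rough sketch: copy the header, logical screen descriptor and global
  // color table, then every block up to and including the first image.
  function extractFirstGifFrame( $data ) {
      if ( strlen( $data ) < 14 || substr( $data, 0, 3 ) !== 'GIF' ) {
          return false;
      }
      $pos = 13;                              // header (6) + screen descriptor (7)
      $packed = ord( $data[10] );
      if ( $packed & 0x80 ) {                 // global color table present
          $pos += 3 * ( 1 << ( ( $packed & 0x07 ) + 1 ) );
      }
      $out = substr( $data, 0, $pos );

      while ( $pos < strlen( $data ) ) {
          $introducer = ord( $data[$pos] );
          if ( $introducer == 0x21 ) {        // extension block: copy it whole
              $start = $pos;
              $pos += 2;                      // introducer + label
              while ( ( $size = ord( $data[$pos] ) ) != 0 ) {
                  $pos += $size + 1;          // data sub-blocks
              }
              $pos++;                         // block terminator
              $out .= substr( $data, $start, $pos - $start );
          } elseif ( $introducer == 0x2C ) {  // first image descriptor
              $start = $pos;
              $packed = ord( $data[$pos + 9] );
              $pos += 10;                     // descriptor, incl. introducer
              if ( $packed & 0x80 ) {         // local color table
                  $pos += 3 * ( 1 << ( ( $packed & 0x07 ) + 1 ) );
              }
              $pos++;                         // LZW minimum code size
              while ( ( $size = ord( $data[$pos] ) ) != 0 ) {
                  $pos += $size + 1;          // image data sub-blocks
              }
              $pos++;                         // block terminator
              return $out . substr( $data, $start, $pos - $start ) . "\x3B";
          } else {
              return false;                   // trailer or corrupt data
          }
      }
      return false;
  }

  $frame = extractFirstGifFrame( file_get_contents( 'Animated_example.gif' ) );
  if ( $frame !== false ) {
      file_put_contents( '/tmp/first-frame.gif', $frame );
      exec( 'convert /tmp/first-frame.gif -resize 180x180 /tmp/thumb.png' );
  }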

PS. I'd also like to take the opportunity to remind everyone of the
existence of https://bugzilla.wikimedia.org/show_bug.cgi?id=16451 and of
the excellent-looking patch Werdna has written for it.  We can do a
_lot_ better with GIFs than we're currently doing.

-- 
Ilmari Karonen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread David Gerard
2009/8/2 Ilmari Karonen nos...@vyznev.net:

 It seems to me that delivering *static* thumbnails of GIF images, either
 in GIF or PNG format, would be a considerable improvement over the
 current situation.  And indeed, the code to do that seems to be already
 in place: just set $wgMaxAnimatedGifArea = 0;
 So, my questions would be:
 1. Is there a reason we don't do this already?


Be careful to only do this when actually scaling down. Animated GIFs are
in use on lots of pages on en:wp, for instance, when there's reason to
put a small animation right there on the page rather than a click
away.


 3. Alternatively, if the problem is ImageMagick taking too much time to
 read animated GIFs even just to extract only the first frame, would some
 other scaling program be better?  Indeed, it should even be possible to
 write a bit of PHP code to pull out just the first frame of a GIF file
 and hand it off to ImageMagick for scaling.


ImageMagick is a Swiss army knife - it does everything and isn't the
best at anything. I would be unsurprised if there were a tool that does
this specific job much better.


- d.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Brion Vibber
On 8/2/09 7:26 AM, Ilmari Karonen wrote:
 It seems to me that delivering *static* thumbnails of GIF images, either
 in GIF or PNG format, would be a considerable improvement over the
 current situation.  And indeed, the code to do that seems to be already
 in place: just set $wgMaxAnimatedGifArea = 0;

 So, my questions would be:

 1. Is there a reason we don't do this already?

IIRC, we don't yet have working detection for animated GIFs:
https://bugzilla.wikimedia.org/show_bug.cgi?id=16451

Looks like Andrew put together an updated patch a couple months ago but 
didn't have a chance to test and confirm it was working properly. Anyone 
care to take a peek?
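
(For what it's worth, one commonly used heuristic for this -- just a
sketch, not the patch on that bug -- counts graphic control extension /
image separator pairs and calls the file animated if there is more than
one:)

  <?php
  // Sketch of a common animation-detection heuristic.  Markers split
  // across read boundaries can be missed, so this is a heuristic rather
  // than a full GIF parser.
  function isAnimatedGif( $filename ) {
      $fh = fopen( $filename, 'rb' );
      if ( !$fh ) {
          return false;
      }
      $frames = 0;
      while ( !feof( $fh ) && $frames < 2 ) {
          $chunk = fread( $fh, 1024 * 100 );
          $frames += preg_match_all(
              '#\x00\x21\xF9\x04.{4}\x00[\x2C\x21]#s', $chunk, $matches );
      }
      fclose( $fh );
      return $frames > 1;
  }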

-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Andrew Garrett

On 02/08/2009, at 5:36 PM, Brion Vibber wrote:
 Looks like Andrew put together an updated patch a couple months ago but
 didn't have a chance to test and confirm it was working properly. Anyone
 care to take a peek?

I can possibly poke this tomorrow; it must have slipped through my
fingers on Bug Friday :)

--
Andrew Garrett
agarr...@wikimedia.org
http://werdn.us/


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Gregory Maxwell
On Sun, Aug 2, 2009 at 10:26 AM, Ilmari Karonen nos...@vyznev.net wrote:
[snip]
 It seems to me that delivering *static* thumbnails of GIF images, either
 in GIF or PNG format, would be a considerable improvement over the
 current situation.  And indeed, the code to do that seems to be already
 in place: just set $wgMaxAnimatedGifArea = 0;

So, setting animation aside, why would you use a GIF rather than a
PNG?  I can think of two reasons:

(1) You're making a spacer image and the GIF is actually smaller;
scaling isn't relevant here.
(2) You're using GIF transparency and are obsessed with compatibility
with old IE. Scaling doesn't tend to work really well with binary
transparency.


In other cases the GIF tends to be larger, loads slower, etc.  They
can be converted to PNG losslessly, so you should probably do so.
What am I missing?
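
(The conversion itself really is mechanical for static GIFs -- a sketch
using PHP's GD extension, with illustrative file names; the palette
carries over unchanged and the transparent index should survive as PNG
transparency:)

  <?php
  // Sketch: convert a static, palette-based GIF to PNG with GD.
  $im = imagecreatefromgif( 'Example_diagram.gif' );  // GD reads the first frame
  if ( $im !== false ) {
      imagepng( $im, 'Example_diagram.png' );
      imagedestroy( $im );
  }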

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Aryeh Gregor
On Sun, Aug 2, 2009 at 1:33 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
 In other cases the gif tends to be larger, loads slower, etc.  They
 can be converted to PNG losslessly, so you should probably do so.
 What am I missing?

That a lot of people don't know the above and upload GIFs anyway, and
until we can fix all users of each of these images, it would be nice
to not serve the full-size file all the time.  (Anyone want to make
the planned transcode-on-upload infrastructure work for images too?
:) )

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Ilmari Karonen
Brion Vibber wrote:
 On 8/2/09 7:26 AM, Ilmari Karonen wrote:
 It seems to me that delivering *static* thumbnails of GIF images, either
 in GIF or PNG format, would be a considerable improvement over the
 current situation.  And indeed, the code to do that seems to be already
 in place: just set $wgMaxAnimatedGifArea = 0;
 
 IIRC, we don't yet have working detection for animated GIFs:
 https://bugzilla.wikimedia.org/show_bug.cgi?id=16451

That shouldn't matter, should it?  Setting $wgMaxAnimatedGifArea to N 
simply causes Bitmap.php to tell ImageMagick to only extract the first 
frame of any GIF files with N or more pixels (which makes no difference 
if the GIF in fact only has one frame).
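
(Roughly speaking -- this is an illustrative sketch, not the actual
Bitmap.php code -- the effect described above amounts to something like:)

  <?php
  // Illustrative only.  When the source area exceeds $wgMaxAnimatedGifArea,
  // the scaler asks ImageMagick for frame 0 only via the [0] scene selector.
  $wgMaxAnimatedGifArea = 0;           // i.e. every GIF is "too big to animate"

  $src = 'Some_upload.gif';
  $area = 1600 * 1200;                 // source width * height in pixels

  if ( $area > $wgMaxAnimatedGifArea ) {
      $src .= '[0]';                   // decode and scale only the first frame
  }
  exec( 'convert ' . escapeshellarg( $src ) . ' -resize 220x165 ' .
      escapeshellarg( 'thumb.gif' ) );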

-- 
Ilmari Karonen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Ilmari Karonen
Aryeh Gregor wrote:
 On Sun, Aug 2, 2009 at 1:33 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
 In other cases the gif tends to be larger, loads slower, etc.  They
 can be converted to PNG losslessly, so you should probably do so.
 What am I missing?
 
 That a lot of people don't know the above and upload GIFs anyway, and
 until we can fix all users of each of these images, it would be nice
 to not serve the full-size file all the time.  (Anyone want to make
 the planned transcode-on-upload infrastructure work for images too?
 :) )

Also, Commons alone has almost a hundred thousand existing GIF files. 
Converting them all to PNG would be a significant job, especially since 
changing the format (and thus the suffix) means that they can't just be 
reuploaded under the same title.

(Which means you have to copy the description page and the old upload 
history, update links on all other projects, give any non-Wikimedia 
projects using ForeignAPIRepo time to update their links and hope that 
they actually do so, and even after you've done all that, it *still* 
ends up messing with watchlists, user contribution histories and old 
article versions because there's no way to update those.  Eugh.)

-- 
Ilmari Karonen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GIF thumbnailing

2009-08-02 Thread Ilmari Karonen
Gregory Maxwell wrote:
 
 (2) you're using gif transparency and are obsessed with compatibility
 with old IE. Scaling doesn't tend to work really well with binary
 transparency.

By the way, I forgot to mention this earlier, but this is actually a very
good argument in favor of outputting thumbnails of GIFs in PNG format.

-- 
Ilmari Karonen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wiki at Home Extension

2009-08-02 Thread Tisza Gergő
Steve Bennett stevagewp at gmail.com writes:

 Why are we suddenly concerned about someone sneaking obscenity onto a
 wiki? As if no one has ever snuck a rude picture onto a main page...

There is a slight difference between vandalism that shows up in recent changes
and vandalism that leaves no trail at all, except maybe in log files accessible
only to sysadmins.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wiki at Home Extension

2009-08-02 Thread Michael Dale
Two quick points.
1) You don't have to re-upload the whole video, just the sha1 or some
sort of hash of the assigned chunk.
2) It should be relatively straightforward to catch abuse via the user
IDs assigned to each uploaded chunk. But checking the sha1 a few times from
other random clients that are encoding other pieces would make abuse
very difficult... at the cost of a few small HTTP requests after the
encode is done, and at the cost of slightly more CPU cycles from the
computing pool (see the sketch below). But as this thread has pointed out,
CPU cycles are much cheaper than bandwidth bits or human time spent
patrolling derivatives.
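
A sketch of the kind of server-side check point 2 describes (the function
name, quorum size and data shapes are made up for illustration): assign
each chunk to a few clients, collect the sha1 each one reports, and only
accept the chunk once enough independent reports agree.

  <?php
  // Sketch only: accept a chunk's encode once a quorum of independent
  // clients report the same sha1 for it.
  function acceptChunk( array $reports, $quorum = 2 ) {
      // $reports maps client user ID => sha1 reported for the same chunk
      $tally = array();
      foreach ( $reports as $userId => $sha1 ) {
          if ( !isset( $tally[$sha1] ) ) {
              $tally[$sha1] = array();
          }
          $tally[$sha1][] = $userId;
      }
      foreach ( $tally as $sha1 => $userIds ) {
          if ( count( $userIds ) >= $quorum ) {
              // Accept this encode; anyone who reported a different hash
              // becomes a candidate for closer scrutiny or blocking.
              return array( 'accepted' => $sha1, 'trusted' => $userIds );
          }
      }
      return false;   // no agreement yet; keep re-assigning the chunk
  }

  // Example: two encoders agree, a third reports something different.
  $result = acceptChunk( array(
      101 => 'da39a3ee5e6b4b0d3255bfef95601890afd80709',
      102 => 'da39a3ee5e6b4b0d3255bfef95601890afd80709',
      103 => 'ffffffffffffffffffffffffffffffffffffffff',
  ) );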

We have the advantage with a system like Firefogg that we control the
version of the encoder pushed out to clients via auto-update, and we check
that version before accepting their participation (so the sha1s should
match if the client is not doing anything fishy).

But these are version-2 type features, conditioned on 1) bandwidth being
cheap while internal computer system maintenance and acquisition is
slightly more costly, and/or 2) integrating a thin BitTorrent client into
Firefogg so that we only pay the upstream cost of sending out the source
footage once.

We need to start exploring BitTorrent integration anyway, to
distribute the bandwidth cost on the distribution side. So this work
would lead us in a good direction as well.

peace,
--michael

Tisza Gergő wrote:
 Steve Bennett stevagewp at gmail.com writes:

   
 Why are we suddenly concerned about someone sneaking obscenity onto a
 wiki? As if no one has ever snuck a rude picture onto a main page...
 

 There is a slight difference between vandalism that shows up in recent changes
 and one that leaves no trail at all except maybe in log files only accessible
 for sysadmins.


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
   


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Wiki at Home Extension

2009-08-02 Thread Gregory Maxwell
On Sun, Aug 2, 2009 at 6:29 PM, Michael Dale md...@wikimedia.org wrote:
[snip]
 two quick points.
 1) you don't have to re-upload the whole video just the sha1 or some
 sort of hash of the assigned chunk.

But each re-encoder must download the source material.

I agree that uploads aren't much of an issue.

[snip]
 other random clients that are encoding other pieces would make abuse
 very difficult... at the cost of a few small http requests after the
 encode is done, and at a cost of slightly more CPU cylces of the
 computing pool.

Is 2x slightly?  (Greater because some clients will abort/fail.)

Even that leaves open the risk that a single trouble maker will
register a few accounts and confirm their own blocks.  You can fight
that too— but it's an arms race with no end.  I have no doubt that the
problem can be made tolerably rare— but at what cost?

I don't think it's all that acceptable to significantly increase the
resources used for the operation of the site just for the sake of
pushing the capital and energy costs onto third parties, especially
when it appears that the cost to Wikimedia will not decrease (but
instead be shifted from equipment cost to bandwidth and developer
time).

[snip]
 We need to start exploring the bittorrent integration anyway to
 distribute the bandwidth cost on the distribution side. So this work
 would lead us in a good direction as well.

http://lists.wikimedia.org/pipermail/wikitech-l/2009-April/042656.html


I'm troubled that Wikimedia is suddenly so interested in all these
cost externalizations which will dramatically increase the total cost
but push those costs off onto (sometimes unwilling) third parties.

Tech spending by the Wikimedia Foundation is a fairly small portion of
the budget, enough that it has drawn some criticism.  Behaving in the
most efficient manner is laudable and the WMF has done excellently on
this front in the past.  Behaving in an inefficient manner in order to
externalize costs is, in my view, deplorable and something which
should be avoided.

Has some organizational problem arisen within Wikimedia which has made
it unreasonably difficult to obtain computing resources, but easy to
burn bandwidth and development time? I'm struggling to understand why
development-intensive externalization measures are being regarded as
first choice solutions, and invented ahead of the production
deployment of basic functionality.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Bugzilla Weekly Report

2009-08-02 Thread reporter
MediaWiki Bugzilla Report for July 27, 2009 - August 03, 2009

Status changes this week

Bugs NEW   :  216 
Bugs ASSIGNED  :  35  
Bugs REOPENED  :  31  
Bugs RESOLVED  :  170 

Total bugs still open: 3789

Resolutions for the week:

Bugs marked FIXED  :  109 
Bugs marked REMIND :  0   
Bugs marked INVALID:  11  
Bugs marked DUPLICATE  :  21  
Bugs marked WONTFIX:  24  
Bugs marked WORKSFORME :  9   
Bugs marked LATER  :  4   
Bugs marked MOVED  :  0   

Specific Product/Component Resolutions  User Metrics 

New Bugs Per Component

Site requests   9   
server  7   
LiquidThreads   5   
Glossary5   
Special pages   4   

New Bugs Per Product

MediaWiki   16  
Wikimedia   12  
MediaWiki extensions26  
mwdumper1   
Wikipedia Mobile7   

Top 5 Bug Resolvers

innocentkiller [AT] gmail.com   30  
markus [AT] semantic-mediawiki.org  28  
alex.emsenhuber [AT] bluewin.ch 16  
JSchulz_4587 [AT] msn.com   10  
agarrett [AT] wikimedia.org 10  


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wiki at Home Extension

2009-08-02 Thread Michael Dale
Let's see...

* All these tools will be needed for flattening sequences anyway. In
that case CPU costs are really high, like 1/5 real-time or slower, and
the amount of computation needed explodes much faster, since every
stable edit necessitates a new flattening of some portion of the
sequence.

* I don't think it's possible to scale the foundation's current donation
model to traditional free net video distribution.

* We are not Google. Google lost something like $470 million last year on
YouTube (and that's with $240 million in advertising), for a total cost of
$711 million [1]. Say we manage to do 1/100th of YouTube (not unreasonable,
considering we are a top-4 site; just imagine a world where you watch one
Wikipedia video for every 100 you watch on YouTube)... then we would be at
something like 7x the total budget? (And they are not supporting video
editing with flattening of sequences.) The Pirate Bay, on the other hand,
operates at a technology cost comparable to Wikimedia's (~$3K a month in
bandwidth) and is distributing something like half of the net's torrents
[2]. (Obviously these numbers are a bit of tea-leaf reading, but give or
take an order of magnitude it should still be clear which model we should
be moving towards.)

... I think it's good to start thinking about p2p distribution and
computation ... even if we are not using it today ...
 
* I must say I don't quite agree with your proposed tactic of retaining
network neutrality by avoiding bandwidth distribution via peer-to-peer
technology. I am aware the net is not built for p2p, nor is it very
efficient versus CDNs... but the whole micropayment system never panned
out... Perhaps you're right that p2p will just give companies an excuse to
restructure the net in a non-network-neutral way... but I think they
already have plenty of excuse with the existing popular BitTorrent systems,
and I don't see any other way for not-for-profit net communities to
distribute massive amounts of video to each other.

* I think you may be blowing this a bit out of proportion by calling
foundation priorities into question in the context of this hack. If this
were a big initiative over the course of a year, or an initiative
involving more than a week of part-time work, then it would make more
sense to worry about this. But in its present state it's just a quick hack
and the starting point of a conversation, not foundation policy or an
initiative.


peace,
michael


[1] 
http://www.ibtimes.com/articles/20090413/alleged-470-million-youtube-loss-will-be-cleared-week.htm
[2] 
http://newteevee.com/2009/07/19/the-pirate-bay-distributing-the-worlds-entertainment-for-3000-a-month/


Gregory Maxwell wrote:
 On Sun, Aug 2, 2009 at 6:29 PM, Michael Dale md...@wikimedia.org wrote:
 [snip]
   
 two quick points.
 1) you don't have to re-upload the whole video just the sha1 or some
 sort of hash of the assigned chunk.
 

 But each re-encoder must download the source material.

 I agree that uploads aren't much of an issue.

 [snip]
   
 other random clients that are encoding other pieces would make abuse
 very difficult... at the cost of a few small http requests after the
 encode is done, and at a cost of slightly more CPU cylces of the
 computing pool.
 

 Is 2x slightly?  (Greater because some clients will abort/fail.)

 Even that leaves open the risk that a single trouble maker will
 register a few accounts and confirm their own blocks.  You can fight
 that too— but it's an arms race with no end.  I have no doubt that the
 problem can be made tolerably rare— but at what cost?

 I don't think it's all that acceptable to significantly increase the
 resources used for the operation of the site just for the sake of
 pushing the capital and energy costs onto third parties, especially
 when it appears that the cost to Wikimedia will not decrease (but
 instead be shifted from equipment cost to bandwidth and developer
 time).

 [snip]
   
 We need to start exploring the bittorrent integration anyway to
 distribute the bandwidth cost on the distribution side. So this work
 would lead us in a good direction as well.
 

 http://lists.wikimedia.org/pipermail/wikitech-l/2009-April/042656.html


 I'm troubled that Wikimedia is suddenly so interested in all these
 cost externalizations which will dramatically increase the total cost
 but push those costs off onto (sometimes unwilling) third parties.

 Tech spending by the Wikimedia Foundation is a fairly small portion of
 the budget, enough that it has drawn some criticism.  Behaving in the
 most efficient manner is laudable and the WMF has done excellently on
 this front in the past.  Behaving in an inefficient manner in order to
 externalize costs is, in my view, deplorable and something which
 should be avoided.

 Has some organizational problem arisen within Wikimedia which has made
 it unreasonably difficult to obtain computing resources, but easy to
 burn bandwidth and development time? I'm struggling to understand