Re: [Wikitech-l] w...@home Extension
On Sun, Aug 2, 2009 at 2:32 AM, Platonides <platoni...@gmail.com> wrote:
> I'd actually be interested in how YouTube and the other video hosts protect themselves against hacker threats - did they code totally new de/en-coders? That would be even more risky than using existing, tested (de|en)coders.

Really? If they simply don't publish the source (and the binaries), then the only possible way in for an attacker is fuzzing... and that can take a long time.

Marco

--
VMSoft GbR, Nabburger Str. 15, 81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de
Re: [Wikitech-l] w...@home Extension
2009/8/2 Marco Schuster <ma...@harddisk.is-a-geek.org>:
> Really? If they simply don't publish the source (and the binaries), then the only possible way in for an attacker is fuzzing... and that can take a long time.

I believe they use ffmpeg, like everyone does. The ffmpeg code has had people kicking at it for quite a while.

Transcoding as a dedicated Unix user with few privileges is reasonable isolation.

- d.
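[A minimal sketch of the kind of isolation described above, shelling out to ffmpeg under a dedicated low-privilege account. The 'transcode' user, the sudo rule allowing the command, and the file paths are assumptions made for the example, not anything Wikimedia is known to run.]

  <?php
  // Sketch: run ffmpeg as a low-privilege user so that a malicious input
  // file exploiting the decoder cannot touch anything the web server owns.
  // Assumes a 'transcode' account and a sudoers entry permitting this command.
  function transcodeIsolated( $src, $dst ) {
      $cmd = 'sudo -u transcode ffmpeg -i ' . escapeshellarg( $src ) .
          ' ' . escapeshellarg( $dst ) . ' 2>&1';
      exec( $cmd, $output, $status );
      return $status === 0;
  }

  // Usage (paths are illustrative):
  $ok = transcodeIsolated( '/tmp/upload.avi', '/tmp/upload.ogv' );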
Re: [Wikitech-l] Wiki at Home Extension
On Sun, Aug 2, 2009 at 12:16 AM, Tisza Gergő <gti...@gmail.com> wrote:
> Gregory Maxwell <gmaxwell at gmail.com> writes:
>> I don't know how to figure out how much it would 'cost' to have human contributors spot embedded penises snuck into transcodes, then figure out which of several contributing transcoders is doing it and block them, only to have the bad user switch IPs and begin again. ... but it seems impossibly expensive, even though it's not an actual dollar cost.
>
> The standard solution to that is to perform each operation multiple times on different machines and then compare the results. Of course, that raises bandwidth costs even further.

Why are we suddenly concerned about someone sneaking obscenity onto a wiki? As if no one has ever snuck a rude picture onto a main page...

Steve
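[A minimal sketch of the redundancy check Tisza describes, assuming that identical encoder versions produce identical output. The function name, the per-client hash map and the agreement threshold are all illustrative, not taken from any extension.]

  <?php
  // Sketch of "do it N times and compare": the same chunk is encoded on
  // several machines, and its result is accepted only if enough of them
  // report an identical output hash.
  function majorityResult( array $sha1ByClient, $minAgreement = 2 ) {
      if ( !$sha1ByClient ) {
          return null;
      }
      $counts = array_count_values( $sha1ByClient );
      arsort( $counts );                 // most frequently reported hash first
      $bestHash = key( $counts );
      return $counts[$bestHash] >= $minAgreement ? $bestHash : null;
  }

  // Two of three machines agree, so 'abc1' is accepted:
  var_dump( majorityResult( array( 'c1' => 'abc1', 'c2' => 'abc1', 'c3' => 'dead' ) ) );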
Re: [Wikitech-l] Wiki at Home Extension
Hoi,
Because it is no longer obvious that vandalism has taken place. You have to keep looking at the changes, all the time, to find what might be an issue.
Thanks,
GerardM

2009/8/2 Steve Bennett <stevag...@gmail.com>:
> On Sun, Aug 2, 2009 at 12:16 AM, Tisza Gergő <gti...@gmail.com> wrote:
>> The standard solution to that is to perform each operation multiple times on different machines and then compare the results. Of course, that raises bandwidth costs even further.
>
> Why are we suddenly concerned about someone sneaking obscenity onto a wiki? As if no one has ever snuck a rude picture onto a main page...
>
> Steve
[Wikitech-l] GIF thumbnailing
Currently, server-side GIF thumbnailing on Wikimedia sites is disabled entirely by setting:

  $wgMediaHandlers['image/gif'] = 'BitmapHandler_ClientOnly';

This causes all GIF files to be sent to the browser at original size regardless of what size has been requested. While most folks seem to have pretty much resigned themselves to the fact that animated GIFs can't be thumbnailed -- it never worked very well to begin with -- the fact that even static GIFs are sent at full resolution remains somewhat annoying, especially since people have uploaded some extremely large bitmaps in GIF format in the past.

It seems to me that delivering *static* thumbnails of GIF images, either in GIF or PNG format, would be a considerable improvement over the current situation. And indeed, the code to do that seems to be already in place: just set

  $wgMaxAnimatedGifArea = 0;

So, my questions would be:

1. Is there a reason we don't do this already?

2. If yes, and the reason is that GIF encoding causes too much load, would thumbnailing GIFs to PNG be better? It should only take a few lines of code to change the output format.

3. Alternatively, if the problem is ImageMagick taking too much time to read animated GIFs even just to extract the first frame, would some other scaling program be better? Indeed, it should even be possible to write a bit of PHP code to pull out just the first frame of a GIF file and hand it off to ImageMagick for scaling.

Ps. I'd also like to take the opportunity to remind everyone of the existence of https://bugzilla.wikimedia.org/show_bug.cgi?id=16451 and of the excellent-looking patch Werdna has written for it. We can do a _lot_ better with GIFs than we're currently doing.

--
Ilmari Karonen
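[A rough sketch of the approach in question 3, using ImageMagick's standard "[0]" frame-index selector rather than parsing the GIF in PHP; the file names and width are illustrative only.]

  <?php
  // Sketch: thumbnail only the first frame of a (possibly animated) GIF by
  // passing ImageMagick's frame-index selector, instead of decoding every frame.
  function thumbnailFirstFrame( $srcPath, $dstPath, $width ) {
      $cmd = 'convert ' . escapeshellarg( $srcPath . '[0]' ) .
          ' -resize ' . intval( $width ) . ' ' . escapeshellarg( $dstPath );
      exec( $cmd, $output, $status );
      return $status === 0;
  }

  // 200px-wide PNG thumbnail of the first frame:
  thumbnailFirstFrame( 'Large_animation.gif', 'Large_animation.png', 200 );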
Re: [Wikitech-l] GIF thumbnailing
2009/8/2 Ilmari Karonen <nos...@vyznev.net>:
> It seems to me that delivering *static* thumbnails of GIF images, either in GIF or PNG format, would be a considerable improvement over the current situation. And indeed, the code to do that seems to be already in place: just set $wgMaxAnimatedGifArea = 0;
>
> So, my questions would be:
> 1. Is there a reason we don't do this already?

Careful to only do it if we are in fact scaling the image down. Animated GIFs are in use on lots of pages on en:wp, for instance, when there's reason to put a small animation right there on the page rather than a click away.

> 3. Alternatively, if the problem is ImageMagick taking too much time to read animated GIFs even just to extract the first frame, would some other scaling program be better? Indeed, it should even be possible to write a bit of PHP code to pull out just the first frame of a GIF file and hand it off to ImageMagick for scaling.

ImageMagick is a Swiss army knife - it does everything and isn't the best at anything. I would be unsurprised if there was a tool that does this specific job much better.

- d.
Re: [Wikitech-l] GIF thumbnailing
On 8/2/09 7:26 AM, Ilmari Karonen wrote:
> It seems to me that delivering *static* thumbnails of GIF images, either in GIF or PNG format, would be a considerable improvement over the current situation. And indeed, the code to do that seems to be already in place: just set $wgMaxAnimatedGifArea = 0;
>
> So, my questions would be:
> 1. Is there a reason we don't do this already?

IIRC, we don't yet have working detection for animated GIFs:
https://bugzilla.wikimedia.org/show_bug.cgi?id=16451

Looks like Andrew put together an updated patch a couple months ago but didn't have a chance to test and confirm it was working properly. Anyone care to take a peek?

-- brion
Re: [Wikitech-l] GIF thumbnailing
On 02/08/2009, at 5:36 PM, Brion Vibber wrote:
> Looks like Andrew put together an updated patch a couple months ago but didn't have a chance to test and confirm it was working properly. Anyone care to take a peek?

I can possibly poke at this tomorrow; it must have slipped through my fingers on Bug Friday :)

--
Andrew Garrett
agarr...@wikimedia.org
http://werdn.us/
Re: [Wikitech-l] GIF thumbnailing
On Sun, Aug 2, 2009 at 10:26 AM, Ilmari Karonen <nos...@vyznev.net> wrote:
[snip]
> It seems to me that delivering *static* thumbnails of GIF images, either in GIF or PNG format, would be a considerable improvement over the current situation. And indeed, the code to do that seems to be already in place: just set $wgMaxAnimatedGifArea = 0;

So -- animation aside, why would you use a GIF rather than a PNG? I can think of two reasons: (1) you're making a spacer image and the GIF is actually smaller; scaling isn't relevant here. (2) You're using GIF transparency and are obsessed with compatibility with old IE; scaling doesn't tend to work really well with binary transparency.

In other cases the GIF tends to be larger, loads more slowly, etc. They can be converted to PNG losslessly, so you should probably do so. What am I missing?
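[For reference, a sketch of the lossless conversion mentioned above, done with ImageMagick from PHP. The file names are illustrative, and the sketch assumes a static GIF; an animated one would need per-frame handling.]

  <?php
  // Sketch: lossless conversion of a static GIF to PNG via ImageMagick,
  // then a size comparison of the two files.
  $src = 'Old_diagram.gif';
  $dst = 'Old_diagram.png';
  exec( 'convert ' . escapeshellarg( $src ) . ' ' . escapeshellarg( $dst ), $out, $status );
  if ( $status === 0 ) {
      printf( "GIF: %d bytes, PNG: %d bytes\n", filesize( $src ), filesize( $dst ) );
  }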
Re: [Wikitech-l] GIF thumbnailing
On Sun, Aug 2, 2009 at 1:33 PM, Gregory Maxwell <gmaxw...@gmail.com> wrote:
> In other cases the GIF tends to be larger, loads more slowly, etc. They can be converted to PNG losslessly, so you should probably do so. What am I missing?

That a lot of people don't know the above and upload GIFs anyway, and until we can fix all users of each of these images, it would be nice not to serve the full-size file all the time.

(Anyone want to make the planned transcode-on-upload infrastructure work for images too? :) )
Re: [Wikitech-l] GIF thumbnailing
Brion Vibber wrote:
> On 8/2/09 7:26 AM, Ilmari Karonen wrote:
>> It seems to me that delivering *static* thumbnails of GIF images, either in GIF or PNG format, would be a considerable improvement over the current situation. And indeed, the code to do that seems to be already in place: just set $wgMaxAnimatedGifArea = 0;
>
> IIRC, we don't yet have working detection for animated GIFs:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=16451

That shouldn't matter, should it? Setting $wgMaxAnimatedGifArea to N simply causes Bitmap.php to tell ImageMagick to extract only the first frame of any GIF file with N or more pixels (which makes no difference if the GIF in fact has only one frame).

--
Ilmari Karonen
Re: [Wikitech-l] GIF thumbnailing
Aryeh Gregor wrote:
> On Sun, Aug 2, 2009 at 1:33 PM, Gregory Maxwell <gmaxw...@gmail.com> wrote:
>> In other cases the GIF tends to be larger, loads more slowly, etc. They can be converted to PNG losslessly, so you should probably do so. What am I missing?
>
> That a lot of people don't know the above and upload GIFs anyway, and until we can fix all users of each of these images, it would be nice not to serve the full-size file all the time.
>
> (Anyone want to make the planned transcode-on-upload infrastructure work for images too? :) )

Also, Commons alone has almost a hundred thousand existing GIF files. Converting them all to PNG would be a significant job, especially since changing the format (and thus the suffix) means that they can't just be reuploaded under the same title. (Which means you have to copy the description page and the old upload history, update links on all other projects, give any non-Wikimedia projects using ForeignAPIRepo time to update their links and hope that they actually do so, and even after you've done all that, it *still* ends up messing with watchlists, user contribution histories and old article versions, because there's no way to update those. Eugh.)

--
Ilmari Karonen
Re: [Wikitech-l] GIF thumbnailing
Gregory Maxwell wrote:
> (2) You're using GIF transparency and are obsessed with compatibility with old IE; scaling doesn't tend to work really well with binary transparency.

By the way, I forgot to mention this earlier, but this is actually a very good argument in favor of outputting thumbnails of GIFs in PNG format.

--
Ilmari Karonen
Re: [Wikitech-l] Wiki at Home Extension
Steve Bennett <stevagewp at gmail.com> writes:
> Why are we suddenly concerned about someone sneaking obscenity onto a wiki? As if no one has ever snuck a rude picture onto a main page...

There is a slight difference between vandalism that shows up in recent changes and vandalism that leaves no trail at all, except maybe in log files accessible only to sysadmins.
Re: [Wikitech-l] Wiki at Home Extension
Two quick points:

1) You don't have to re-upload the whole video, just the sha1 or some other hash of the assigned chunk.

2) It should be relatively straightforward to catch abuse via the user ids assigned to each uploaded chunk. But checking the sha1 a few times from other random clients that are encoding other pieces would make abuse very difficult... at the cost of a few small http requests after the encode is done, and at the cost of slightly more CPU cycles from the computing pool. But as this thread has pointed out, CPU cycles are much cheaper than bandwidth bits or humans' time spent patrolling derivatives.

We have the advantage with a system like Firefogg that we control the version of the encoder pushed out to clients via auto-update, and can check that before accepting their participation (so sha1s should match if the client is not doing anything fishy).

But these are version 2 type features, conditioned on 1) bandwidth being cheap while internal computer system maintenance and acquisition is slightly more costly, and/or 2) us integrating a thin bittorrent client into Firefogg so we only pay the upstream cost of sending out the source footage once. We need to start exploring bittorrent integration anyway to distribute the bandwidth cost on the distribution side, so this work would lead us in a good direction as well.

peace,
--michael

Tisza Gergő wrote:
> There is a slight difference between vandalism that shows up in recent changes and vandalism that leaves no trail at all, except maybe in log files accessible only to sysadmins.
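[A sketch of the spot-checking idea in point 2. Every name below -- the client lists, the verifier count, the hash values -- is hypothetical and not taken from the w...@home extension code.]

  <?php
  // Sketch: after a client reports the sha1 of its assigned chunk, hand the
  // same chunk to a couple of randomly chosen other clients and accept it
  // only if their hashes agree.
  function pickVerifiers( array $activeClientIds, $uploaderId, $count = 2 ) {
      $candidates = array_values( array_diff( $activeClientIds, array( $uploaderId ) ) );
      shuffle( $candidates );
      return array_slice( $candidates, 0, $count );
  }

  function chunkAccepted( $reportedSha1, array $verifierSha1s ) {
      foreach ( $verifierSha1s as $sha1 ) {
          if ( $sha1 !== $reportedSha1 ) {
              return false;   // mismatch: flag the chunk and its uploader for review
          }
      }
      return count( $verifierSha1s ) > 0; // require at least one confirmation
  }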
Re: [Wikitech-l] Wiki at Home Extension
On Sun, Aug 2, 2009 at 6:29 PM, Michael Dale <md...@wikimedia.org> wrote:
[snip]
> Two quick points:
> 1) You don't have to re-upload the whole video, just the sha1 or some other hash of the assigned chunk.

But each re-encoder must download the source material. I agree that uploads aren't much of an issue.

[snip]
> ... checking the sha1 a few times from other random clients that are encoding other pieces would make abuse very difficult... at the cost of a few small http requests after the encode is done, and at the cost of slightly more CPU cycles from the computing pool.

Is 2x slightly? (Greater, because some clients will abort/fail.)

Even that leaves open the risk that a single troublemaker will register a few accounts and confirm their own blocks. You can fight that too -- but it's an arms race with no end. I have no doubt that the problem can be made tolerably rare -- but at what cost?

I don't think it's all that acceptable to significantly increase the resources used for the operation of the site just for the sake of pushing the capital and energy costs onto third parties, especially when it appears that the cost to Wikimedia will not decrease (but instead be shifted from equipment cost to bandwidth and developer time).

[snip]
> We need to start exploring bittorrent integration anyway to distribute the bandwidth cost on the distribution side, so this work would lead us in a good direction as well.

http://lists.wikimedia.org/pipermail/wikitech-l/2009-April/042656.html

I'm troubled that Wikimedia is suddenly so interested in all these cost externalizations, which will dramatically increase the total cost but push those costs off onto (sometimes unwilling) third parties.

Tech spending by the Wikimedia Foundation is a fairly small portion of the budget, enough that it has drawn some criticism. Behaving in the most efficient manner is laudable, and the WMF has done excellently on this front in the past. Behaving in an inefficient manner in order to externalize costs is, in my view, deplorable and something which should be avoided.

Has some organizational problem arisen within Wikimedia which has made it unreasonably difficult to obtain computing resources, but easy to burn bandwidth and development time? I'm struggling to understand why development-intensive externalization measures are being regarded as first-choice solutions, and invented ahead of the production deployment of basic functionality.
[Wikitech-l] Bugzilla Weekly Report
MediaWiki Bugzilla Report for July 27, 2009 - August 03, 2009

Status changes this week

  Bugs NEW       : 216
  Bugs ASSIGNED  : 35
  Bugs REOPENED  : 31
  Bugs RESOLVED  : 170

Total bugs still open: 3789

Resolutions for the week:

  Bugs marked FIXED      : 109
  Bugs marked REMIND     : 0
  Bugs marked INVALID    : 11
  Bugs marked DUPLICATE  : 21
  Bugs marked WONTFIX    : 24
  Bugs marked WORKSFORME : 9
  Bugs marked LATER      : 4
  Bugs marked MOVED      : 0

Specific Product/Component Resolutions & User Metrics

New Bugs Per Component

  Site requests   9
  server          7
  LiquidThreads   5
  Glossary        5
  Special pages   4

New Bugs Per Product

  MediaWiki              16
  Wikimedia              12
  MediaWiki extensions   26
  mwdumper                1
  Wikipedia Mobile        7

Top 5 Bug Resolvers

  innocentkiller [AT] gmail.com        30
  markus [AT] semantic-mediawiki.org   28
  alex.emsenhuber [AT] bluewin.ch      16
  JSchulz_4587 [AT] msn.com            10
  agarrett [AT] wikimedia.org          10
Re: [Wikitech-l] Wiki at Home Extension
Let's see...

* All these tools will be needed for flattening sequences anyway. In that case CPU costs are really, really high -- something like 1/5 real-time or slower -- and the amount of computation needed grows much faster, since every stable edit necessitates a new flattening of some portion of the sequence.

* I don't think it's possible to scale the foundation's current donation model to traditional free net video distribution.

* We are not Google. Google lost something like ~$470 million~ last year on YouTube (and that's with $240 million in advertising), for a total cost of $711 million [1]. Say we manage to do 1/100th of YouTube (not unreasonable, considering we are a top 4 site; just imagine a world where you watch one Wikipedia video for every 100 you watch on YouTube)... then we would be at what, like 7x the total budget? (And they are not supporting video editing with flattening of sequences.) The Pirate Bay, on the other hand, operates at a technology cost comparable to Wikimedia's (~$3K~ a month in bandwidth) and is distributing something like half of the net's torrents [2]. (Obviously these numbers are a bit of tea-leaf reading, but give or take an order of magnitude it should still be clear which model we should be moving towards.) ... I think it's good to start thinking about p2p distribution and computation... even if we are not using it today...

* I must say I don't quite agree with your proposed tactic of preserving network neutrality by avoiding bandwidth distribution via peer-to-peer technology. I am aware the net is not built for p2p, nor is it very efficient vs CDNs... but the whole micropayment system never panned out... Perhaps you're right that p2p will just give companies an excuse to restructure the net in a non-network-neutral way, but I think they already have plenty of excuse with the existing popular bittorrent systems, and I don't see another way for not-for-profit net communities to distribute massive amounts of video to each other.

* I think you may be blowing this ~a bit~ out of proportion by calling foundation priorities into question in the context of this hack. If this were a big initiative over the course of a year, or an initiative taking more than part-time work over a week, then it would make more sense to worry about this. But in its present state it's just a quick hack and the starting point of a conversation, not foundation policy or initiative.

peace,
michael

[1] http://www.ibtimes.com/articles/20090413/alleged-470-million-youtube-loss-will-be-cleared-week.htm
[2] http://newteevee.com/2009/07/19/the-pirate-bay-distributing-the-worlds-entertainment-for-3000-a-month/

Gregory Maxwell wrote:
> Is 2x slightly? (Greater, because some clients will abort/fail.) Even that leaves open the risk that a single troublemaker will register a few accounts and confirm their own blocks. You can fight that too -- but it's an arms race with no end. I have no doubt that the problem can be made tolerably rare -- but at what cost?
[snip]
> I'm troubled that Wikimedia is suddenly so interested in all these cost externalizations, which will dramatically increase the total cost but push those costs off onto (sometimes unwilling) third parties. [...] Has some organizational problem arisen within Wikimedia which has made it unreasonably difficult to obtain computing resources, but easy to burn bandwidth and development time?