Re: [Wikitech-l] Google Summer of Code: accepted projects
On Fri, Apr 24, 2009 at 12:31 AM, Wu Zhe w...@madk.org wrote:

> An asynchronous daemon doesn't make much sense if the page purge occurs on the server side, but what if we put off the page purge to the browser? It works like this:
>
> 1. The MW parser sends a request to the daemon.
> 2. The daemon finds the work non-trivial and replies *immediately* with a best fit or just a placeholder.
> 3. The browser renders the page, finds it's not final, and so sends a request to the daemon directly using AJAX.
> 4. The daemon replies to the browser when the thumbnail is ready.
> 5. The browser replaces the temporary best fit / placeholder with the new thumb using JavaScript.
>
> The daemon now has to deal with two kinds of clients: MW servers and browsers. Letting the browser wait instead of the MW server has the benefit of reduced latency for users, who still have an acceptable page to read before the image replacement takes place and a perfect page after it. For most users, it's likely that the replacement occurs as soon as page loading ends, since transferring the page takes some time and the daemon would already have finished thumbnailing in the process.

How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid miss) will take hundreds of milliseconds to serve, and hundreds more milliseconds of network latency. If we're talking about each image adding 10 ms to the latency, then it's not worth it to add all this fancy asynchronous stuff.

Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thumbnailing. Only the first request for a given size of a given image should ever require thumbnailing: that can then be cached more or less forever. So it's not a good case to optimize for. If the architecture can be simplified significantly at the cost of slight extra latency in 0.01% of requests, I think it's clear that the simpler architecture is superior.

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
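Wu Zhe's five-step flow can be sketched as a small Python model of the daemon's two-phase reply. Everything here is an assumption for illustration: the class and method names, the "trivial" pixel threshold, and the placeholder URL are all hypothetical, not part of any proposed protocol.

```python
import queue
import threading

class ThumbDaemon:
    """Sketch of the two-phase reply: answer immediately with a
    placeholder for expensive jobs, finish the real thumb in the
    background, and let the browser poll until it is ready.
    (Hypothetical names; the threshold is an arbitrary assumption.)"""

    def __init__(self, trivial_threshold=1_000_000):
        self.trivial_threshold = trivial_threshold  # pixels
        self.ready = {}             # job id -> final thumb URL
        self.jobs = queue.Queue()   # deferred, expensive jobs
        threading.Thread(target=self._worker, daemon=True).start()

    def request(self, job_id, pixels):
        """Called by the MW parser (steps 1-2)."""
        if pixels <= self.trivial_threshold:
            self.ready[job_id] = self._render(job_id)   # cheap: do it now
            return {"final": True, "url": self.ready[job_id]}
        self.jobs.put(job_id)                           # expensive: defer
        return {"final": False, "url": "placeholder.png"}

    def poll(self, job_id):
        """Called by the browser over AJAX (steps 3-4)."""
        if job_id in self.ready:
            return {"final": True, "url": self.ready[job_id]}
        return {"final": False, "url": "placeholder.png"}

    def _worker(self):
        while True:
            job_id = self.jobs.get()
            self.ready[job_id] = self._render(job_id)
            self.jobs.task_done()

    def _render(self, job_id):
        # stand-in for the actual resize operation
        return f"thumbs/{job_id}.jpg"
```

The key property is that `request()` never blocks on a large image: either the thumb is cheap enough to make inline, or the caller gets the placeholder and the browser picks up the final image later.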
Re: [Wikitech-l] Google Summer of Code: accepted projects
2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:

> How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid miss) will take hundreds of milliseconds to serve and hundreds more milliseconds of network latency. If we're talking about each image adding 10 ms to the latency, then it's not worth it to add all this fancy asynchronous stuff.

The problem here seems to be that thumbnail generation times vary a lot, based on the format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.

> Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thumbnailing. Only the first request for a given size of a given image should ever require thumbnailing: that can then be cached more or less forever.

That's true; we're already doing that.

> So it's not a good case to optimize for.

AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30 s execution limit and fail, which means those thumbs will never be generated, but every user requesting them will have a request last for 30 s and time out.

Roan Kattouw (Catrope)
Re: [Wikitech-l] Google Summer of Code: accepted projects
All true. The images should not be rethumb'd unless the resolution changes, a new version is uploaded, or the cache is otherwise purged. However, on initial rendering, thumb generation can be a large part of overall page execution time (especially when rendering multiple images). Being able to offload this elsewhere should decrease that load greatly.

-Chad

On Apr 24, 2009 1:23 PM, Roan Kattouw roan.katt...@gmail.com wrote:

> 2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:
>> How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid ...
> The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
>> Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thu...
> That's true, we're already doing that.
>> So it's not a good case to optimize for.
> AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.
> Roan Kattouw (Catrope)
Re: [Wikitech-l] Google Summer of Code: accepted projects
2009/4/24 Chad innocentkil...@gmail.com:

> All true. The images should not be rethumb'd unless resolution changes, a new version is uploaded, or the cache is otherwise purged.

Repeat: this is what we do already (not sure if that's what you're trying to say, but "should" implies differently).

Roan Kattouw (Catrope)
Re: [Wikitech-l] Google Summer of Code: accepted projects
I'm agreeing with you. By "should" I meant this should be happening already, and issues with it are bugs.

-Chad

On Apr 24, 2009 1:32 PM, Roan Kattouw roan.katt...@gmail.com wrote:

> 2009/4/24 Chad innocentkil...@gmail.com:
>> All true. The images should not be rethumb'd unless resolution changes, a new version is uploade...
> Repeat: this is what we do already (not sure if that's what you're trying to say, but "should" implies differently).
> Roan Kattouw (Catrope)
Re: [Wikitech-l] Google Summer of Code: accepted projects
On 4/24/09 10:32 AM, Roan Kattouw wrote:

> 2009/4/24 Chad innocentkil...@gmail.com:
>> All true. The images should not be rethumb'd unless resolution changes, a new version is uploaded, or the cache is otherwise purged.
> Repeat: this is what we do already (not sure if that's what you're trying to say, but "should" implies differently).

Just to summarize the current state, here's the default MediaWiki configuration workflow:

* During page rendering, MediaWiki checks if a thumb of the proper size exists.
* If not, we resize it synchronously on the same server (via GD, or a shell out to ImageMagick, etc.).
* An img tag pointing to the file is added to the output.
* The web browser loads the already-rendered image file in the page.

Here's the behavior variant we have on Wikimedia sites:

* During page rendering, we make an img tag pointing to where the thumbnail should be.
* The web browser requests the thumbnail image file.
* If it doesn't exist, the upload web server proxies the request [1] back to MediaWiki, running on a subcluster which handles only thumbnail generation.
* MediaWiki resizes it synchronously via a shell out to ImageMagick.
* The web server serves the now-completed file back to the client, and it's now on disk for the next request.

[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/upload-scripts/

This prevents slow or broken thumbnailing operations from bogging down the *main* web servers, but if it's not reasonably fast we still have difficulties:

* No placeholder image -- the browser just shows a nice blank box.
* Multiple requests will cause multiple attempts to resize at once, potentially eating up all CPU time / memory / tmp disk space on the thumbnailing cluster.

So if we've got, say, a 50-megapixel PNG or TIFF high-res scan, or a giant animated GIF which is very expensive to resize, we don't have a good way of producing a thumbnail on a good schedule. It'll either time out every time it changes, or just never actually complete.

If we have a way to defer things we know will take longer, and show a placeholder until they're completed, then we can use those things more reliably.

One suggestion that's been brought up for large images is to create a smaller version *once at upload time* which can then be used to quickly create inline thumbnails of various sizes on demand. But we still need some way to manage that asynchronous initial rendering, and have some kind of friendly behavior for what to show while it's working.

-- brion
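The Wikimedia-sites variant described above boils down to a classic "render on first miss, then serve from disk" handler. A minimal sketch, with the helper names and the dict standing in for the thumbnail filesystem being assumptions for illustration:

```python
def serve_thumb(path, store, render):
    """404-handler sketch: serve the thumb if it is already on disk,
    otherwise render it once, keep it for the next request, then serve.
    `store` stands in for the thumbnail filesystem; `render` stands in
    for the synchronous ImageMagick shell-out. (Hypothetical names.)"""
    if path not in store:              # first request for this size
        store[path] = render(path)     # synchronous resize, done once
    return store[path]                 # every later request hits the file
```

This also makes the two failure modes in the message visible: nothing bounds how long `render` may take, and two simultaneous first requests would both fall into the `render` branch.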
Re: [Wikitech-l] Google Summer of Code: accepted projects
Hoi,

At the moment we have an upper limit of 100 MB. The people who do restorations have one file that is 680 MB. The corresponding JPG is also quite big!

Thanks,
GerardM

2009/4/24 Roan Kattouw roan.katt...@gmail.com:

> 2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:
>> How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid miss) will take hundreds of milliseconds to serve and hundreds of more milliseconds for network latency. If we're talking about each image adding 10 ms to the latency, then it's not worth it to add all this fancy asynchronous stuff.
> The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
>> Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thumbnailing. Only the first request for a given size of a given image should ever require thumbnailing: that can then be cached more or less forever.
> That's true, we're already doing that.
>> So it's not a good case to optimize for.
> AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.
> Roan Kattouw (Catrope)
Re: [Wikitech-l] Google Summer of Code: accepted projects
Roan Kattouw wrote:

> The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.

Yeah, again: if we only issue the big resize operation on initial upload, with a memory-friendly in-place library like VIPS, I think we will be okay. Since the user just waited something like 10-15 minutes to upload their huge image, waiting an additional 10-30 s at that point for the thumbnail, and the instant gratification of seeing your image on the upload page, is not such a big deal. Then in-page use derivatives could predictably resize the ~1024x768 image in real time: again, instant gratification on page preview or page save. Operationally this could go out to a thumbnail server or be done on the Apaches; if they are small operations, it may be easier to keep the existing infrastructure than to intelligently handle the edge cases outlined (many resize requests at once, placeholders, image proxy / daemon setup).

> AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.

Again, this may be related to the unpredictable memory usage of ImageMagick when resizing large images, instead of a fast memory-confined resize engine, no?
Re: [Wikitech-l] Google Summer of Code: accepted projects
Hoi,

The Library of Alexandria uses it for the display of their awesome Napoleonic lithographs. It would be awesome if we had that code. It is actually open source.

Thanks,
Gerard

2009/4/24 David Gerard dger...@gmail.com:

> 2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:
>> That's what occurred to me. In that case, the only possible thing to do seems to be to just have the image request wait until the image is thumbnailed. I guess you could show a placeholder image, but that's probably *less* friendly to the user, as long as we've specified the height and width in the HTML. The browser should provide some kind of placeholder already while the image is loading, after all, and if we let the browser provide the placeholder, then at least the image will appear automatically when it's done thumbnailing.
>
> There was a spec in earlier versions of HTML to put a low-res thumbnail up while the full image dribbled through your dialup -- <img lowsrc="image-placeholder.gif" src="image.gif"> -- but it was so little used (I know of no cases) that I don't know if it's even supported in browsers any more. http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html
>
> - d.
Re: [Wikitech-l] Google Summer of Code: accepted projects
On 4/24/09 11:05 AM, Michael Dale wrote:

> Roan Kattouw wrote:
>> The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
> yea again if we only issue the big resize operation on initial upload with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and instant gratification of seeing your image on the upload page ... is not such a big deal.

Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)

-- brion
Re: [Wikitech-l] Google Summer of Code: accepted projects
>> with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and instant gratification of seeing your image on the upload page ... is not such a big deal.
>
> Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)

Thumbnail generation could be cascaded, i.e. 120px thumbs could be generated from the 800px previews instead of the original images.
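The cascading idea reduces to a source-selection rule: resize from the smallest pre-rendered derivative that is still at least as large as the requested thumb, and only fall back to the original when no derivative is big enough. A sketch of just that rule (function name, widths, and paths are hypothetical):

```python
def pick_source(width, derivatives, original):
    """Cascaded-thumbnailing sketch: choose the cheapest adequate
    source for a thumb of the given width. `derivatives` maps a
    derivative's pixel width to its path; `original` is the full-size
    upload. (Hypothetical names and layout.)"""
    usable = [w for w in derivatives if w >= width]  # wide enough sources
    if usable:
        return derivatives[min(usable)]   # smallest adequate derivative
    return original                       # nothing big enough: use original
```

With, say, 800px and 1600px previews on hand, a 120px thumb never touches the 680 MB original.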
Re: [Wikitech-l] Google Summer of Code: accepted projects
On Fri, Apr 24, 2009 at 3:58 PM, Brion Vibber br...@wikimedia.org wrote:

> Best to make it explicit rather than presume -- currently we have no such locking for slow resizing requests. :)

Yes, definitely.
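The missing locking could be made explicit with a create-exclusive lock file, so that only the first request for a given thumb actually shells out to the resizer while the rest back off (serve a placeholder, or retry shortly). A minimal sketch; the function names and `.lock` file layout are assumptions, not an existing MediaWiki mechanism:

```python
import os
import tempfile

def try_acquire(lock_dir, thumb_name):
    """Return True if we won the right to render this thumb.
    O_CREAT | O_EXCL makes creation atomic: exactly one concurrent
    caller succeeds, everyone else gets FileExistsError."""
    lock_path = os.path.join(lock_dir, thumb_name + ".lock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True           # we own the resize
    except FileExistsError:
        return False          # someone else is already rendering it

def release(lock_dir, thumb_name):
    """Drop the lock once the thumb is on disk (or the resize failed)."""
    os.unlink(os.path.join(lock_dir, thumb_name + ".lock"))
```

A production version would also want stale-lock expiry so a crashed renderer does not block the thumb forever.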
Re: [Wikitech-l] Google Summer of Code: accepted projects
Michael Dale wrote:

> yea again if we only issue the big resize operation on initial upload with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and instant gratification of seeing your image on the upload page ... is not such a big deal.

It can be parallelized, starting to render the thumb while the file hasn't been completely uploaded yet (most formats allow doing that). That would need special software; the easiest would be to point the Special:Upload action at a different domain for the resizing cluster. These changes are always an annoyance, but it would ease many bugs: 10976, 16751, 18202, the upload bar, a non-NFS backend...
Re: [Wikitech-l] Google Summer of Code: accepted projects
Also relevant: 17255 and 18201. And as this would be a new upload system, also worth mentioning 18563 (the new-upload branch).
Re: [Wikitech-l] Google Summer of Code: accepted projects
On Fri, Apr 24, 2009 at 07:08:05PM +0100, David Gerard wrote:

> There was a spec in earlier versions of HTML to put a low-res thumbnail up while the full image dribbled through your dialup -- <img lowsrc="image-placeholder.gif" src="image.gif"> -- but it was so little used (I know of no cases) that I don't know if it's even supported in browsers any more. http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html

I tried it with Firefox 3.0.9 and IE 7.0.6001.18000; neither paid any attention to it. IE 6.0.2800.1106 under Wine also ignored it. Too bad, that would have been nice if it worked.

I don't know that we need fancy AJAX if we know at page rendering time whether the image is available, though. We might be able to get away with a simple script like this:

    var ImageCache = {};

    function loadImage(id, url) {
        var i = document.getElementById(id);
        if (i) {
            var img = new Image();
            ImageCache[id] = img;
            img.onload = function () {
                i.src = url;
                ImageCache[id] = null;
            };
            img.src = url;
        }
    }

And then generate the img tag with the placeholder and some id, and call that function onload for it. Of course, if there are a lot of these images on one page then we might run into the browser's concurrent connection limit, which an AJAX solution might be able to overcome.
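On the connection-limit concern at the end: one mitigation is to kick off the deferred image loads in groups no larger than the browser's per-host connection limit, starting the next group as the previous one finishes. The grouping rule itself is trivial; a sketch of it (function name and limit value are assumptions):

```python
def batches(items, limit):
    """Yield the pending thumb loads in groups of at most `limit`,
    so deferred placeholders don't monopolize every connection to
    the thumb host. (Hypothetical helper; the per-host limit in
    2009-era browsers was commonly in the 2-6 range.)"""
    for i in range(0, len(items), limit):
        yield items[i:i + limit]
```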
Re: [Wikitech-l] Google Summer of Code: accepted projects
Michael Dale md...@wikimedia.org writes:

> I recommended that the image daemon run semi-synchronously since the changes needed to maintain multiple states and return non-cached place-holder images while managing updates and page purges for when the updated images are available within the wikimedia server architecture probably won't be completed in the summer of code time-line. But if the student is up for it the concept would be useful for other components like video transformation / transcoding, sequence flattening etc. But its not what I would recommend for the summer of code time-line.

I may have problems understanding the concept "semi-synchronously". Does it mean that when MW parses a page that contains thumbnail images, the parser sends requests to the daemon, which would reply twice for each request: once immediately with a best fit or a placeholder (synchronously), and once later when the thumbnail is ready (asynchronously)?

> == what would probably be better for the image resize efforts to focus on ==
>
> (1) making the existing system more robust and (2) better taking advantage of multi-threaded servers.
>
> (1) right now the system chokes on large images we should deploy support for an in-place image resize maybe something like vips (?) (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use) The system should intelligently call vips to transform the image to a reasonable size at time of upload then use those derivative for just in time thumbs for articles. ( If vips is unavailable we don't transform and we don't crash the apache node.)

Wow, VIPS sounds great; still reading its documentation. How is its performance on relatively small images (not huge, a few hundred pixels in width/height) compared with traditional single-threaded resizing programs?

> (2) maybe spinning out the image transform process early on in the parsing of the page with a place-holder and callback so by the time all the templates and links have been looked up the image is ready for output. (maybe another function wfShellBackgroundExec($cmd, $callback_function), maybe using pcntl_fork, then normal wfShellExec, then pcntl_waitpid, then the callback function, which sets some var in the parent process so that pageOutput knows it's good to go)

An asynchronous daemon doesn't make much sense if the page purge occurs on the server side, but what if we put off the page purge to the browser? It works like this:

1. The MW parser sends a request to the daemon.
2. The daemon finds the work non-trivial and replies *immediately* with a best fit or just a placeholder.
3. The browser renders the page, finds it's not final, and so sends a request to the daemon directly using AJAX.
4. The daemon replies to the browser when the thumbnail is ready.
5. The browser replaces the temporary best fit / placeholder with the new thumb using JavaScript.

The daemon now has to deal with two kinds of clients: MW servers and browsers. Letting the browser wait instead of the MW server has the benefit of reduced latency for users, who still have an acceptable page to read before the image replacement takes place and a perfect page after it. For most users, it's likely that the replacement occurs as soon as page loading ends, since transferring the page takes some time and the daemon would already have finished thumbnailing in the process.
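The wfShellBackgroundExec() idea quoted above (pcntl_fork, then wfShellExec, then pcntl_waitpid, then a callback) can be prototyped outside PHP. A rough Python equivalent of that fork/exec/waitpid sequence, Unix-only, with the function names and calling convention being assumptions rather than any agreed API:

```python
import os

def shell_background_exec(argv):
    """Fork and exec `argv` in a child process; return the child's pid
    so page parsing can continue while the resize runs in parallel.
    (Analogue of the proposed pcntl_fork + wfShellExec step.)"""
    pid = os.fork()
    if pid == 0:                      # child: replace ourselves with the command
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)             # only reached if exec failed
    return pid                        # parent: carry on parsing

def wait_and_callback(pid, callback):
    """Once templates and links are resolved, block on the child and
    fire the callback with its exit code so pageOutput knows the image
    is good to go. (Analogue of the pcntl_waitpid + callback step.)"""
    _, status = os.waitpid(pid, 0)
    callback(os.waitstatus_to_exitcode(status))
```

In a real implementation the parent would want a timeout around the wait, so a stuck resize cannot stall page output indefinitely.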
Re: [Wikitech-l] Google Summer of Code: accepted projects
Aryeh Gregor wrote:

> I'm not clear on why we don't just make the daemon synchronously return a result the way ImageMagick effectively does. Given the level of reuse of thumbnails, it seems unlikely that the latency is a significant concern -- virtually no requests will ever actually wait on it.

(I basically outlined these issues on the SoC page, but here they are again with a bit more clarity.)

I recommended that the image daemon run semi-synchronously, since the changes needed to maintain multiple states and return non-cached placeholder images, while managing updates and page purges for when the updated images are available within the Wikimedia server architecture, probably won't be completed in the Summer of Code timeline. But if the student is up for it, the concept would be useful for other components like video transformation / transcoding, sequence flattening, etc. Still, it's not what I would recommend for the Summer of Code timeline.

== per issues outlined in bug 4854 ==

I don't think it's a good idea to invest a lot of energy into a separate Python-based image daemon. It won't avoid all the problems listed in bug 4854:

* Shell-character-exploit issues should be checked against anyway (since not everyone is going to install the daemon).
* Other people using MediaWiki won't add a Python- or Java-based image resizer and resolve the dependent Python or Java component libraries. It won't be easier to install than ImageMagick or php-gd, which are repository-hosted applications and already present in shared hosting environments. Once you start integrating other libs like (Java) Batik, it becomes difficult to resolve dependencies (Java, Python, etc.), and to install you have to push out a new program that is not integrated into the application repository managers for the various distributions.
* The potential to isolate CPU and memory usage should be considered in the core MediaWiki image resize support anyway, i.e. we don't want to crash other people's servers who are using MediaWiki by not checking the upper bounds of image transforms. Instead we should make the core image transform smarter: maybe have a configuration var that /attempts/ to bound the upper memory for spawned processes, and take that into account before issuing the shell command for a given large image transformation with a given shell application.

== what the image resize efforts should probably focus on ==

(1) making the existing system more robust and (2) better taking advantage of multi-threaded servers.

(1) Right now the system chokes on large images. We should deploy support for an in-place image resize, maybe something like VIPS (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use). The system should intelligently call VIPS to transform the image to a reasonable size at time of upload, then use those derivatives for just-in-time thumbs for articles. (If VIPS is unavailable, we don't transform and we don't crash the Apache node.)

(2) Maybe spin out the image transform process early on in the parsing of the page, with a placeholder and callback, so that by the time all the templates and links have been looked up, the image is ready for output. (Maybe another function, wfShellBackgroundExec($cmd, $callback_function), using pcntl_fork, then normal wfShellExec, then pcntl_waitpid, then the callback function, which sets some var in the parent process so that pageOutput knows it's good to go.)

If operationally the daemon should be on a separate server, we should still more or less run synchronously, as mentioned above. If possible the daemon should be PHP-based, so we don't explode the dependencies for deploying robust image handling with MediaWiki.

peace,
--michael
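The "bound the upper memory for spawned processes" idea can be enforced at the OS level rather than guessed at: cap the child's address space before exec, so a pathological image makes the resize process fail instead of taking down the whole node. A hypothetical Python sketch using RLIMIT_AS (the function name and limit values are assumptions; `preexec_fn` is POSIX-only):

```python
import resource
import subprocess

def run_bounded(cmd, mem_bytes, timeout_s):
    """Run an external resize command with a hard cap on its address
    space and wall-clock time. If the resizer exceeds `mem_bytes`,
    its allocations fail and it exits with an error, leaving the
    parent web server untouched. (Sketch; names are hypothetical.)"""
    def limit():
        # Applied in the child between fork and exec.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(cmd, preexec_fn=limit, timeout=timeout_s)
```

This pairs naturally with the 30 s execution-limit problem mentioned earlier in the thread: the `timeout` bounds runaway transforms the same way the rlimit bounds runaway memory.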
Re: [Wikitech-l] Google Summer of Code: accepted projects
Thanks for taking care of the announce mail, Roan! I spent all day yesterday at the dentist's... whee :P I've taken the liberty of reposting it on the tech blog: http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects-accepted/ I'd love for us to get the students set up on the blog to keep track of their project progress and raise visibility... :D -- brion
Re: [Wikitech-l] Google Summer of Code: accepted projects
On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster ma...@harddisk.is-a-geek.org wrote: On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.katt...@gmail.com wrote: * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more Wow, I'm lookin' forward to this. Mighta be worth a try to give the uploader the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop. On a semi-related note: what's the status of the management routines that handle throwaway things like math PNGs? Is this a generic system, so it can be used e.g. for Jmol PNGs in the future? Is it integrated with the image thumbnail handling? Should it be? Magnus
[Wikitech-l] Google Summer of Code: accepted projects
Yesterday, the selection of GSoC projects was officially announced. For MediaWiki, the following projects have been accepted: * Niklas Laxström (Nikerabbit), mentored by Siebrand, will be working on improving localization and internationalization in MediaWiki, as well as improving the Translate extension used on translatewiki.net * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more * Jeroen de Dauw, mentored by Yaron Koren, will be improving the Semantic Layers extension and merging it into the Semantic Google Maps extension * Gerardo Antonio Cabero, mentored by Michael Dale (mdale), will be improving the Cortado applet for video playback (I'm a bit fuzzy on the details for this one) The official list with links to (parts of) the proposals can be found at the Google website [1]; lists for other organizations can be reached through the list of participating organizations [2]. The next event on the GSoC timeline [3] is the community bonding period [4], during which the students are supposed to get to know their mentors and the community. This period lasts until May 23rd, when the students actually begin coding. Starting now and continuing at least until the end of GSoC in August, you will probably see and hear from the students on IRC and the mailing lists and hear about the projects they're working on. To repeat the crux of an earlier thread on this list [5]: be nice to these special newcomers, make them feel welcome and comfortable, and try not to bite them :) To the mentors and students: have fun! 
Roan Kattouw (Catrope) [1] http://socghop.appspot.com/org/home/google/gsoc2009/wikimedia [2] http://socghop.appspot.com/program/accepted_orgs/google/gsoc2009 [3] http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline [4] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html [5] http://lists.wikimedia.org/pipermail/wikitech-l/2009-March/041964.html
Re: [Wikitech-l] Google Summer of Code: accepted projects
2009/4/22 Michael Dale md...@wikimedia.org: Marco Schuster wrote: On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.katt...@gmail.com wrote: * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more Wow, I'm lookin' forward to this. Mighta be worth a try to give the uploader the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop. I was looking at http://editor.pixastic.com/ ... wiki-style photoshop would be cool ... but not in the scope of that soc project ;) You can do pretty much anything with ImageMagick. Trouble is that it's not the fastest at *anything*. Depends how much that affects performance in practice - something that *just* thumbnails could be all sorts of more efficient, but you'd need a new program for each function, and most Unix users of MediaWiki thumbnail with ImageMagick already, so it'll be there. - d.
Re: [Wikitech-l] Google Summer of Code: accepted projects
On Tue, Apr 21, 2009 at 8:16 PM, David Gerard dger...@gmail.com wrote: You can do pretty much anything with ImageMagick. Trouble is that it's not the fastest at *anything*. Depends how much that affects performance in practice - something that *just* thumbnails could be all sorts of more efficient, but you'd need a new program for each function, and most Unix users of MediaWiki thumbnail with ImageMagick already, so it'll be there. - d.

The main issue with the daemon idea (which was discussed at length in #mediawiki a few weeks ago) is that it requires a major change in how we handle images. Right now, the process involves rendering on demand, rather than at leisure. This has the benefit of always producing an ideal thumbnailed image at the end of every parse. However, the major drawbacks are an increase in parsing time (while we wait for ImageMagick to do its thing) and an increased load on the app servers. The only time we can sidestep this is when someone uses a thumb dimension for which we already have a thumb rendered. In order for this to work, we'd need to shift to a style of "render when you get a chance, but give me the best fit for now."
Basically, we'd begin parsing and find that we need a thumbnailed copy of some image, but we don't have the ideal size just yet. Instead, we could return the best-fitting thumbnail so far and use that until the daemon has given us the right image. Not an easy task, but I certainly hope some progress can be made on it over the summer :) -Chad
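The "best fit for now" selection could be as simple as the following sketch (Python for illustration; the queue and function names are hypothetical, not existing MediaWiki code):

```python
from collections import deque

# Hypothetical render queue the daemon would drain at its leisure.
render_queue = deque()

def best_fit(available_widths, requested):
    """Pick the closest already-rendered width to serve as a stand-in,
    preferring the smallest one at least as large as the request
    (scaling down in the browser degrades less than scaling up)."""
    larger = [w for w in available_widths if w >= requested]
    if larger:
        return min(larger)
    return max(available_widths) if available_widths else None

def thumb_for(available_widths, requested):
    """Serve the best fit now; enqueue the exact size if it's missing."""
    if requested in available_widths:
        return requested
    render_queue.append(requested)  # daemon renders the ideal size later
    return best_fit(available_widths, requested)
```

For example, a request for a 300px thumb when only 120px and 800px exist would serve the 800px copy and queue a 300px render.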
Re: [Wikitech-l] Google Summer of Code: accepted projects
On Tue, Apr 21, 2009 at 7:54 PM, Marco Schuster ma...@harddisk.is-a-geek.org wrote: Wow, I'm lookin' forward to this. Mighta be worth a try to give the uploader the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop. That seems to be orthogonal to the proposed project.
Re: [Wikitech-l] Google Summer of Code: accepted projects
On Tue, Apr 21, 2009 at 8:34 PM, Chad innocentkil...@gmail.com wrote: In order for this to work, we'd need to shift to a style of render when you get a chance, but give me the best fit for now. Basically, we'd begin parsing and find that we need a thumbnailed copy of some image, but we don't have the ideal size just yet. Instead, we could return the best-fitting thumbnail so far and use that until the daemon has given us the right image. I'm not clear on why we don't just make the daemon synchronously return a result the way ImageMagick effectively does. Given the level of reuse of thumbnails, it seems unlikely that the latency is a significant concern -- virtually no requests will ever actually wait on it.