Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Aryeh Gregor
On Fri, Apr 24, 2009 at 12:31 AM, Wu Zhe w...@madk.org wrote:
 Asynchronous daemon doesn't make much sense if the page purge occurs on
 the server side, but what if we put off the page purge to the browser? It
 works like this:

 1. The MW parser sends a request to the daemon.
 2. The daemon finds the work non-trivial and replies *immediately* with a
   best fit or just a placeholder.
 3. The browser renders the page, finds it's not final, so it sends a
   request to the daemon directly using AJAX.
 4. The daemon replies to the browser when the thumbnail is ready.
 5. The browser replaces the temporary best fit / placeholder with the new
   thumb using JavaScript.

 The daemon now has to deal with two kinds of clients: MW servers and
 browsers.

 Letting the browser wait instead of the MW server has the benefit of
 reduced latency for users, while still having an acceptable page to read
 before the image replacement takes place and a perfect page after that.
 For most users, it's likely that the replacement occurs as soon as page
 loading ends, since transferring the page takes some time, and the daemon
 would have already finished thumbnailing in the process.

How long does it take to thumbnail a typical image, though?  Even a
parser cache hit (but Squid miss) will take hundreds of milliseconds
to serve, and hundreds more milliseconds for network latency.  If
we're talking about each image adding 10 ms to the latency, then it's
not worth it to add all this fancy asynchronous stuff.

Moreover, in MediaWiki's case specifically, *very* few requests should
actually require the thumbnailing.  Only the first request for a given
size of a given image should ever require thumbnailing: that can then
be cached more or less forever.  So it's not a good case to optimize
for.  If the architecture can be simplified significantly at the
cost of slight extra latency in 0.01% of requests, I think it's clear
that the simpler architecture is superior.


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Roan Kattouw
2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:
 How long does it take to thumbnail a typical image, though?  Even a
 parser cache hit (but Squid miss) will take hundreds of milliseconds
 to serve, and hundreds more milliseconds for network latency.  If
 we're talking about each image adding 10 ms to the latency, then it's
 not worth it to add all this fancy asynchronous stuff.

The problem here seems to be that thumbnail generation times vary a
lot, based on format and size of the original image. It could be 10 ms
for one image and 10 s for another, who knows.

 Moreover, in MediaWiki's case specifically, *very* few requests should
 actually require the thumbnailing.  Only the first request for a given
 size of a given image should ever require thumbnailing: that can then
 be cached more or less forever.
That's true, we're already doing that.

 So it's not a good case to optimize
 for.
AFAICT this isn't about optimization, it's about not bogging down the
Apache that has the misfortune of getting the first request to thumb a
huge image (but having a dedicated server for that instead), and about
not letting the associated user wait for ages. Even worse, requests
that thumb very large images could hit the 30s execution limit and
fail, which means those thumbs will never be generated, but every user
requesting them will have a request that lasts 30s and then times out.

Roan Kattouw (Catrope)



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Chad
All true. The images should not be rethumb'd unless
resolution changes, a new version is uploaded, or the
cache is otherwise purged. However, on initial rendering,
thumb generation can be a large part of overall page
execution time (especially if rendering multiple images).
Being able to offload this elsewhere should decrease
that load greatly.

-Chad

On Apr 24, 2009 1:23 PM, Roan Kattouw roan.katt...@gmail.com wrote:

2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:

 How long does it take to thumbnail a typical image, though?  Even a 
parser cache hit (but Squid ...
The problem here seems to be that thumbnail generation times vary a
lot, based on format and size of the original image. It could be 10 ms
for one image and 10 s for another, who knows.

 Moreover, in MediaWiki's case specifically, *very* few requests should 
actually require the thu...
That's true, we're already doing that.

 So it's not a good case to optimize  for.
AFAICT this isn't about optimization, it's about not bogging down the
Apache that has the misfortune of getting the first request to thumb a
huge image (but having a dedicated server for that instead), and about
not letting the associated user wait for ages. Even worse, requests
that thumb very large images could hit the 30s execution limit and
fail, which means those thumbs will never be generated, but every user
requesting them will have a request that lasts 30s and then times out.

Roan Kattouw (Catrope)



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Roan Kattouw
2009/4/24 Chad innocentkil...@gmail.com:
 All true. The images should not be rethumb'd unless
 resolution changes, a new version is uploaded, or the
 cache is otherwise purged.
Repeat: this is what we do already (not sure if that's what you're
trying to say, but "should" implies otherwise).

Roan Kattouw (Catrope)



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Chad
I'm agreeing with you. By "should" I meant this should
be happening already, and issues with this are bugs.

-Chad

On Apr 24, 2009 1:32 PM, Roan Kattouw roan.katt...@gmail.com wrote:

2009/4/24 Chad innocentkil...@gmail.com:

 All true. The images should not be rethumb'd unless  resolution changes,
a new version is uploade...
Repeat: this is what we do already (not sure if that's what you're
trying to say, but "should" implies otherwise).

Roan Kattouw (Catrope)


Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brion Vibber
On 4/24/09 10:32 AM, Roan Kattouw wrote:
 2009/4/24 Chad innocentkil...@gmail.com:
 All true. The images should not be rethumb'd unless
 resolution changes, a new version is uploaded, or the
 cache is otherwise purged.
 Repeat: this is what we do already (not sure if that's what you're
 trying to say, but "should" implies otherwise).

Just to summarize the current state, here's the default MediaWiki 
configuration workflow:

* During page rendering, MediaWiki checks if a thumb of the proper size 
exists.
   * If not, we resize it synchronously on the same server (via GD or a 
shell out to ImageMagick, etc.)
   * An <img> pointing to the file is added to the output.
* The web browser loads up the already-rendered image file in the page.


Here's the behavior variant we have on Wikimedia sites:

* During page rendering, we make an <img> pointing to where the 
thumbnail should be.
* The web browser requests the thumbnail image file.
   * If it doesn't exist, the upload web server proxies the request [1] 
back to MediaWiki, running on a subcluster which handles only thumbnail 
generation.
      * MediaWiki resizes it synchronously via a shell out to ImageMagick.
   * The web server serves the now-completed file back to the client, 
and it's now on disk for the next request.

[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/upload-scripts/
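
To make the proxy step concrete, here is a bare-bones sketch of what such 
a 404 handler can look like. The helper name and paths here are 
hypothetical, not the production scripts linked above:

  <?php
  // Sketch of the thumbnail 404/proxy handler: the upload web server
  // forwards requests for missing thumbs here, we render synchronously,
  // and the finished file is on disk for every later request.
  $thumb = $_GET['path'];                 // e.g. a/ab/Foo.jpg/120px-Foo.jpg
  $file  = '/mnt/thumbs/' . $thumb;
  if ( !file_exists( $file ) ) {
      renderThumb( $thumb );              // hypothetical: shell out to ImageMagick
  }
  header( 'Content-Type: image/jpeg' );
  readfile( $file );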

This prevents slow or broken thumbnailing operations from bogging down 
the *main* web servers, but if it's not reasonably fast we still have 
difficulties:

* No placeholder image -- browser just shows a nice blank box
* Multiple requests will cause multiple attempts to resize at once, 
potentially eating up all CPU time/memory/tmp disk space on the 
thumbnailing cluster

So if we've got, say, a 50 megapixel PNG or TIFF high-res scan, or a 
giant animated GIF which is very expensive to resize, we don't have a 
good way of producing a thumbnail on a good schedule. It'll either time 
out a lot every time it changes, or just never actually complete.

If we have a way to defer things we know will take longer, and show a 
placeholder until it's completed, then we can use those things more 
reliably.


One suggestion that's been brought up for large images is to create a 
smaller version *once at upload time* which can then be used to quickly 
create inline thumbnails of various sizes on demand. But we still need 
some way to manage that asynchronous initial rendering, and have some 
kind of friendly behavior for what to show while it's working.

-- brion



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Gerard Meijssen
Hoi,
At the moment we have an upper limit of 100 MB. The people who do
restorations have one file that is 680 MB. The corresponding JPG is also
quite big!
Thanks,
   GerardM

2009/4/24 Roan Kattouw roan.katt...@gmail.com

 2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:
  How long does it take to thumbnail a typical image, though?  Even a
  parser cache hit (but Squid miss) will take hundreds of milliseconds
  to serve, and hundreds more milliseconds for network latency.  If
  we're talking about each image adding 10 ms to the latency, then it's
  not worth it to add all this fancy asynchronous stuff.
 
 The problem here seems to be that thumbnail generation times vary a
 lot, based on format and size of the original image. It could be 10 ms
 for one image and 10 s for another, who knows.

  Moreover, in MediaWiki's case specifically, *very* few requests should
  actually require the thumbnailing.  Only the first request for a given
  size of a given image should ever require thumbnailing: that can then
  be cached more or less forever.
 That's true, we're already doing that.

  So it's not a good case to optimize
  for.
 AFAICT this isn't about optimization, it's about not bogging down the
 Apache that has the misfortune of getting the first request to thumb a
 huge image (but having a dedicated server for that instead), and about
 not letting the associated user wait for ages. Even worse, requests
 that thumb very large images could hit the 30s execution limit and
 fail, which means those thumbs will never be generated, but every user
 requesting them will have a request that lasts 30s and then times out.

 Roan Kattouw (Catrope)



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Michael Dale
Roan Kattouw wrote:
 The problem here seems to be that thumbnail generation times vary a
 lot, based on format and size of the original image. It could be 10 ms
 for one image and 10 s for another, who knows.

   
Yeah, again: if we only issue the big resize operation on initial upload, 
with a memory-friendly in-place library like vips, I think we will be 
okay. Since the user just waited like 10-15 minutes to upload their huge 
image, waiting an additional 10-30s at that point for the thumbnail, with 
the instant gratification of seeing your image on the upload page, is 
not such a big deal.  Then in-page use derivatives could predictably 
resize the 1024x768-or-so image in realtime: again, instant 
gratification on page preview or page save.

Operationally this could go out to a thumbnail server, or be done on the 
Apaches if they are small operations; it may be easier to keep the 
existing infrastructure than to intelligently handle the edge cases 
outlined (many resize requests at once, placeholders, image proxy / 
daemon setup).

 AFAICT this isn't about optimization, it's about not bogging down the
 Apache that has the misfortune of getting the first request to thumb a
 huge image (but having a dedicated server for that instead), and about
 not letting the associated user wait for ages. Even worse, requests
 that thumb very large images could hit the 30s execution limit and
 fail, which means those thumbs will never be generated but every user
 requesting it will have a request last for 30s and time out.

   

Again, this may be related to the unpredictable memory usage of 
ImageMagick when resizing large images, instead of a fast, 
memory-confined resize engine, no?



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Gerard Meijssen
Hoi,
The Library of Alexandria uses it for the display of their awesome
Napoleonic lithographs. It would be awesome if we had that code. It is
actually open source.
Thanks,
 Gerard

2009/4/24 David Gerard dger...@gmail.com

 2009/4/24 Aryeh Gregor simetrical+wikil...@gmail.com:

  That's what occurred to me.  In that case, the only possible thing to
  do seems to be to just have the image request wait until the image is
  thumbnailed.  I guess you could show a placeholder image, but that's
  probably *less* friendly to the user, as long as we've specified the
  height and width in the HTML.  The browser should provide some kind of
  placeholder already while the image is loading, after all, and if we
  let the browser provide the placeholder, then at least the image will
  appear automatically when it's done thumbnailing.


 There was a spec in earlier versions of HTML to put a low-res
 thumbnail up while the full image dribbled through your dialup -
 <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so
 little used (I know of no cases) that I don't know if it's even
 supported in browsers any more.

 http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html


 - d.



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brion Vibber
On 4/24/09 11:05 AM, Michael Dale wrote:
 Roan Kattouw wrote:
 The problem here seems to be that thumbnail generation times vary a
 lot, based on format and size of the original image. It could be 10 ms
 for one image and 10 s for another, who knows.


 Yeah, again: if we only issue the big resize operation on initial upload,
 with a memory-friendly in-place library like vips, I think we will be
 okay. Since the user just waited like 10-15 minutes to upload their huge
 image, waiting an additional 10-30s at that point for the thumbnail, with
 the instant gratification of seeing your image on the upload page, is
 not such a big deal.

Well, what about the 5 million other users browsing Special:Newimages? 
We don't want 50 simultaneous attempts to build that first 
über-thumbnail. :)

-- brion



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread lists
 with a memory-friendly in-place library like vips, I think we will be
 okay. Since the user just waited like 10-15 minutes to upload their huge
 image, waiting an additional 10-30s at that point for the thumbnail, with
 the instant gratification of seeing your image on the upload page, is
 not such a big deal.

 Well, what about the 5 million other users browsing Special:Newimages?
 We don't want 50 simultaneous attempts to build that first
 über-thumbnail. :)

Thumbnail generation could be cascaded, i.e. 120px thumbs could be
generated from the 800px previews instead of the original images.
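
A rough sketch of that idea (a hypothetical helper, not existing 
MediaWiki code): pick the smallest already-rendered width that is still 
at least as large as the requested one, and resize from that instead of 
the original:

  <?php
  function pickThumbSource( $requestedWidth, array $renderedWidths, $origPath ) {
      sort( $renderedWidths );
      foreach ( $renderedWidths as $w ) {
          if ( $w >= $requestedWidth ) {
              return "thumbs/{$w}px.jpg"; // e.g. make the 120px thumb from the 800px preview
          }
      }
      return $origPath;                   // nothing big enough: fall back to the original
  }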




Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Aryeh Gregor
On Fri, Apr 24, 2009 at 3:58 PM, Brion Vibber br...@wikimedia.org wrote:
 Best to make it explicit rather than presume -- currently we have no
 such locking for slow resizing requests. :)

Yes, definitely.
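
One possible shape for that locking, as a sketch only (an assumption, not 
current MediaWiki behavior): take an exclusive per-thumbnail lock so only 
the first request runs the resize, while later requests block, re-check, 
and then serve the finished file:

  <?php
  $lockFile = '/tmp/thumb-' . md5( $thumbPath ) . '.lock';
  $lock = fopen( $lockFile, 'c' );        // open or create the lock file
  if ( $lock && flock( $lock, LOCK_EX ) ) {
      if ( !file_exists( $thumbPath ) ) { // re-check once we hold the lock
          renderThumb( $thumbPath );      // hypothetical resize helper
      }
      flock( $lock, LOCK_UN );
  }
  if ( $lock ) {
      fclose( $lock );
  }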



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Platonides
Michael Dale wrote:
 Yeah, again: if we only issue the big resize operation on initial upload,
 with a memory-friendly in-place library like vips, I think we will be
 okay. Since the user just waited like 10-15 minutes to upload their huge
 image, waiting an additional 10-30s at that point for the thumbnail, with
 the instant gratification of seeing your image on the upload page, is
 not such a big deal.

It can be parallelized: start rendering the thumb while the file hasn't
been completely uploaded yet (most formats will allow doing that).
That'd need special software; the easiest way would be to point the
Special:Upload form action at a different domain for the resizing
cluster. These changes are always an annoyance, but it would ease many
bugs: 10976, 16751, 18202, the upload bar, a non-NFS backend...




Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Platonides
Also relevant: 17255 and 18201.
And as this would be a new upload system, also worth mentioning 18563
(new-upload branch).




Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-24 Thread Brad Jorsch
On Fri, Apr 24, 2009 at 07:08:05PM +0100, David Gerard wrote:
 
 There was a spec in earlier versions of HTML to put a low-res
 thumbnail up while the full image dribbled through your dialup -
 <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so
 little used (I know of no cases) that I don't know if it's even
 supported in browsers any more.
 
 http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html

I tried it with Firefox 3.0.9 and IE 7.0.6001.18000; neither paid any
attention to it. IE 6.0.2800.1106 under Wine also ignored it. Too bad;
that would have been nice if it worked.

I don't know that we need fancy AJAX if we know at page rendering time
whether the image is available, though. We might be able to get away
with a simple script like this:
  var ImageCache = {};
  // Fetch the real thumbnail in the background and swap it in for the
  // placeholder once it has fully loaded.
  function loadImage(id, url){
      var i = document.getElementById(id);
      if(i){
          var img = new Image();
          ImageCache[id] = img;     // keep a reference so the Image isn't collected
          img.onload = function(){  // fires once the thumbnail has loaded
              i.src = url;          // swap the placeholder for the real thumb
              ImageCache[id] = null;
          };
          img.src = url;            // start fetching the thumbnail
      }
  }
And then generate the <img> tag with the placeholder and some id, and
call that function onload for it. Of course, if there are a lot of these
images on one page then we might run into the browser's concurrent
connection limit, which an AJAX solution might be able to overcome.



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-23 Thread Wu Zhe
Michael Dale md...@wikimedia.org writes:

 I recommended that the image daemon run semi-synchronously, since the
 changes needed to maintain multiple states and return non-cached
 place-holder images, while managing updates and page purges for when the
 updated images are available within the Wikimedia server architecture,
 probably won't be completed in the Summer of Code timeline. But if the
 student is up for it, the concept would be useful for other components
 like video transformation / transcoding, sequence flattening, etc. But
 it's not what I would recommend for the Summer of Code timeline.

I may have trouble understanding the concept "semi-synchronously": does
it mean that when MW parses a page that contains thumbnail images, the
parser sends requests to the daemon, which would reply twice to each
request, once immediately with a best fit or a placeholder
(synchronously), and once later when the thumbnail is ready
(asynchronously)?

 == what the image resize efforts should probably focus on ==

 (1) making the existing system more robust and (2) better taking
 advantage of multi-threaded servers.

 (1) Right now the system chokes on large images; we should deploy
 support for an in-place image resize, maybe something like vips (?)
 (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use)
 The system should intelligently call vips to transform the image to a
 reasonable size at time of upload, then use those derivatives for
 just-in-time thumbs for articles. (If vips is unavailable we don't
 transform and we don't crash the Apache node.)

Wow, vips sounds great; still reading its documentation. How is its
performance on relatively small images (not huge, a few hundred pixels
in width/height) compared with traditional single-threaded resizing
programs?

 (2) Maybe spinning out the image transform process early on in the
 parsing of the page, with a place-holder and callback, so by the time
 all the templates and links have been looked up the image is ready for
 output. (Maybe another function wfShellBackgroundExec($cmd,
 $callback_function), maybe using pcntl_fork, then normal wfShellExec,
 then pcntl_waitpid, then the callback function ... which sets some var
 in the parent process so that pageOutput knows it's good to go.)

Asynchronous daemon doesn't make much sense if the page purge occurs on
the server side, but what if we put off the page purge to the browser? It
works like this:

1. The MW parser sends a request to the daemon.
2. The daemon finds the work non-trivial and replies *immediately* with a
   best fit or just a placeholder.
3. The browser renders the page, finds it's not final, so it sends a
   request to the daemon directly using AJAX.
4. The daemon replies to the browser when the thumbnail is ready.
5. The browser replaces the temporary best fit / placeholder with the new
   thumb using JavaScript.

The daemon now has to deal with two kinds of clients: MW servers and
browsers.

Letting the browser wait instead of the MW server has the benefit of
reduced latency for users, while still having an acceptable page to read
before the image replacement takes place and a perfect page after that.
For most users, it's likely that the replacement occurs as soon as page
loading ends, since transferring the page takes some time, and the daemon
would have already finished thumbnailing in the process.
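
As a sketch, the daemon side of steps 1-2 might look like this (the queue
and path helpers are assumptions, not an existing implementation):

  <?php
  function handleThumbRequest( $image, $width ) {
      $final = thumbPath( $image, $width );        // assumed path helper
      if ( file_exists( $final ) ) {
          return $final;                           // trivial case: already rendered
      }
      enqueueResize( $image, $width );             // assumed background job queue
      // Reply *immediately*; the browser's AJAX request (step 3) picks up
      // the real thumb once the daemon finishes.
      $best = bestExistingThumb( $image, $width ); // assumed best-fit lookup
      return $best !== null ? $best : 'placeholder.png';
  }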



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Michael Dale
Aryeh Gregor wrote:
 I'm not clear on why we don't just make the daemon synchronously
 return a result the way ImageMagick effectively does.  Given the level
 of reuse of thumbnails, it seems unlikely that the latency is a
 significant concern -- virtually no requests will ever actually wait
 on it.
   
(I basically outlined these issues on the SoC page, but here they are 
again with a bit more clarity.)

I recommended that the image daemon run semi-synchronously, since the 
changes needed to maintain multiple states and return non-cached 
place-holder images, while managing updates and page purges for when the 
updated images are available within the Wikimedia server architecture, 
probably won't be completed in the Summer of Code timeline. But if the 
student is up for it, the concept would be useful for other components 
like video transformation / transcoding, sequence flattening, etc. But 
it's not what I would recommend for the Summer of Code timeline.

== per issues outlined in bug 4854 ==
I don't think it's a good idea to invest a lot of energy into a separate 
Python-based image daemon. It won't avoid all the problems listed in bug 4854.

Shell-character-exploit issues should be checked against anyway (since 
not everyone is going to install the daemon).

Other people using MediaWiki won't add a Python- or Java-based image 
resizer and resolve its Python or Java component library dependencies. 
It won't be easier to install than ImageMagick or php-gd, which are 
repository-hosted applications already present in shared hosting 
environments.

Once you start integrating other libs like (Java) Batik, it becomes 
difficult to resolve dependencies (Java, Python, etc.), and to install it 
you have to push out a new program that is not integrated into the 
application repository managers of the various distributions.

The potential to isolate CPU and memory usage should be considered in the 
core MediaWiki image resize support anyway, i.e. we don't want to crash 
other people's servers running MediaWiki by not checking upper bounds of 
image transforms. Instead we should make the core image transform 
smarter: maybe have a configuration var that /attempts/ to bound the 
upper memory for spawned processes, and take that into account before 
issuing the shell command for a given large image transformation with a 
given shell application.

== what the image resize efforts should probably focus on ==

(1) making the existing system more robust and (2) better taking 
advantage of multi-threaded servers.

(1) Right now the system chokes on large images; we should deploy support 
for an in-place image resize, maybe something like vips (?) 
(http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use) 
The system should intelligently call vips to transform the image to a 
reasonable size at time of upload, then use those derivatives for 
just-in-time thumbs for articles. (If vips is unavailable we don't 
transform and we don't crash the Apache node.)
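
As a sketch, the upload-time pre-scale could be a single shell-out. 
vipsthumbnail is the vips CLI tool, but the exact flags vary between vips 
versions, and the paths and the 1024px choice here are assumptions:

  <?php
  $src = escapeshellarg( $uploadedPath );
  $dst = escapeshellarg( $derivativePath );
  // Build a ~1024px working copy once, at upload time...
  wfShellExec( "vipsthumbnail $src --size 1024 -o $dst" );
  // ...so per-article thumbs can later be resized cheaply from $dst
  // instead of from the huge original.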

(2) Maybe spinning out the image transform process early on in the 
parsing of the page, with a place-holder and callback, so by the time all 
the templates and links have been looked up the image is ready for 
output. (Maybe another function wfShellBackgroundExec($cmd, 
$callback_function), maybe using pcntl_fork, then normal wfShellExec, 
then pcntl_waitpid, then the callback function ... which sets some var in 
the parent process so that pageOutput knows it's good to go.)
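
A rough sketch of that idea (wfShellBackgroundExec() does not exist in 
MediaWiki, and pcntl_* is unavailable in most web-server SAPIs, so treat 
this as a thought experiment, not an implementation):

  <?php
  function wfShellBackgroundExec( $cmd ) {
      $pid = pcntl_fork();
      if ( $pid == -1 ) {
          wfShellExec( $cmd );            // fork failed: fall back to synchronous
          return false;
      }
      if ( $pid == 0 ) {
          wfShellExec( $cmd );            // child: run the resize...
          exit( 0 );                      // ...and exit
      }
      return $pid;                        // parent: keep parsing the page
  }

  // Just before page output, wait for the child and fire the callback:
  function wfShellBackgroundWait( $pid, $callback ) {
      pcntl_waitpid( $pid, $status );     // blocks until the resize is done
      call_user_func( $callback );        // sets the "good to go" flag
  }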

If operationally the daemon should be on a separate server, we should 
still more or less run synchronously ... as mentioned above ... and if 
possible the daemon should be PHP-based, so we don't explode the 
dependencies for deploying robust image handling with MediaWiki.

peace,
--michael



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Brion Vibber
Thanks for taking care of the announce mail, Roan! I spent all day 
yesterday at the dentist's... whee :P

I've taken the liberty of reposting it on the tech blog: 
http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects-accepted/

I'd love for us to get the students set up on the blog to keep track of 
their project progress and raise visibility... :D

-- brion



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-22 Thread Magnus Manske
On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster
ma...@harddisk.is-a-geek.org wrote:
 On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.katt...@gmail.com wrote:

 * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
 thumbnailing daemon, so image manipulation won't have to happen on the
 Apache servers any more


 Wow, I'm lookin' forward to this. Mighta be worth a try to give the
 uploader the ability to choose non-standard resizing filters or so... or
 full-fledged image manipulation, something like a wiki-style photoshop.

On a semi-related note: What's the status of the management routines
that handle throwaway things like math PNGs?
Is this a generic system, so it can be used e.g. for Jmol PNGs in the future?
Is it integrated with the image thumbnail handling?
Should it be?

Magnus



[Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Roan Kattouw
Yesterday, the selection of GSoC projects was officially announced.
For MediaWiki, the following projects have been accepted:

* Niklas Laxström (Nikerabbit), mentored by Siebrand, will be working
on improving localization and internationalization in MediaWiki, as
well as improving the Translate extension used on translatewiki.net
* Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the
Apache servers any more
* Jeroen de Dauw, mentored by Yaron Koren, will be improving the
Semantic Layers extension and merging it into the Semantic Google Maps
extension
* Gerardo Antonio Cabero, mentored by Michael Dale (mdale), will be
improving the Cortado applet for video playback (I'm a bit fuzzy on
the details for this one)

The official list with links to (parts of) the proposals can be found
at the Google website [1]; lists for other organizations can be
reached through the list of participating organizations [2].

The next event on the GSoC timeline [3] is the community bonding
period [4], during which the students are supposed to get to know
their mentors and the community. This period lasts until May 23rd,
when the students actually begin coding.

Starting now and continuing at least until the end of GSoC in August,
you will probably see and hear from the students on IRC and the
mailing lists and hear about the projects they're working on. To
repeat the crux of an earlier thread on this list [5]: be nice to
these special newcomers, make them feel welcome and comfortable, and
try not to bite them :)

To the mentors and students: have fun!

Roan Kattouw (Catrope)

[1] http://socghop.appspot.com/org/home/google/gsoc2009/wikimedia
[2] http://socghop.appspot.com/program/accepted_orgs/google/gsoc2009
[3] http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline
[4] 
http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
[5] http://lists.wikimedia.org/pipermail/wikitech-l/2009-March/041964.html



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread David Gerard
2009/4/22 Michael Dale md...@wikimedia.org:
 Marco Schuster wrote:
 On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.katt...@gmail.com wrote:

 * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
 thumbnailing daemon, so image manipulation won't have to happen on the
 Apache servers any more

 Wow, I'm lookin' forward to this. Mighta be worth a try to give the
 uploader the ability to choose non-standard resizing filters or so... or
 full-fledged image manipulation, something like a wiki-style photoshop.

 I was looking at http://editor.pixastic.com/ ... wiki-style photoshop
 would be cool ... but not in the scope of that soc project ;)


You can do pretty much anything with ImageMagick. Trouble is that it's
not the fastest at *anything*. Depends how much that affects
performance in practice - something that *just* thumbnails could be
all sorts of more efficient, but you'd need a new program for each
function, and most Unix users of MediaWiki thumbnail with ImageMagick
already so it'll be there.


- d.



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Chad
On Tue, Apr 21, 2009 at 8:16 PM, David Gerard dger...@gmail.com wrote:
 2009/4/22 Michael Dale md...@wikimedia.org:
 Marco Schuster wrote:
 On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.katt...@gmail.com wrote:

 * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
 thumbnailing daemon, so image manipulation won't have to happen on the
 Apache servers any more

 Wow, I'm lookin' forward to this. Mighta be worth a try to give the
 uploader the ability to choose non-standard resizing filters or so... or
 full-fledged image manipulation, something like a wiki-style photoshop.

 I was looking at http://editor.pixastic.com/ ... wiki-style photoshop
 would be cool ... but not in the scope of that soc project ;)


 You can do pretty much anything with ImageMagick. Trouble is that it's
 not the fastest at *anything*. Depends how much that affects
 performance in practice - something that *just* thumbnails could be
 all sorts of more efficient, but you'd need a new program for each
 function, and most Unix users of MediaWiki thumbnail with ImageMagick
 already so it'll be there.


 - d.



The main issue with the daemon idea (which was discussed at length in
#mediawiki a few weeks ago) is that it requires a major change in how we
handle images.

Right now, the process involves rendering on-demand, rather than at-leisure.
This has the benefit of always producing an ideal thumb'd image at the end
of every parse. However, the major drawbacks are an increase in parsing
time (while we wait for ImageMagick to do its thing) and an increased load on
the app servers. The only time we can sidestep this is if someone uses a
thumb dimension for which we already have a thumb rendered.

In order for this to work, we'd need to shift to a style of "render when
you get a chance, but give me the best fit for now". Basically, we'd begin
parsing and find that we need a thumbnailed copy of some image, but we
don't have the ideal size just yet. Instead, we could return the
best-fitting thumbnail so far and use that until the daemon has given us
the right image.
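
As a sketch, the best-fit lookup could be as simple as this (a 
hypothetical helper, not existing code): return the already-rendered 
width closest to the requested one, or null if nothing is rendered yet:

  <?php
  function bestFitThumb( $requestedWidth, array $renderedWidths ) {
      $best = null;
      foreach ( $renderedWidths as $w ) {
          if ( $best === null ||
               abs( $w - $requestedWidth ) < abs( $best - $requestedWidth ) ) {
              $best = $w;
          }
      }
      return $best;  // null means: show a placeholder until the daemon delivers
  }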

Not an easy task, but I certainly hope some progress can be made on
it over the summer :)

-Chad



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Aryeh Gregor
On Tue, Apr 21, 2009 at 7:54 PM, Marco Schuster
ma...@harddisk.is-a-geek.org wrote:
 Wow, I'm lookin' forward to this. Mighta be worth a try to give the
 uploader the ability to choose non-standard resizing filters or so... or
 full-fledged image manipulation, something like a wiki-style photoshop.

That seems to be orthogonal to the proposed project.



Re: [Wikitech-l] Google Summer of Code: accepted projects

2009-04-21 Thread Aryeh Gregor
On Tue, Apr 21, 2009 at 8:34 PM, Chad innocentkil...@gmail.com wrote:
 The main issue with the daemon idea (which was discussed at length in
 #mediawiki a few weeks ago) is that it requires a major change in how we
 handle images.

 Right now, the process involves rendering on-demand, rather than
 at-leisure. This has the benefit of always producing an ideal thumb'd
 image at the end of every parse. However, the major drawbacks are an
 increase in parsing time (while we wait for ImageMagick to do its thing)
 and an increased load on the app servers. The only time we can sidestep
 this is if someone uses a thumb dimension for which we already have a
 thumb rendered.

 In order for this to work, we'd need to shift to a style of "render when
 you get a chance, but give me the best fit for now". Basically, we'd
 begin parsing and find that we need a thumbnailed copy of some image, but
 we don't have the ideal size just yet. Instead, we could return the
 best-fitting thumbnail so far and use that until the daemon has given us
 the right image.

I'm not clear on why we don't just make the daemon synchronously
return a result the way ImageMagick effectively does.  Given the level
of reuse of thumbnails, it seems unlikely that the latency is a
significant concern -- virtually no requests will ever actually wait
on it.
