On Tue, 20 Dec 2005 17:39:43 -0600 Nathan Ingersoll <[EMAIL PROTECTED]>
babbled:

> Hello all, sorry for the long message, but I want to get everyone updated
> with the necessary background before I send my proposed change.
> 
> We've long had a trend of duplicate code in CVS, and right now this is
> definitely the case. There is one area where I think there is enough
> consensus that we could unify our approaches and cut down some of the
> duplication, and that's thumbnail generation. It seems every thumbnailing
> implementation uses Epsilon except EFM's (because of the "no further
> dependencies" rule). Where the implementations differ is the method used
> to generate the thumbnails in the background. There are presently three
> approaches: 1. block and generate, 2. fork for each thumbnail, 3. worker
> threads.
> 
> The first approach simply calls Epsilon directly from the code and waits
> for the thumbnail to complete. This blocks the UI and can cause
> usability issues if a large number of images are processed sequentially,
> and/or the files are very large. On the plus side, it's very simple and easy
> to code.
> Example: emblem
> 
> The next method forks a child process when an image is identified, and that
> child process does the work of calling Epsilon. When the child exits, the
> parent can identify the child and determine which thumbnail was updated.
> Forking for each thumbnail greatly improves the situation, as the UI can
> continue to process events and simply update its image when the thumbnail
> is ready. It also has the advantage that thumbnailing happens in parallel:
> while one image is blocking on a read, the previous one can be using the
> CPU, and this scales out to the number of CPUs on the system. The potential
> problem with this approach is essentially a fork/swap bomb. When there are
> a large number of images in a directory, you get a process for each image,
> followed by an immediate attempt to read all of the image data into memory.
> Process creation is pretty light on Linux, but not on all POSIX-compliant
> systems, and systems are often configured to limit the number of
> simultaneous processes per user. The default on OS X 10.3 is 100 processes
> for a single user; that's quite easy to hit if we are forking for a few
> hundred thumbnails, and the situation gets worse on multi-user systems
> (think LTSP or SunRays). We need to handle the case when fork starts to
> fail.
> Examples: exhibit, EFM
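A minimal sketch of the fork-per-thumbnail approach, written in Python for brevity rather than the C these apps use, including the fork-failure handling called for above. The names `fork_thumbnails` and `make_thumb` are hypothetical; `make_thumb` stands in for whatever actually generates the thumbnail. When fork() fails with `EAGAIN` (what it reports when the per-user process limit is reached), the parent reaps one of its children to free a slot and retries.

```python
import errno
import os


def fork_thumbnails(paths, make_thumb):
    """Fork one child per image and return {path: succeeded}.

    make_thumb(path) is a hypothetical callable that does the real work.
    If fork() fails with EAGAIN, reap one of our children and retry
    instead of failing outright.
    """
    pending = {}   # pid -> path for children still running
    results = {}   # path -> True if the child exited with status 0

    def reap_one():
        pid, status = os.wait()
        path = pending.pop(pid, None)
        if path is not None:
            results[path] = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

    for path in paths:
        while True:
            try:
                pid = os.fork()
                break
            except OSError as e:
                # Only EAGAIN is recoverable, and only if we have a child to wait on.
                if e.errno != errno.EAGAIN or not pending:
                    raise
                reap_one()
        if pid == 0:
            # Child: generate one thumbnail, then exit immediately with a
            # status code, skipping the parent's cleanup handlers.
            code = 1
            try:
                make_thumb(path)
                code = 0
            finally:
                os._exit(code)
        pending[pid] = path

    # A real app would collect these as child-exit events in its main loop;
    # here we simply wait for the remaining children.
    while pending:
        reap_one()
    return results
```

Note that this still starts one process per image up front; the retry only keeps it from failing outright once the limit is hit.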
> 
> The worker thread approach creates a fixed pool of worker threads, so a
> specific number of thumbnails are generated simultaneously. It shares the
> main advantage of the fork method: it doesn't block the UI while
> thumbnailing. While it allows some work in parallel, it does not overrun
> the system by generating all thumbnails at once, because a limited number
> of worker threads are dispatched from a queue of images to thumbnail. The
> downsides are that it requires more complicated communication mechanisms
> (and locking) to notify the process when thumbnails are complete, and it
> brings in a pthreads dependency.
> Example: entropy
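A minimal sketch of the worker-thread approach, again in Python, with hypothetical names (`thread_thumbnails`, `make_thumb`, `on_done`). A fixed number of threads pull image paths from a shared queue, so no more than `n_workers` thumbnails are in progress at once. The lock around the shared result dict is the kind of extra synchronization mentioned above.

```python
import queue
import threading


def thread_thumbnails(paths, make_thumb, n_workers=2, on_done=None):
    """Thumbnail paths with a fixed pool of worker threads.

    Returns {path: succeeded}. on_done(path, ok), if given, is called from
    the worker thread, so it must be thread-safe itself.
    """
    jobs = queue.Queue()
    for p in paths:
        jobs.put(p)
    results = {}
    results_lock = threading.Lock()   # workers share the results dict

    def worker():
        while True:
            try:
                p = jobs.get_nowait()
            except queue.Empty:
                return                 # queue drained: this worker is done
            try:
                make_thumb(p)
                ok = True
            except Exception:
                ok = False
            with results_lock:
                results[p] = ok
            if on_done is not None:
                on_done(p, ok)

    workers = [threading.Thread(target=worker) for _ in range(max(1, n_workers))]
    for t in workers:
        t.start()
    # Joining keeps the sketch simple; a GUI app would not wait here but
    # would let on_done post "thumbnail ready" events to its main loop.
    for t in workers:
        t.join()
    return results
```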
> 
> There is also a race-condition performance penalty that applies to all
> three approaches. If you have two processes generating thumbnails for the
> same large directory, it's entirely possible to reach a state where both
> processes are generating thumbnails for the same images simultaneously.
> E.g., I open entropy in my picture directory with 1000 images (just
> throwing out a convenient number), and I decide to open that directory in
> exhibit to import some pics. Suppose entropy has processed the first 100
> images, and exhibit benefits from all of the previously cached thumbnails,
> so it quickly catches up to entropy, say around image 120. Exhibit sees
> that no thumbnail exists for image 120 and begins thumbnailing it;
> meanwhile entropy is just finishing its thumbnail, writes it to disk and
> moves on to image 121. Exhibit completes its thumbnail of image 120,
> overwrites the thumbnail entropy wrote, and moves on to image 121. In the
> worst case, this write-overwrite effect could continue for the remaining
> 878 images. This is a bit of a simplification, because neither forking nor
> worker threads thumbnail linearly like this, but scheduling could result in
> this type of situation. We've now done twice as much work as necessary and
> caused a large spike in IO.
> 
> So that's the current state of things. I have a couple of ideas in mind
> that I'll outline in a follow-up message, as this one is overly long already.

OK, good summary. EFM is going to have to sit on the sidelines for E17;
basically, as you said, no more deps. But I do see a good case for a generic
thumbnailer API that solves the above issues, as long as it doesn't limit
things or shoehorn everything through too small a gateway.

Just one thing: EFM shouldn't be forking one process per image all at once.
It only keeps one forked child at a time, working through the images that
don't have thumbs yet. The parent just gets the child's exit event, then forks
off another. So it's one fork per image, only for images that need a thumb,
and only once per generation. I think you can safely assume that even on the
worst of POSIX systems one fork is nothing compared to the workload of
loading, scaling, then writing an image file :)

Anyway, I personally still favor the fork model, as it requires no pthreads,
likely has no more overhead than threads, has no concurrency or cache issues,
and is simple. What it does need is the ability to tune how many forked image
generators to allow at a time. EFM allows only 1, so dual-CPU systems will be
happy but more CPUs won't benefit (OK, maybe 3, as X is probably involved,
and you might say 4 if you let the kernel run its IO work on a 4th CPU).

That said, you DO have a very valid point about 2 apps thumbnailing the same
dir at once. We should definitely put in a locking mechanism for that, so
either they share the workload, or the first guy in gets "lock ownership" and
drives the thumbnailing until he's done while the other process sits and
waits. The waiter could poll the lock file, if we use that mechanism, and the
owner could update the timestamp on the lock file whenever it generates
something. Or the lock file could contain info such as the queue of images
still to be generated. Or, in the simplest case, all processes not owning the
thumbnailing for that dir hold off until the owner releases the lock
(hopefully not too long from then) and then do a full update.
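A minimal sketch of the "first guy in owns the directory" variant with a polled lock file, in Python. The lock file name `.thumb.lock` and all function names are hypothetical. `O_CREAT | O_EXCL` makes creation atomic, so only one process can win; the owner refreshes the file's timestamp after each thumbnail, and a waiter treats a lock untouched for `max_age` seconds as abandoned. Where the lock should live is left open (the image directory may be read-only, so probably somewhere in the thumbnail cache).

```python
import os
import time

LOCK_NAME = ".thumb.lock"  # hypothetical lock file name


def _lock_file(lock_dir):
    return os.path.join(lock_dir, LOCK_NAME)


def try_lock(lock_dir):
    """Try to become the thumbnailing owner. Returns True on success.

    O_CREAT | O_EXCL makes the create-if-absent check atomic, so only one
    process can win even if several try at the same moment.
    """
    try:
        fd = os.open(_lock_file(lock_dir), os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))   # record the owner, useful for debugging
    return True


def heartbeat(lock_dir):
    """Owner: refresh the lock's mtime after generating each thumbnail."""
    os.utime(_lock_file(lock_dir))


def unlock(lock_dir):
    """Owner: release the lock when the directory is done."""
    os.remove(_lock_file(lock_dir))


def lock_is_stale(lock_dir, max_age=30.0):
    """True if there is no lock, or it hasn't been touched for max_age seconds."""
    try:
        mtime = os.path.getmtime(_lock_file(lock_dir))
    except FileNotFoundError:
        return True
    return time.time() - mtime > max_age


def wait_for_release(lock_dir, poll=0.5, max_age=30.0):
    """Non-owner, simplest policy: wait until the owner is done (or looks
    dead), after which the caller does a full update of the directory."""
    while not lock_is_stale(lock_dir, max_age):
        time.sleep(poll)
```

If the lock was abandoned rather than released, the file still exists, so a waiter would have to remove it before `try_lock()` can succeed.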

Anyway, I do agree that there is a need to unify. I also think there are
separate levels here: 1. just generate thumbs and let the calling process know
(either via a blocking API or a fork/event), 2. be able to ask for the thumb
path for any given file path, and 3. load a thumb into a canvas object
(another level entirely). I do think you want to support both blocking and
async. Really, async can just be a wrapper on top of the blocking API.
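A minimal sketch of async as a wrapper on top of the blocking API, in Python with hypothetical names (`generate_thumb`, `generate_thumb_async`, `make_thumb`). A thread is used here for brevity; a forked child whose exit is picked up by the main loop would fit the same shape. The callback runs on the worker thread, so a real toolkit would marshal it back to its main loop.

```python
import threading


def generate_thumb(src, dst, make_thumb):
    """Blocking API: returns only once the thumbnail has been written.
    make_thumb(src, dst) is a hypothetical backend call."""
    make_thumb(src, dst)
    return dst


def generate_thumb_async(src, dst, make_thumb, on_done):
    """Async wrapper: runs the blocking call off the caller's thread and
    reports on_done(src, dst_or_None, error_or_None) when it finishes."""
    def run():
        try:
            result = generate_thumb(src, dst, make_thumb)
        except Exception as e:
            on_done(src, None, e)
        else:
            on_done(src, result, None)

    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```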

imho it could do with:

1. add a file path to the thumb gen queue
2. delete a file path from the queue
3. begin queue processing
4. pause/unpause queue processing
5. end queue processing
6. ask for thumb path from file path
7. brute-force generate a thumb via the blocking API
8. get "new thumb available" events
9. set parallelism count (how many threads or forked children to allow at a
time)
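One possible shape for the API listed above, sketched in Python. Every name is hypothetical, and the numbers in the comments refer to the list items. The thumbnail path follows the freedesktop.org thumbnail specification's naming (MD5 of the file's `file://` URI plus `.png`); the default cache directory `~/.thumbnails/normal` is an assumption (newer systems use `$XDG_CACHE_HOME/thumbnails`). Worker threads are used for brevity, but forked children would fit the same interface.

```python
import hashlib
import os
import threading
from collections import deque
from urllib.parse import quote


def thumb_path(file_path, cache_dir="~/.thumbnails/normal"):
    """(6) Map a file path to its thumbnail path, freedesktop.org style."""
    uri = "file://" + quote(os.path.abspath(file_path))
    name = hashlib.md5(uri.encode("utf-8")).hexdigest() + ".png"
    return os.path.join(os.path.expanduser(cache_dir), name)


class ThumbQueue:
    """Hypothetical thumbnailer API; make_thumb(src, dst) is the backend."""

    def __init__(self, make_thumb, parallelism=1, cache_dir="~/.thumbnails/normal"):
        self._make_thumb = make_thumb
        self._cache_dir = cache_dir
        self._queue = deque()
        self._listeners = []
        self._cond = threading.Condition()
        self._paused = False
        self._running = False
        self._workers = []
        self.parallelism = max(1, parallelism)

    def add(self, path):                     # (1)
        with self._cond:
            self._queue.append(path)
            self._cond.notify()

    def remove(self, path):                  # (2) no-op if not queued
        with self._cond:
            if path in self._queue:
                self._queue.remove(path)

    def begin(self):                         # (3)
        with self._cond:
            self._running = True
        self._workers = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(self.parallelism)]
        for w in self._workers:
            w.start()

    def pause(self, paused=True):            # (4) pause(False) resumes
        with self._cond:
            self._paused = paused
            self._cond.notify_all()

    def end(self):                           # (5) drain what is queued, then stop
        with self._cond:
            self._running = False
            self._cond.notify_all()
        for w in self._workers:
            w.join()
        self._workers = []

    def path_for(self, path):                # (6)
        return thumb_path(path, self._cache_dir)

    def generate_blocking(self, path):       # (7)
        dst = self.path_for(path)
        self._make_thumb(path, dst)
        return dst

    def on_new_thumb(self, callback):        # (8) callback(src, dst), on a worker thread
        self._listeners.append(callback)

    def set_parallelism(self, n):            # (9) applies at the next begin()
        self.parallelism = max(1, n)

    def _work(self):
        while True:
            with self._cond:
                # Sleep while running but paused or idle.
                while self._running and (self._paused or not self._queue):
                    self._cond.wait()
                if not self._queue:          # ended and nothing left
                    return
                path = self._queue.popleft()
            dst = self.generate_blocking(path)
            for cb in self._listeners:
                cb(path, dst)
```

Per-directory locking, as suggested in the next paragraph, could live inside `begin()` and `end()` without changing this interface.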

Locking can be implemented under the bonnet of such an API. One more thing I
think might be good here is for the lib to dynamically adapt to whatever libs
it can find at RUNTIME, not compile time. So if it finds imlib2, it will use
it; if it finds evas, it will use that; if it finds epeg, it will use that. It
can dlopen the libs just like the runtime linker does, and thus adapt to
whatever is on the system at runtime without compile-time dependencies
(installing more libs just gets you faster thumbnailing, more format support,
etc.). This also allows us to add other things under the hood in future
(thumbnailing PDFs, HTML files, text files, SVG, etc.).
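A minimal sketch of runtime backend discovery in Python: `ctypes.util.find_library` looks up a shared library by name and `ctypes.CDLL` loads it with `dlopen()`, so nothing is linked at build time. The library name spellings (`Imlib2`, `evas`, `epeg`) are assumptions about how those libraries are installed, and `probe_backends` is a hypothetical name; a C implementation would call `dlopen()`/`dlsym()` directly.

```python
import ctypes
import ctypes.util

# (backend name, library name as passed to find_library); the spellings
# are assumptions about how each library is installed.
CANDIDATES = [
    ("imlib2", "Imlib2"),
    ("evas", "evas"),
    ("epeg", "epeg"),      # JPEG only, but fast
]


def probe_backends(candidates=CANDIDATES):
    """Return the backends whose shared library can be dlopen()ed right now,
    in preference order, with no compile-time dependency on any of them."""
    found = []
    for name, libname in candidates:
        path = ctypes.util.find_library(libname)
        if path is None:
            continue                  # not installed: just skip this backend
        try:
            ctypes.CDLL(path)         # dlopen() under the hood
        except OSError:
            continue                  # present but unloadable
        found.append(name)
    return found
```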


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]
裸好多
Tokyo, Japan (東京 日本)


_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
