Richard Clarke wrote:
> List,
>     I find the following situation one which arises many times for me when
> creating modperl applications for people. However, I always find myself
> thinking there is a better way to do it. I wondered if the list would like
> to share their thoughts on the best way. It concerns storing and serving
> images/media uploaded by users of the webpage.
> 
> An example could be a website letting you set up your own shops to sell
> products. The shop maker may allow you to upload preview images of products.
> Assuming the product data is stored in a database, I personally wouldn't
> store the binary image in the database (assuming mysql here). A solution
> springing to mind is to store a hash/id in the database and have a common
> directory (/htdocs/_previews/) which holds the pictures named after that
> hash/id. That way, either the modperl application can auto create the link
> using src=/htdocs/_previews/imageid.jpg or a lightweight handler can be
> used. For example /getimage?id=asdf09sd8fsa could then rewrite the uri to
> the real location or perform a content subrequest and let apache serve the
> image that way. Of course there are many solutions, but I'm wondering. Is
> there a best one?
> 
> Any thoughts appreciated. I realise that the same situation might occur
> using vanilla cgi, but mod_perl provides unique ways of solving the problem,
> hence I post to this list.

I doubt this has anything to do specifically with mod_perl: since you
are talking about storage/retrieval techniques, it'll work the same with
any other technology out there. It's an interesting topic, though.

A *good* filesystem can serve well as a database, though you should be
aware of how many files you store in each directory: the more files you
put into one directory, the slower the access time. Modern filesystems
(definitely don't use a FAT-based fs) implement internal hashing of file
names, but you have to check the filesystem that you use for its limits.
Once you pass that limit the lookup becomes linear and retrieval slows
down significantly. In that case you should do your own hashing, so that
you map a filename 'abcdef.gif' into a/b/c/abcdef.gif (3 levels).
How many levels of hashing to use depends on how many files you plan to
store and how many files you can put into each directory before the
filesystem falls back to the linear lookup. Too many levels is not good
either, since each extra sub-dir slows things down. Once you have the
numbers it's easy to calculate how many levels to use; see the sketch
below.
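
For example, here's a back-of-the-envelope calculation (a minimal
sketch; the 500,000 files and the 1,000-files-per-directory limit are
made-up numbers for illustration, plug in your own):

  use POSIX qw(ceil);

  # assumptions to adjust for your filesystem and data set:
  my $total   = 500_000;   # files you plan to store
  my $per_dir = 1_000;     # files per dir before lookup goes linear
  my $fanout  = 36;        # sub-dirs per level ([a-z0-9] single chars)

  # each level multiplies the number of leaf directories by $fanout
  my $levels = ceil( log($total / $per_dir) / log($fanout) );
  print "$levels level(s) of hashing\n";   # prints 2 for these numbers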

Make your code transparent to the hashing function, so that in the
future you can easily scale and move to extra levels of hashing. Of
course, if you can benchmark the RDBMS's BLOB retrieval speed against
the filesystem fetch, that will help you make the decision.
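
For example, routing every lookup through a single path-building
function keeps the layout knowledge in one place (a minimal sketch;
preview_path is just an illustrative name, and it assumes file names of
at least 3 characters):

  # all callers use this one function, so moving from 3 to 4 levels
  # later means changing only this code (and re-hashing the files)
  sub preview_path {
      my ($name) = @_;
      # the first 3 characters become the directory levels
      my @levels = (split //, $name, 4)[0..2];
      return join '/', @levels, $name;   # e.g. s/u/p/super_pc.gif
  }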

It also depends on the caching patterns: if certain images are fetched
frequently, the kernel/filesystem will do the caching for you. Of course
you can do extra caching yourself (squid/mod_proxy/etc.), but if you can
get it for free at the OS level, that could be even better.

Check also Perrin's article, but if I remember correctly it doesn't talk 
about this issue.
http://perl.apache.org/release/docs/tutorials/apps/scale_etoys/etoys.html

p.s. to hash (3 levels) you can use something like:

% perl -le '$a = "super_pc.gif"; print join "/", (split //, $a, 4)[0..2], $a'
s/u/p/super_pc.gif

Of course, you can use a more effective hashing scheme.
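
For instance, hashing over a digest of the name spreads the files evenly
even when the original names share common prefixes (a minimal sketch
using the Digest::MD5 module):

  use Digest::MD5 qw(md5_hex);

  # similar names ('img0001.gif', 'img0002.gif', ...) would pile up
  # in the same sub-dirs; a digest spreads them uniformly
  my $name   = 'super_pc.gif';
  my $digest = md5_hex($name);   # 32 hex characters
  my $path   = join '/', (split //, $digest, 4)[0..2], $name;
  print "$path\n";   # e.g. 7/c/0/super_pc.gif (digest-dependent)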

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com
