On Mon, 2002-02-04 at 15:51, Melvyn Sopacua wrote:
> > The second argument in the cache-load and cache-save parameters is the
> > Big Ugly String to be used for the cache key. You can just MD5 it or
> > whatever to build a more or less unique key. The work involved in this
> > is one regular expression to convert words in parens into session
> > values, an extra bit to build a cachekey, and a stat() call.
> 
> So you would stick the cache in one dir, for all stages. With over 10,000
> articles, of which 10% is considered current at any point in time (don't
> you just hate 'relevant articles'), this would generate a cluttered dir
> in no time.

Oh, no, you wouldn't use one directory. You'd use a tree of dirs. Maybe
you take the first two chars and use that for the first-level dir, then
the next four chars and use that for the second-level dir. Squid (which
uses one file per cached entry, plus a global directory file) does this,
and so does AxKit now. You might have directories like this for a
cachekey of AABBCCDD:

  cache/AA/BBCC/DD

As long as you keep each directory below a few thousand entries, you're okay.
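
Just to make that concrete, here's a rough Perl sketch of the layout. The
cache root and the exact split points are made up, and I'm assuming the
key is an MD5 hex digest of the Big Ugly String:

  use Digest::MD5 qw(md5_hex);
  use File::Path  qw(mkpath);

  sub cache_path {
      my ($big_ugly_string) = @_;
      my $key = md5_hex($big_ugly_string);   # 32 hex chars
      my $l1  = substr($key, 0, 2);          # first-level dir
      my $l2  = substr($key, 2, 4);          # second-level dir
      my $dir = "cache/$l1/$l2";
      mkpath($dir) unless -d $dir;           # create the tree on demand
      return "$dir/" . substr($key, 6);      # rest of the key is the file
  }

The split points are arbitrary; the point is just that no single directory
ever grows large.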

> Yes, but we can't. We use the timestamp, so you could cache 1 second,
> but ideally we should use a true random number.

I see. Hm. Yeah, this is the kind of problem I'm trying to solve. You
really need caching of subrequest responses (or a post-cached result
transmogrifier, which I gather AxKit has).

Sticking Squid in front has other benefits, by the way, even if you
don't cache the pages. You can cache images, obviously, which is
something. But also very valuable is the fact that Squid will take, say,
a thousand simultaneous clients of varying speeds and funnel them into
maybe 10 keep-alive connections to the backend. It's very good at
keeping the number of mod_perl processes to a minimum, and it also means
you're no longer vulnerable to people connecting to your system on 9600bps
modems and tying up a mod_perl process for the whole duration of a large
download.
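
The accelerator setup is only a few lines of squid.conf, by the way. This
is from memory (Squid 2.x directive names; double-check against the docs
for your version), and the backend address is made up:

  # squid.conf fragment: Squid answers on port 80 and forwards
  # to a mod_perl backend assumed to be on 127.0.0.1:8080
  http_port 80
  httpd_accel_host 127.0.0.1
  httpd_accel_port 8080
  httpd_accel_single_host on
  httpd_accel_with_proxy off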

> > which would turn into an XML
> > container tag with a bunch of child tags only one level deep.
> 
> Relating to userdata.

That's an SQL join, which is something XML systems don't have naturally,
btw.
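
For instance, on the relational side it's one query (hypothetical table
names, DBI-style):

  use DBI;

  # Hypothetical schema: articles(id, title, author_id) and
  # userdata(id, name, email). DSN and credentials are placeholders.
  my $dbh = DBI->connect('dbi:mysql:site', 'user', 'secret',
                         { RaiseError => 1 });

  # Pull an article together with its author's record in one shot:
  my $row = $dbh->selectrow_hashref(q{
      SELECT a.title, u.name, u.email
        FROM articles a
        JOIN userdata u ON u.id = a.author_id
       WHERE a.id = ?
  }, undef, 42);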

> Ideally, a database would be one large, compressed XML document, and the
> database engine would be aware of values vs. elements, etc.

Yeah, I've often pondered that, wondering if it would work. Representing
The World as one large XML document is a nice ideal, if anyone ever
eliminates the performance issue. But someone would also have to extend
the toolset beyond what a basic XML document gives you. Being able to link
data into different trees like a symlink (maybe XInclude statements?)
would be vital. So would the ability to join two XML trees that aren't
really related, on an ad-hoc basis (the equivalent of SQL joins). XML
systems seem to be really lacking when you get into expressing
relationships between data. But then, I'm not an SGML or XML guru, so
maybe I just don't know about an existing system.
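
The symlink case might look something like this with XInclude (syntax from
the W3C draft; the filenames and the xpointer expression are made up):

  <article id="1042">
    <title>Some article</title>
    <!-- "symlink" to the author's record instead of copying it -->
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
                href="userdata.xml"
                xpointer="xpointer(/userdata/user[@id='42'])"/>
  </article>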

It's an interesting topic. I don't expect to make use of XML databases
any time in the next few years, but I do keep an eye on the state of the
art.

