Janne Jalkanen wrote:
Congratulations on the new site!
Thanks!
a. documentation on the outer limits of Priha's abilities,
particularly throughput performance on extremely large
numbers of very small XML files (potentially a scale of
100m records with high speed indexing via XML ID).
It would be interesting to see how fast it gets... At the moment the
speed isn't that bad, and in theory reading should be O(1) [though
creation on the FileProvider is O(N) or worse]. But it depends a lot on
the DB you have underneath. There's currently an HSQLDB provider, but
tweaking it to run on MySQL should be pretty easy too.
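To make the O(1) read claim concrete, here is a minimal sketch assuming records are found through a hash index keyed by XML ID. The IdIndex class and its methods are hypothetical, not part of Priha or the JCR API. It only shows that a lookup costs one hash probe on average, however many records are stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory index from an XML ID to the repository path of the
// record carrying that ID. NOT Priha's API; it only illustrates why a
// hash-based lookup by ID is O(1) on average, independent of record count.
final class IdIndex {
    private final Map<String, String> pathById = new HashMap<>();

    /** Registers a record; returns false if the ID is already taken. */
    boolean put(String xmlId, String path) {
        return pathById.putIfAbsent(xmlId, path) == null;
    }

    /** Average O(1): one hash computation plus a bucket probe. */
    Optional<String> lookup(String xmlId) {
        return Optional.ofNullable(pathById.get(xmlId));
    }

    int size() {
        return pathById.size();
    }
}
```

A real provider would also have to persist such an index and keep it consistent with the stored nodes, which is where creation-time costs like the FileProvider's come in.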
The web service I'm designing will use caching for the high-performance
turnaround we need, so I suppose we don't need the kind of throughput
of an enterprise service for day-to-day operations, but it is expected
that batch loads of records (or XSLT-based modifications of existing
records) should run in a reasonable amount of time for very large
numbers of records (1-10 million). Though these kinds of changes are not
expected to be frequent, they will likely (at this point) have to occur
during downtime. Running on MySQL *might* be a problem for
us since it's not currently an endorsed DB provider (despite the site
being a Sun Microsystems Centre of Excellence and MySQL being owned by
Sun). But we're working on that...
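Since the front end will rely on caching, one simple option is a bounded LRU cache built on java.util.LinkedHashMap in access order. The eviction policy is an assumption, because the design above only says "caching", and LruRecordCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record cache for the web service front end. LRU eviction is an
// assumption; the original design only says it will use caching.
final class LruRecordCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruRecordCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the most-recently-used end
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put(); drops the least-recently-used entry once over capacity
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a multi-threaded web service would need to wrap it (e.g. with Collections.synchronizedMap) or use a dedicated concurrent cache.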
b. development of a JSR-170 sub-interface so that we have
a sense of what Priha implements.
Mmm... You could just run the JCR TCK and get it from there.
On the question of developing a sub-interface of JSR-170 as a
"guarantee of service", is that a possibility? While designed
according to JSR-170, Priha is currently non-conformant (so
far as I understand) with the API, so wouldn't it be prudent
to create a sub-interface? Or am I not understanding the
situation? (possible, I've not had much time to get my head
around this yet)
This is in addition to my ongoing interest in Priha as a JSPWiki
backend, where I'm still hoping to see support for pluggable
metadata, since installations often have their own metadata
requirements beyond the rudimentary stuff required by the
wiki itself, such as needing to integrate into existing
enterprise architectures.
The way I've currently written it, a plugin has full access to the
entire JCR metadata; we just provide accessors for some of the most commonly
used metadata (like the content of the page as a String; author of the
page; version of the page).
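The arrangement described above (typed accessors for common metadata, plus full access to everything else) can be sketched as follows. This is a hypothetical illustration, not JSPWiki's actual API: a plain Map stands in for a JCR node, and the property names ("wiki:content", "wiki:author", "wiki:version") are invented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical page wrapper: typed accessors for the most common metadata,
// while plugins can still read every stored property (e.g. site-specific
// metadata). A plain Map stands in for a JCR node; property names are invented.
final class WikiPage {
    private final Map<String, Object> props;

    WikiPage(Map<String, Object> props) {
        this.props = new HashMap<>(props);
    }

    String getContentAsString() { return (String) props.get("wiki:content"); }
    String getAuthor()          { return (String) props.get("wiki:author"); }
    Integer getVersion()        { return (Integer) props.get("wiki:version"); }

    /** Full read-only view for plugins with their own metadata requirements. */
    Map<String, Object> properties() { return Collections.unmodifiableMap(props); }
}
```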
Yes, that's what I'm looking forward to...
I would also take a look at Jackrabbit. It has some known scalability
limits (if you put too many children in a single Node), but at least
it's well tested (which Priha isn't right now).
The current design could take a couple of approaches. One is a structure
like:
    root node
      |
      |_____ session node
               |
               |__ document node
               |__ document node
               |__ document node
               ...
where a large number of document nodes are loaded via a session and are
meant to be treated as children of the session node (so that their
session metadata can be found by going up to the session node rather
than being stored redundantly in each node).
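The lookup in this first layout can be sketched as a walk up the ancestor chain. TreeNode and findInherited are hypothetical names, not the JCR Node interface: session-level metadata is stored once on the session node, and each document finds it by climbing toward the root.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tree node (not javax.jcr.Node) illustrating the first layout:
// session metadata lives only on the session node; documents find it by
// walking up through their ancestors.
final class TreeNode {
    final String name;
    final TreeNode parent;                     // null for the root
    final Map<String, String> props = new HashMap<>();

    TreeNode(String name, TreeNode parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a child that points back to this node (children are not indexed here). */
    TreeNode addChild(String childName) {
        return new TreeNode(childName, this);
    }

    /** Nearest value of a property on this node or any ancestor, or null if none. */
    String findInherited(String key) {
        for (TreeNode n = this; n != null; n = n.parent) {
            String v = n.props.get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

With the real JCR API the same walk would go through Item.getParent(). The lookup cost grows with tree depth rather than with the number of sibling documents; the concern with this layout is the number of children under one session node, not the metadata lookup.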
This would suggest that a large number of document nodes (e.g., 500,000)
might be a problem, i.e., if we didn't store them flat. Another possibility
is to use a flat structure:
    root node
      |
      |__ session node
      |__ document node
      |__ document node
      |__ document node
      ...
and then have links from each document node into its corresponding
session node. While this does avoid redundantly storing session-level
metadata at the node level, it (a) means we must store a session link
for each document, and (b) introduces potential synchronisation issues.
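The trade-off in (a) and (b) can be sketched as follows, assuming each document keeps the ID of its session as a plain string. FlatStore and its methods are hypothetical. Every document carries one extra link, and removing a session without updating the documents that point to it leaves dangling links, which is the synchronisation problem described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the flat layout: documents are siblings of the
// session node and store a link (the session ID) instead of being children.
final class FlatStore {
    private final Map<String, Map<String, String>> sessions = new HashMap<>();
    private final Map<String, String> sessionOfDoc = new HashMap<>(); // doc ID -> session ID

    void addSession(String sessionId, Map<String, String> metadata) {
        sessions.put(sessionId, new HashMap<>(metadata));
    }

    void addDocument(String docId, String sessionId) {
        sessionOfDoc.put(docId, sessionId);           // (a) one extra link per document
    }

    void removeSession(String sessionId) {
        sessions.remove(sessionId);                   // (b) linked documents are not updated
    }

    /** Session metadata reached via the document's link; null if the link dangles. */
    Map<String, String> sessionMetadata(String docId) {
        String sid = sessionOfDoc.get(docId);
        return sid == null ? null : sessions.get(sid);
    }

    /** Documents whose link points to a session that no longer exists. */
    List<String> danglingLinks() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : sessionOfDoc.entrySet()) {
            if (!sessions.containsKey(e.getValue())) out.add(e.getKey());
        }
        return out;
    }
}
```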
One problem with Jackrabbit is that it is currently a bear to
install. Simplicity is not part of their design goals, apparently.
Thanks,
Murray
...........................................................................
Murray Altheim <murray08 at altheim dot com> === = =
http://www.altheim.com/murray/ = = ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk = = = =
Boundless wind and moon - the eye within eyes,
Inexhaustible heaven and earth - the light beyond light,
The willow dark, the flower bright - ten thousand houses,
Knock at any door - there's one who will respond.
-- The Blue Cliff Record