* Stefano Mazzocchi <[EMAIL PROTECTED]> [2004-02-17 04:22]:
> Alan wrote:
>
> >Momento is a native XML persistence engine. Is supports XSLT 2.0 and
> > XQuery 1.0 (via Saxon), it supports XUpdate.
> >
> > It is transactional and ACID.
> >
> > It is designed with Cocoon in mind.
> >
> > I am considering a open source path for future development.
> >
> > http://engrm.com/project/com.agtrz.momento/
> >
> >Thoughts?
>
> Very interesting. Can you tell us more on how it works? have any numbers
> on how fast/scalable it gets? what's the difference between Momento and
> a native xml database like XIndice?
Stefano
Thanks for asking.
3) I can't compare Momento to Xindice at the moment. The last time I
looked at Xindice was last November. Part of the reason I'm
announcing is to get that kind of insight.
(I am willing to say that Momento isn't wedded to a specific
API, such as XML:DB. It works with Saxon to provide
XQuery and XSLT. I'm implementing XUpdate currently. A
read-only W3C DOM is a simple matter, if there is call
for it. A read-write W3C DOM is not so simple, or
desirable, but entirely feasible.
I could be way off base. Let me know.
)
2) No numbers. I've not designed any benchmarks. Momento is at the
point where I need an example application to focus my energy and
get Momento to beta. I was going to use Linotype, actually, and
use Momento to store my blog.
(Linotype is a good application since I would want to back up
Momento after every update for the time being. That extra
step is acceptable for a blog, provided it is a single
user blog.)
Towards scalability, Momento supports concurrent reads, and, with
some educated decisions by the application developer, concurrent
updates. Momento works as a data store for a multi-threaded
server. It would work nicely in a servlet, or in a Cocoon pipeline.
For further scalability, I plan on supporting multi-process
operation, and indices. (That's right, no indices yet. They
are not at the core of Momento.)
1) Momento has three concepts that rise to the surface when I
consider it.
* Zero, I always forget that I spent a month writing a
journaling file data structure. It just hums along quietly
in the background now. It splits a random access file into
pages and reads those pages in and out of memory. It uses
weak references to implement a page cache. It's pretty
cool, but it's pretty much done, so...
* First, Momento maintains a version axis.
Rather than updating a node, Momento links a new version
of that node to its version axis. An XSLT transform
navigates a Momento document with a version number in
hand. When you get the first child, say, you check to see
if there is a new version of that child, and iterate, but
never past the maximum version.
The older versions are kept around until any queries
referencing them terminate. At that point, the older
versions of the nodes can be collected.
Thus a newer version of the document can be assembled
while the existing version is queried. That newer version
can even be discarded and it will be ignored (iterate
beyond to the next good version, or stick with the last
good version). Voilà: commit and rollback.
The version axis allows any number of concurrent
queries, and they never have to wait for updates.
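A rough sketch of the version axis, in Python for brevity and not
taken from Momento itself. Node, link, read and the discarded set
are hypothetical names, and the sketch assumes versions are linked
in increasing order. A reader carries a maximum version and never
looks past it, and a discarded version is simply skipped.

```python
class Node:
    """A node whose versions are linked along a version axis."""

    def __init__(self, value, version=0):
        # Each entry is (version number, value), oldest first.
        self.versions = [(version, value)]

    def link(self, version, value):
        """Link a new version instead of updating the node in place."""
        self.versions.append((version, value))

    def read(self, max_version, discarded=frozenset()):
        """Return the newest good value that is not past max_version."""
        best = None
        for version, value in self.versions:
            if version > max_version:
                break  # never iterate past the reader's version
            if version in discarded:
                continue  # rolled back, so stick with the last good one
            best = value
        return best


# Usage: a query that started at version 0 is unaffected by version 1.
title = Node("Won't do this")
title.link(1, "Won't ever do this")
snapshot = title.read(max_version=0)
committed = title.read(max_version=1)
rolled_back = title.read(max_version=1, discarded={1})
```

Here snapshot and rolled_back both see the original value, while
committed sees the update. No reader ever waits on the writer.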
* Second, Momento organizes its nodes in clusters.
An application developer tunes their application for
performance by specifying which nodes will be clustered
on the same pages. This ought to be an obvious decision for
the most part. Consider a bug database.
<bug-document xmlns="http://engrm.com/bugs">
  <project name="Momento">
    <issue name="Won't do this">
      <comment />
      <comment />
      .
      .
    </issue>
    <issue name="Doesn't do that">
      <comment />
      <comment />
      .
      .
    </issue>
    .
    .
    .
    .
  </project>
</bug-document>
In the above document, it is likely that most of the data
manipulation will occur within an issue. If the nodes are
clustered by issue, then Momento can place a read lock
at the issue node, allowing other issue nodes to be updated
concurrently.
There will be aggregate queries, but most queries
(transforms) will manipulate an issue. It makes sense then
to cluster the nodes in an issue on the same pages.
It also makes sense to cluster the issue nodes themselves
together, since the most likely axis of traversal is the
next-sibling axis, used to find a specific issue by name.
Clusters are akin to files on the file system, really.
* Third, Momento maintains read performance through node proximity
by reorganizing clusters, and maintains update performance by
reorganizing clusters as a separate step, apart from
updating them.
This is tricky to explain. It's not a complicated
concept, I just haven't tried to document it yet, so bear
with me.
In its organized state, a cluster contains its nodes in
document order. This puts a first child on the same page
as its parent, and a next sibling on the same page as its
previous sibling, or at least not too far away.
When a cluster mutates, new nodes are allocated from a
scrap page of nodes and linked to the version axis. Now the
document order is wonky. The newer versions cause
queries to iterate into the scrap pages, and proximity starts
to suffer.
Therefore, as a second step to updating the document,
Momento must organize itself by copying the last committed
version of a cluster to a new set of pages, retaining only
the latest version of each node.
This can occur in a separate thread, or more likely at the
prompting of the application developer. If a user is
pecking away at an interactive form, it may make sense for
the user to press Okay before going to the trouble to
organize the cluster.
In most XML applications, there are going to be natural
candidates for a Momento cluster. Often this will map
directly to a file on the file system, in an existing
application.
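The organize step from the third point can be sketched as follows.
This is a toy model in Python, not Momento's on-disk format: the
organize function, its parameters, and the node ids are all
hypothetical. It walks a cluster in document order, keeps only the
latest committed version of each node, and packs the result onto
fresh pages so neighbors end up close together again.

```python
def organize(document_order, versions, committed, page_size):
    """Copy the last committed version of each node onto fresh pages.

    document_order: node ids in document order.
    versions: maps node id -> list of (version, value), oldest first.
    committed: the last committed version number.
    page_size: how many nodes fit on one page (an assumed unit).
    """
    pages, current = [], []
    for node_id in document_order:
        good = [value for version, value in versions[node_id]
                if version <= committed]
        if not good:
            continue  # the node only exists in an uncommitted version
        current.append((node_id, good[-1]))  # keep only the latest version
        if len(current) == page_size:
            pages.append(current)
            current = []
    if current:
        pages.append(current)
    return pages


# Usage: one issue with comments that were edited and added over time.
versions = {
    "issue": [(0, "issue v0")],
    "comment-1": [(0, "first draft"), (1, "edited")],
    "comment-2": [(1, "added later")],
    "comment-3": [(2, "not committed yet")],
}
order = ["issue", "comment-1", "comment-2", "comment-3"]
pages = organize(order, versions, committed=1, page_size=2)
```

The old pages and the scrap pages stay untouched, so queries holding
an older version can finish against them before they are collected.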
My communication skills are getting stretched by all the
announcing. Please let me know if this is a good explanation.
I can use it to create a better overview document.
I'll try not to be so prolix in the follow-up. They were broad
questions. With these concepts, everything else in Momento is
obvious. It is pretty simple.
I was planning on announcing all week, but I'm traveling instead. I
do hope to foster discussion of this project. I'll check in at
the airport. More questions, please.
There is a mailing list too. Please join to participate or observe.
[EMAIL PROTECTED]
Thanks again for asking.
--
Alan / [EMAIL PROTECTED] / http://engrm.com/
aim/yim: alanengrm - icq: 228631855 - msn: [EMAIL PROTECTED]