* Stefano Mazzocchi <[EMAIL PROTECTED]> [2004-02-17 04:22]:
Alan wrote:
Momento is a native XML persistence engine. It supports XSLT 2.0 and XQuery 1.0 (via Saxon), and it supports XUpdate.
It is transactional and ACID.
It is designed with Cocoon in mind.
I am considering an open-source path for future development.
http://engrm.com/project/com.agtrz.momento/
Thoughts?
Very interesting. Can you tell us more about how it works? Do you have any numbers on how fast or scalable it gets? What's the difference between Momento and a native XML database like Xindice?
Stefano
Thanks for asking.
3) I can't compare Momento to Xindice at the moment. The last time I looked at Xindice was last November. I'm announcing in order to get that kind of insight.
(I am willing to say that Momento isn't wedded to a specific
API, such as XML::DB. It works with Saxon to provide
XQuery and XSLT.
What is the interfacing API, then? JAXP? Or Saxon's?
I'm currently implementing XUpdate. A
read-only W3C DOM is a simple matter, if there is call
for it. A read-write W3C DOM is not so simple, or
desirable, but entirely feasible.
I could be way off base. Let me know.
I'm not sure I follow what you mean here about a read/write DOM being desirable.
)
2) No numbers. I've not designed any benchmarks. Momento is at the point where I need an example application to focus my energy on getting Momento to beta. I was going to use Linotype, actually, and use Momento to store my blog.
Uh, awesome!
(Linotype is a good application since, for the time being,
I would want to back up Momento after every update. That extra
step is acceptable for a blog, provided it is a single-user
blog.)
As for scalability, Momento supports concurrent reads and, with
some educated decisions by the application developer, concurrent
updates.
Uh, this sentence is worrisome. Can you elaborate?
Momento works as a data store for a multi-threaded server. It would work nicely as a servlet, or in a Cocoon pipeline.
For further scalability, I plan on supporting multi-process operation and indices. (That's right, no indices yet. They are not at the core of Momento.)
ok
1) Momento has three concepts that rise to the surface when I consider it.
* Zero: I always forget that I spent a month writing a journaling file data structure. It just hums along quietly in the background now. It splits a random access file into pages and reads those pages in and out of memory. It uses weak references to implement a page cache. It's pretty cool, but it's pretty much done, so...
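The page cache in the "Zero" item could look roughly like the following. This is a minimal sketch, not Momento's code. `PageCache`, `getPage`, `writePage` and the 4096-byte `PAGE_SIZE` are hypothetical, and the journaling is left out entirely. Each cached page is held through a `java.lang.ref.WeakReference`, so it stays in memory only while some caller still holds it. Once the garbage collector clears it, the next `getPage` rereads it from the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a weak-reference page cache over a random access file.
// No journaling: writes go straight to the file.
final class PageCache {
    static final int PAGE_SIZE = 4096; // assumed page size

    private final RandomAccessFile file;
    // Page index -> weakly held page; cleared entries are replaced on the next read.
    private final Map<Long, WeakReference<byte[]>> pages = new HashMap<>();

    PageCache(RandomAccessFile file) {
        this.file = file;
    }

    // Returns the page at the given index, reading it from the file only when
    // it is not cached or the garbage collector has already reclaimed it.
    synchronized byte[] getPage(long index) throws IOException {
        WeakReference<byte[]> ref = pages.get(index);
        byte[] page = (ref == null) ? null : ref.get();
        if (page == null) {
            page = new byte[PAGE_SIZE];
            long offset = index * PAGE_SIZE;
            if (offset < file.length()) {
                file.seek(offset);
                // Read what exists; a page past the end of the file stays zero-filled.
                int read = 0;
                while (read < PAGE_SIZE) {
                    int n = file.read(page, read, PAGE_SIZE - read);
                    if (n < 0) break;
                    read += n;
                }
            }
            pages.put(index, new WeakReference<>(page));
        }
        return page;
    }

    // Writes a page to its slot in the file and caches it.
    synchronized void writePage(long index, byte[] page) throws IOException {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("page must be PAGE_SIZE bytes");
        }
        file.seek(index * PAGE_SIZE);
        file.write(page);
        pages.put(index, new WeakReference<>(page));
    }
}
```

A weakly referenced page can vanish as soon as it is released, so a caller must write a modified page back with `writePage` before letting go of it. `java.lang.ref.SoftReference` is the usual alternative when pages should survive until memory runs low.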
Uh, that sounds useful. We are having problems with JISP. Care to tell us more about it?
* First, Momento maintains a version axis.
Rather than updating a node, Momento links a new version
of that node to its version axis. An XSLT transform
navigates a Momento document with a version number in
hand. When you get the first child, say, you check to see
if there is a new version of that child, and iterate, but
never past the maximum version.
The older versions are kept around until any queries referencing them terminate. At that point, the older versions of the nodes can be collected.
Thus a newer version of the document can be assembled while the existing version is queried. That newer version can even be discarded, and it will be ignored (iterate beyond it to the next good version, or stick with the last good version). Voilà: commit and rollback.
The version axis allows any number of concurrent queries, and they will not have to wait for updates.
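Here is a minimal sketch of the version axis for a single node. It assumes one writer at a time, so version numbers are created and committed in order. `VersionAxis` and its methods are hypothetical names, and the sketch stores a string per version instead of real node data. A query pins the last committed version as its maximum. Reads skip rolled-back versions and never go past that maximum, and `collect` reclaims only versions that no running query can reach.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of one node's version axis; not Momento's actual code.
final class VersionAxis {
    // A version is linked onto the axis, never written over an older one.
    static final class Version {
        final long number;
        final String value;

        Version(long number, String value) {
            this.number = number;
            this.value = value;
        }
    }

    private final List<Version> versions = new ArrayList<>();      // ascending by number
    private final Set<Long> discarded = new HashSet<>();           // rolled-back versions
    private final TreeMap<Long, Integer> pinned = new TreeMap<>(); // running queries per version
    private long committed = 0;
    private long nextVersion = 1;

    // Assumes one writer at a time, so versions are created and committed in order.
    synchronized long beginUpdate() { return nextVersion++; }

    synchronized void write(long version, String value) { versions.add(new Version(version, value)); }

    synchronized void commit(long version) { committed = version; }

    // Rollback undoes nothing in place: the version is only marked to be skipped.
    synchronized void rollback(long version) { discarded.add(version); }

    // A query pins the last committed version as its maximum version.
    synchronized long beginQuery() {
        pinned.merge(committed, 1, Integer::sum);
        return committed;
    }

    synchronized void endQuery(long version) {
        pinned.computeIfPresent(version, (v, n) -> n == 1 ? null : n - 1);
    }

    // Iterates along the version axis, skipping discarded versions and never
    // going past maxVersion. Returns null if no version is visible.
    synchronized String read(long maxVersion) {
        String result = null;
        for (Version v : versions) {
            if (v.number > maxVersion) break;
            if (!discarded.contains(v.number)) result = v.value;
        }
        return result;
    }

    // Removes discarded versions and any good version superseded before the
    // oldest running query's maximum version. Returns how many were removed.
    synchronized int collect() {
        long horizon = pinned.isEmpty() ? committed : pinned.firstKey();
        long floor = Long.MIN_VALUE; // newest good version the oldest query can see
        for (Version v : versions) {
            if (v.number > horizon) break;
            if (!discarded.contains(v.number)) floor = v.number;
        }
        final long keepFrom = floor;
        int before = versions.size();
        versions.removeIf(v -> discarded.contains(v.number) || v.number < keepFrom);
        return before - versions.size();
    }
}
```

In this sketch, rollback applies only to versions that were never committed. Collection removes only committed versions that every running query has already moved past. The two therefore never touch the same version.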
Interesting approach. I don't see how you can do rollback if you garbage collect a node, though.
* Second, Momento organizes its nodes in clusters.
An application developer tunes their application for
performance by specifying which nodes will be clustered
on the same pages. This ought to be an obvious decision for
the most part. Consider a bug database.
<bug-document xmlns="http://engrm.com/bugs">
  <project name="Momento">
    <issue name="Won't do this">
      <comment />
      <comment />
      .
      .
    </issue>
    <issue name="Doesn't do that">
      <comment />
      <comment />
      .
      .
    </issue>
    .
    .
    .
    .
  </project>
</bug-document>
In the above document, it is likely that most of the data
manipulation will occur within an issue. If the nodes are
clustered by issue, then Momento can place a read lock
at the issue node, allowing other issue nodes to be updated
concurrently.
There will be aggregate queries, but most queries
(transforms) will manipulate a single issue. It makes sense, then,
to cluster the nodes in an issue on the same pages.
It also makes sense to cluster the issue nodes themselves together, since the most likely axis of traversal is the next-sibling axis, used to find a specific issue by name.
Clusters are akin to files on the file system, really.
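One way to get the concurrency described for the bug database is a lock per cluster, sketched below under stated assumptions. The cluster key (here an issue name) and the `ClusterLocks` class are hypothetical. Updates within one cluster are serialized, while updates to different clusters run in parallel. Readers take no lock, on the assumption that the version axis already keeps them from waiting on writers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: one lock per cluster (for example, per <issue>).
final class ClusterLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs an update while holding only the lock of the affected cluster,
    // so updates to other clusters are not blocked.
    <T> T update(String clusterId, Supplier<T> change) {
        ReentrantLock lock = locks.computeIfAbsent(clusterId, id -> new ReentrantLock());
        lock.lock();
        try {
            return change.get();
        } finally {
            lock.unlock();
        }
    }

    // True while some thread is updating the given cluster.
    boolean isLocked(String clusterId) {
        ReentrantLock lock = locks.get(clusterId);
        return lock != null && lock.isLocked();
    }
}
```

The granularity is the whole point here. Clustering nodes by issue makes the issue the natural unit of locking, just as it is the natural unit of page placement.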
What strategy do you use to cluster nodes?
* Third, Momento maintains performance through node proximity
by reorganizing clusters, and maintains the performance of updates
by reorganizing clusters as a separate step, apart from
updating them.
This is tricky to explain. Not that it's a complicated concept; I just haven't tried to document it yet, so bear with me.
In its organized state, a cluster contains its nodes in document order. This puts first children on the same page as their parents and next siblings on the same page as their previous siblings; that is, not too far away.
When a cluster mutates, new nodes are allocated from a scrap page of nodes and linked to the version axis. Now the document order is wonky. The newer version will cause queries to iterate into the scrap pages, and proximity starts to suffer.
Therefore, as a second step to updating the document,
Momento must organize itself by copying the last committed
version of a cluster to a new set of pages, retaining only
the latest version of each node.
This can occur in a separate thread, or more likely at the
prompting of the application developer. If a user is
pecking away at an interactive form, it may make sense for
the user to press Okay before going to the trouble to
organize the cluster.
In most XML applications, there are going to be natural candidates for a Momento cluster. Often this will map directly to a file on the file system, in an existing application.
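The reorganization step can be sketched as a pre-order walk, which visits nodes in document order. The walk copies only the latest committed version of each node onto fresh pages. `Compactor`, `nodesPerPage`, and the use of node names as stand-ins for node records are assumptions for illustration. Real pages would hold serialized nodes, and the old pages and scrap pages would presumably be freed only once no query references them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of cluster reorganization; not Momento's actual code.
final class Compactor {
    // One version of a node; children are the logical nodes it points to.
    static final class Version {
        final long number;
        final String name;
        final List<Node> children;

        Version(long number, String name, List<Node> children) {
            this.number = number;
            this.name = name;
            this.children = List.copyOf(children);
        }
    }

    // A logical node with its version axis, oldest version first.
    static final class Node {
        final List<Version> versions = new ArrayList<>();

        // Versions must be added in ascending order.
        Node add(long number, String name, List<Node> children) {
            versions.add(new Version(number, name, children));
            return this;
        }

        // Newest version that is not past the last committed version.
        Version latest(long committed) {
            Version result = null;
            for (Version v : versions) {
                if (v.number > committed) break;
                result = v;
            }
            return result;
        }
    }

    // Copies the committed view of the tree onto fresh pages in document
    // order (pre-order), keeping only the latest version of each node.
    static List<List<String>> compact(Node root, long committed, int nodesPerPage) {
        List<List<String>> pages = new ArrayList<>();
        copy(root, committed, nodesPerPage, pages);
        return pages;
    }

    private static void copy(Node node, long committed, int nodesPerPage, List<List<String>> pages) {
        Version v = node.latest(committed);
        if (v == null) return; // created after the last commit; left for a later pass
        if (pages.isEmpty() || pages.get(pages.size() - 1).size() == nodesPerPage) {
            pages.add(new ArrayList<>()); // current page is full: start a new one
        }
        pages.get(pages.size() - 1).add(v.name);
        for (Node child : v.children) {
            copy(child, committed, nodesPerPage, pages);
        }
    }
}
```

Because the walk is pre-order, a parent and its first child land next to each other. Each next sibling follows right after the previous sibling's subtree, which matches the proximity the organized state is meant to restore.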
My communication skills are getting stretched by all the
announcing. Please let me know if this is a good explanation.
I can use it to create a better overview document.
What you outline sounds like an interesting approach, but it's kinda foggy. A better outline document would be very useful (to me, at least) for understanding whether your approach could be used as a persistent native XML database that could scale.
I'll try not to be so prolix in the follow-up. They were broad questions. With these concepts, everything else in Momento is obvious. It is pretty simple.
I was planning on announcing all week, but I'm traveling instead.
Announcing all week? One announcement is enough :-) The rest can be a regular email exchange, don't you think?
I do hope to foster discussion of this project. I'll check in at
the airport. More questions, please.
There is a mailing list too. Please join to participate or observe.
[EMAIL PROTECTED]
I won't subscribe to a new mail list until I see there is a reason for it, and for sure not before there is any code to take a look at.
For the licensing issue, keep in mind that we wouldn't be able to distribute your software (due to ASF policies) if you choose a license of the GPL family.
Thanks again for asking.
You are welcome.
-- Stefano.
