Hi Thomas,

I think these are interesting graphs, and they show that we do not exploit locality sufficiently when persisting nodes. However, your results are based on the underlying assumption that sequential node ids lead to better locality than random node ids do. In general this depends on the access pattern and the underlying persistence store: while sequential node ids might exhibit good locality in the append-only case, for large repositories with many random CRUD operations this approach might eventually degenerate to behavior similar to that of random node ids.

I suggest abstracting node ids from storage ids: a node id is an identifier for retrieving nodes through JCR; a storage id is an identifier used by the persistence store to retrieve a certain piece of data (which could contain a node, part of a node, or multiple nodes). A mapping from node ids to storage ids can then be used to tune locality: e.g. sequential vs. random, optimal block size, etc.
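Such a mapping could be sketched roughly as follows. Note that StorageIdMapper and the use of a plain counter as the storage id are purely illustrative assumptions, not an existing Jackrabbit API:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of decoupling JCR node ids from storage ids.
public class StorageIdMapper {

    private final Map<UUID, Long> toStorage = new ConcurrentHashMap<>();
    private final AtomicLong nextStorageId = new AtomicLong();

    // Assign storage ids in insertion order, so nodes created together
    // end up adjacent on disk no matter how the node id was generated.
    public long storageIdFor(UUID nodeId) {
        return toStorage.computeIfAbsent(nodeId,
                id -> nextStorageId.getAndIncrement());
    }
}
```

With this indirection, the node id scheme (random, sequential, whatever JCR needs) becomes independent of the on-disk layout, which the persistence store can then optimize for its own access patterns.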

Michael

On 22.12.10 9:51, Thomas Mueller wrote:
Currently, Jackrabbit uses random node ids (randomly generated UUIDs).
For Jackrabbit 3, and possibly Jackrabbit 2.x, I propose to use
sequential node ids.

The main advantages of random node ids (the default) are that they are easy to implement and that merging data from multiple repositories is simple.

The disadvantages of random node ids are performance and space. By
performance, I don't mean that randomly generated UUIDs are slower to
generate (although they are); I mean the performance of reading and
writing data on disk. The main performance problem is reading and
writing (indexing) random keys, which is slow because the data of
adjacent (nearby) nodes isn't adjacent in the index. This slows down
index access (read and write) a lot, especially if the data doesn't fit
in the buffer cache (file system cache).
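For illustration, one hypothetical way to generate "sequential" UUIDs is to put a monotonically increasing counter into the most significant bits, so ids created close together in time also sort next to each other in a b-tree index. This is only a sketch of the general idea, not Jackrabbit's actual scheme:

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: UUIDs whose high 64 bits increase monotonically.
public class SequentialUuid {

    private static final AtomicLong COUNTER = new AtomicLong();

    public static UUID next() {
        // High 64 bits grow monotonically, so consecutive ids sort
        // adjacently; the low 64 bits can carry randomness or a node
        // identifier to keep ids unique across processes.
        return new UUID(COUNTER.getAndIncrement(), System.nanoTime());
    }
}
```

Consecutive ids from such a generator land on the same (or a neighboring) b-tree page, whereas fully random UUIDs scatter inserts across the whole index.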

I wrote a simple test case for Jackrabbit trunk. I also patched
Jackrabbit to allow sequential UUIDs. Creating 1 million nodes is about
80% faster when using sequential UUIDs instead of random UUIDs. I don't
have the data yet, but I'm sure the difference is much, much bigger when
reading from larger repositories once the data doesn't fit in RAM (10
million nodes or so). I will expand the test case to support reading.

The following diagram shows the operations per second (bigger is better)
after inserting x nodes (0 to 1 million). With sequential UUIDs,
performance first improves and then stays constant (which is expected).
For random UUIDs, which are currently the default, performance first
goes up and then down. The resulting database (I used H2 for performance
reasons) was about 145 MB for sequential UUIDs and 169 MB for random
UUIDs (the reason for this difference is that the b-tree fill factor is
better for sequential UUIDs). All the data still fits in the buffer
cache of the file system without problem.



Something else I noticed when running the test case was that the
Jackrabbit internals are relatively slow. When profiling the test case,
I saw that database access is not an important factor. Creating the
nodes with the JCR API took 285 / 526 seconds, while re-creating the
database from a SQL script took 55 seconds (this includes reading the
200 MB SQL script file and parsing the SQL statements, one insert
statement per row; this is for the random UUIDs). I would expect
Jackrabbit to be faster than 55 seconds, because it doesn't have to read
the SQL script from disk and parse statements.

The test case creates 1 million nodes, saves after every 10'000 nodes,
and uses a fan-out of 20 child nodes per node. I disabled the search
index in the repository configuration. Source code:


import java.io.File;

import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.commons.io.FileUtils;
import org.apache.jackrabbit.core.TransientRepository;
import org.apache.jackrabbit.core.id.NodeId;

public class TestLoop {

    long start;
    // Profiler prof; // profiling helper class, not included here

    public static void main(String... args) throws Exception {
        new TestLoop().run();
    }

    void startProf() {
        // if (prof != null) {
        //     prof.stopCollecting();
        //     System.out.println(prof.getTop(3));
        // }
        // prof = new Profiler();
        // prof.interval = 1;
        // prof.depth = 32;
        // prof.startCollecting();
    }

    void run() throws Exception {
        System.setProperty("jackrabbit.sequentialUUIDs", "false");
        startProf();
        FileUtils.deleteDirectory(new File("repository"));
        FileUtils.copyFile(new File("repository copy.xml"),
                new File("repository.xml"));
        TransientRepository rep = new TransientRepository();
        Session s = rep.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        int nodes = 1000000;
        int nodesPerFolder = 20;
        int saveEach = 10000;
        System.out.println(new NodeId());
        int levels = (int) Math.ceil(Math.log(nodes)
                / Math.log(nodesPerFolder));
        start = System.currentTimeMillis();
        startProf();
        createNodes(s.getRootNode(), 0, nodes,
                saveEach, levels, nodesPerFolder);
        System.out.println("took: " + (System.currentTimeMillis() - start));
        s.logout();
    }

    private int createNodes(Node parent, int x, int nodes,
            int saveEach, int depth, int nodesPerFolder)
            throws Exception {
        for (int i = 0; x < nodes && i < nodesPerFolder; i++) {
            if (depth > 1) {
                Node n = parent.addNode("f" + x, "nt:unstructured");
                x = createNodes(n, x, nodes,
                        saveEach, depth - 1, nodesPerFolder);
            } else {
                parent.addNode("n" + i, "nt:unstructured");
                x++;
                if ((x % saveEach) == 0) {
                    parent.getSession().save();
                    long t = System.currentTimeMillis() - start;
                    startProf();
                    System.out.println((x * 1000L / t) + " op/s at " + x);
                }
            }
        }
        return x;
    }
}





