Hi,
I've spent a few hours today looking at the Neo4j docs and playing around.
I started to do something serious for evaluation and I'm a bit frustrated
with myself.
Using the BatchInserterIndex I have created a graph with:
1,097,874 nodes
1,097,874 properties
8,104,479 relationships
The database size is 829 MB on disk.
The index directory is 515 MB (du -ch data/graph.db/index | grep total).
The graph has two node types, Customers and Products. The only property on
these nodes is an ID used to identify the entity in another datastore.
There is a single relationship type, Purchased.
I have created indexes using the BatchInserterIndexProvider class. If
required I can post my full source code but essentially this is the
importer code:
// Create the db and indexes
BatchInserter inserter = BatchInserters.inserter("target/graph.db");
BatchInserterIndexProvider indexProvider =
    new LuceneBatchInserterIndexProvider(inserter);
BatchInserterIndex customersIndex = indexProvider.nodeIndex(
    "customersIdx", MapUtil.stringMap("type", "exact"));
customersIndex.setCacheCapacity("customerId", 100000);
// Indexes for Product nodes and the Purchased relationship are created
// in the same way
// Create and add node to index
long cId = inserter.createNode(customerProperties, customerLabel);
customersIndex.add(cId, customerProperties);
long pId = inserter.createNode(productProperties, productLabel);
productsIndex.add(pId, productProperties);
long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
purchasesIndex.add(purchRelId, EMPTY_MAP);
// Flush indexes and shutdown batch inserter
customersIndex.flush();
productsIndex.flush();
purchasesIndex.flush();
indexProvider.shutdown();
inserter.shutdown();
Once the batch import completes, I copy the files to the real location of
the database and start the Neo4j server.
*Attempt 1 with Cypher*
When I run a Cypher query such as:
MATCH (c:Customer)
WHERE c.customerId = 7593729
RETURN c;
The response takes around 8 seconds the first time, and around 900 ms on
subsequent runs.
I thought perhaps it was just Cypher, since I had read that Cypher queries
can be slow, so I tried the Java API instead.
*Attempt 2 with Java API*
This is how I did the same query via the Java API:
DateTime startTime = new DateTime();
Transaction tx = graphDb.beginTx();
ResourceIterator<Node> nodes = graphDb
    .findNodesByLabelAndProperty(customerLabel, "customerId", 7593729)
    .iterator();
DateTime finishTime = new DateTime();
while (nodes.hasNext()) {
    Node node = nodes.next();
    System.out.println(node.getProperty("customerId"));
}
Period period = new Period(startTime, finishTime);
System.out.println("Total time: " + HHMMSSFormater.print(period));
The query was executed 4 times in a row and this is the result:
Total time: 00h 00m 00s 355
Total time: 00h 00m 00s 55
Total time: 00h 00m 00s 04
Total time: 00h 00m 00s 04
Awesome! BUT...
If I change the code slightly, taking the finish time after the while loop
instead of before it, and run the same test, the result is:
Total time: 00h 00m 0*6s* 494
Total time: 00h 00m 00s 416
Total time: 00h 00m 00s 294
Total time: 00h 00m 00s 302
So it looks like iterating over the nodes took 6 seconds the first time.
This seems like a long time given that there's only a single Node in the
query result.
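One possible reading of the two sets of timings, offered as an assumption
rather than a diagnosis: if the iterator returned by the lookup is lazy,
creating it costs almost nothing and the real search happens inside
hasNext() and next(). The sketch below illustrates that pattern in plain
Java with no Neo4j classes. The class LazyTimingSketch, the method lazyFind
and the scan counter are hypothetical names invented for this illustration,
and the linear scan is only a stand-in for whatever work the database does.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

class LazyTimingSketch {

    // Hypothetical stand-in for a lazily evaluated lookup: it walks the
    // candidates 0..size-1 and yields the ones equal to `wanted`. No
    // candidate is examined until hasNext() or next() is called.
    static Iterator<Integer> lazyFind(final int size, final int wanted,
                                      final AtomicInteger scanned) {
        return new Iterator<Integer>() {
            private int cursor = 0;
            private Integer pending = null;

            @Override
            public boolean hasNext() {
                // Scan forward until a match is buffered or input runs out.
                while (pending == null && cursor < size) {
                    scanned.incrementAndGet();
                    if (cursor == wanted) {
                        pending = cursor;
                    }
                    cursor++;
                }
                return pending != null;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Integer result = pending;
                pending = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger scanned = new AtomicInteger();

        long start = System.nanoTime();
        Iterator<Integer> hits = lazyFind(1_000_000, 759_372, scanned);
        long afterCreate = System.nanoTime();
        int scannedAfterCreate = scanned.get(); // nothing examined so far

        int found = 0;
        while (hits.hasNext()) {
            hits.next();
            found++;
        }
        long afterLoop = System.nanoTime();

        System.out.println("scanned after create: " + scannedAfterCreate);
        System.out.println("scanned after loop:   " + scanned.get());
        System.out.println("matches found:        " + found);
        System.out.println("create time (ns):     " + (afterCreate - start));
        System.out.println("iterate time (ns):    " + (afterLoop - afterCreate));
    }
}
```

If the real iterator behaves like this, a finish time taken before the
while loop measures only the creation of the iterator, and the
measurement taken after the loop is the one that reflects the cost of the
lookup itself.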
*Questions*
1. Why are my Cypher and Java queries slow?
2. Have I messed up and not understood how indexing works or is this normal
and expected?
3. How can I make the queries/result reading faster?
Many thanks for any replies.