[Neo4j] Trying to understand query speed

V Mon, 10 Feb 2014 00:14:25 -0800

Hi,

I've spent a few hours today looking at the Neo4j docs and playing around. 
I started on something more serious for evaluation and I'm a bit frustrated 
with myself.


Using the BatchInserterIndex I have created a graph with:

1,097,874 nodes
1,097,874 properties
8,104,479 relationships

The database size is 829 MB on disk.
The indexes directory size is 515 MB
(du -ch data/graph.db/index | grep total).

The graph has two node types, Customer and Product. The only property on 
these nodes is an ID used to identify the entity in another datastore. 
There is a single relationship type, Purchased.

I have created indexes using the BatchInserterIndexProvider class. If 
required I can post my full source code but essentially this is the 
importer code:

// Create the db and indexes
BatchInserter inserter = BatchInserters.inserter("target/graph.db");
BatchInserterIndexProvider indexProvider =
    new LuceneBatchInserterIndexProvider(inserter);
BatchInserterIndex customersIndex =
    indexProvider.nodeIndex("customersIdx", MapUtil.stringMap("type", "exact"));
customersIndex.setCacheCapacity("customerId", 100000);
// Indexes for Product nodes and the Purchased relationship are created
// in the same way

// Create and add node to index
long cId = inserter.createNode(customerProperties, customerLabel);
customersIndex.add(cId, customerProperties);

long pId = inserter.createNode(productProperties, productLabel);
productsIndex.add(pId, productProperties);

long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
purchasesIndex.add(purchRelId, EMPTY_MAP);

// Flush indexes and shutdown batch inserter
customersIndex.flush();
productsIndex.flush();
purchasesIndex.flush();
indexProvider.shutdown();
inserter.shutdown();


Once the batch import completes, I copy the files to the real location of 
the database and start the Neo4j server.


*Attempt 1 with Cypher*

When I run a Cypher query such as:

    MATCH (c:Customer)
    WHERE c.customerId = 7593729
    RETURN c;


The first response takes around 8 seconds, and subsequent ones take around 
900 ms.

So I thought perhaps it was just Cypher. Since I had read that Cypher 
queries could be slow, I tried the Java API.




*Attempt 2 with Java API*

This is how I did the same query via the Java API:

    DateTime startTime = new DateTime();

    Transaction tx = graphDb.beginTx();
    ResourceIterator<Node> nodes = graphDb.findNodesByLabelAndProperty(
            customerLabel, "customerId", 7593729).iterator();

    DateTime finishTime = new DateTime();

    while(nodes.hasNext()) {
        Node node = nodes.next();
        System.out.println(node.getProperty("customerId"));
    }

    Period period = new Period(startTime, finishTime);
    System.out.println("Total time: " + HHMMSSFormater.print(period));


The query was executed 4 times in a row and this is the result:

Total time: 00h 00m 00s 355
Total time: 00h 00m 00s 55
Total time: 00h 00m 00s 04
Total time: 00h 00m 00s 04

Awesome! BUT...

If I change the code slightly, and put the finish time after the while loop 
and run the same test the result is:

Total time: 00h 00m 0*6s* 494
Total time: 00h 00m 00s 416
Total time: 00h 00m 00s 294
Total time: 00h 00m 00s 302


So it looks like iterating over the nodes took 6 seconds the first time, 
which seems like a long time given that there is only a single node in the 
query result.
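
One possible reading of the two sets of timings is that the iterator is 
lazy, so the lookup only runs once hasNext() or next() is called. That is 
an assumption, not something the numbers prove. The sketch below uses no 
Neo4j classes: LazyMatches is a hypothetical stand-in that scans a plain 
array on demand. It only illustrates why stopping the clock before the loop 
could hide almost all of the cost.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyTimingSketch {

    /** Hypothetical lazy result: ids are scanned only when an element is requested. */
    public static final class LazyMatches implements Iterator<Integer> {
        private final int[] ids;
        private final int wanted;
        private int position = 0;       // next index to examine
        private Integer pending = null; // match found but not yet handed out
        public int examined = 0;        // entries inspected so far

        public LazyMatches(int[] ids, int wanted) {
            this.ids = ids;
            this.wanted = wanted;
        }

        @Override
        public boolean hasNext() {
            // All scanning happens here, not in the constructor.
            while (pending == null && position < ids.length) {
                examined++;
                if (ids[position] == wanted) {
                    pending = ids[position];
                }
                position++;
            }
            return pending != null;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Integer result = pending;
            pending = null;
            return result;
        }
    }

    public static void main(String[] args) {
        int[] ids = new int[1_000_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }

        long start = System.nanoTime();
        LazyMatches matches = new LazyMatches(ids, 999_999);
        long afterCreate = System.nanoTime();  // like finishTime before the loop
        int examinedBeforeLoop = matches.examined;

        while (matches.hasNext()) {
            System.out.println(matches.next());
        }
        long afterIterate = System.nanoTime(); // like finishTime after the loop

        System.out.println("examined before loop: " + examinedBeforeLoop);
        System.out.println("examined after loop:  " + matches.examined);
        System.out.println("create only (ns):     " + (afterCreate - start));
        System.out.println("create + iterate (ns): " + (afterIterate - start));
    }
}
```

If the real iterator behaves like this, the second set of timings (finish 
time after the loop) would be the one that reflects the actual lookup cost.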


*Questions*

1. Why are my Cypher and Java queries slow?
2. Have I messed up and not understood how indexing works or is this normal 
and expected?
3. How can I make the queries/result reading faster?


Many thanks for any replies.





