Re: [Neo4j] Trying to understanding query speed

Michael Hunger Mon, 10 Feb 2014 01:14:27 -0800

I think you ran into some misunderstanding of Neo4j indexes. Sorry for the 
confusion.


What you created were effectively legacy indexes, which is how indexing was done
in Neo4j 1.9 and before.

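Neo4j 2.0 introduced label-based (schema) indexes, which work and are used
differently. Your Cypher MATCH ... WHERE query and findNodesByLabelAndProperty
only use these, not legacy indexes, so without one they have to scan all
:Customer nodes and compare the property on each.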
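To make the cost difference concrete, here is a toy model in plain Java. The classes ToyNode and ToyStore are hypothetical, not Neo4j internals. As a simplification, it assumes a lookup without a usable index has to visit every node carrying the label, while an index lookup goes straight to the matching entry; it counts how many nodes each approach inspects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical classes, not Neo4j internals): contrasts scanning
// every node that carries a label with a hash lookup on (label, value).
class ToyNode {
    final String label;
    final long customerId;

    ToyNode(String label, long customerId) {
        this.label = label;
        this.customerId = customerId;
    }
}

class ToyStore {
    // All nodes grouped by label, standing in for "all nodes with label X".
    private final Map<String, List<ToyNode>> byLabel = new HashMap<>();
    // Stand-in for a schema index on (label, customerId).
    private final Map<String, List<ToyNode>> index = new HashMap<>();
    // How many nodes the lookups have examined so far.
    int inspected = 0;

    void add(ToyNode n) {
        byLabel.computeIfAbsent(n.label, k -> new ArrayList<>()).add(n);
        index.computeIfAbsent(n.label + ":" + n.customerId, k -> new ArrayList<>()).add(n);
    }

    // No index: examine every node with the label and compare the property.
    List<ToyNode> findByScan(String label, long customerId) {
        List<ToyNode> result = new ArrayList<>();
        for (ToyNode n : byLabel.getOrDefault(label, Collections.emptyList())) {
            inspected++;
            if (n.customerId == customerId) {
                result.add(n);
            }
        }
        return result;
    }

    // Index: a single hash probe; only the matching nodes are examined.
    List<ToyNode> findByIndex(String label, long customerId) {
        List<ToyNode> hits = index.getOrDefault(label + ":" + customerId, Collections.emptyList());
        inspected += hits.size();
        return hits;
    }
}
```

In this model the scan side grows with the number of :Customer nodes, while the index side stays at one probe regardless of graph size.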

So what you can do (using 2.0.1):

#1 Rebuild your db without the legacy indexing (drop the
LuceneBatchInserterIndexProvider calls), and instead create a schema index or a
unique constraint on the batch inserter:
batchInserter.createDeferredSchemaIndex(label).on(property).create();
or
batchInserter.createDeferredConstraint(label).assertPropertyIsUnique(property).create();
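For an identifier like customerId, the unique constraint is usually the better fit: it is backed by an index, so lookups are just as fast, and it additionally refuses a second node with the same value. A minimal toy sketch of that behaviour in plain Java (the class ToyUniqueIndex is hypothetical, not Neo4j's implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical, not Neo4j code): a unique constraint behaves
// like an index that also refuses a second node for the same (label, value).
class ToyUniqueIndex {
    private final Map<String, Long> valueToNode = new HashMap<>();

    // Registers nodeId under (label, value); throws if another node holds it.
    void add(String label, Object value, long nodeId) {
        String key = label + ":" + value;
        Long existing = valueToNode.putIfAbsent(key, nodeId);
        if (existing != null && existing != nodeId) {
            throw new IllegalStateException("Node " + existing + " already has " + key);
        }
    }

    // Returns the single node id for (label, value), or null when absent.
    Long lookup(String label, Object value) {
        return valueToNode.get(label + ":" + value);
    }
}
```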


#2 Keep your db, but delete everything under graph.db/index

and then either create just an index in Cypher like this (adapt the label and
property name to yours):

create index on :Customer(id)

or even a unique constraint (for unique identifiers)

create constraint on (c:Customer) assert c.id is unique

The equivalents in the transactional Java API (run them inside a transaction)
are:

db.schema().indexFor(DynamicLabel.label(label)).on(property).create();
or
db.schema().constraintFor(DynamicLabel.label(label)).assertPropertyIsUnique(property).create();
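About the timings in your Java test: as far as I know, findNodesByLabelAndProperty hands back a lazy iterator, so the lookup work only happens inside hasNext()/next(). With finishTime taken before the loop you were timing almost nothing; the ~6 s (first run) and ~300 ms (later runs) you saw after moving it are the actual lookup. The plain-Java sketch below (a hypothetical LazyFilter class, not the Neo4j iterator) shows that a lazy filter has inspected nothing until it is consumed.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Toy stand-in (hypothetical, not Neo4j code) for a lazy result iterator:
// no source element is examined until hasNext()/next() is called.
class LazyFilter<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> predicate;
    private T pending;
    private boolean hasPending = false;
    // Number of source elements examined so far.
    int inspected = 0;

    LazyFilter(Iterator<T> source, Predicate<T> predicate) {
        this.source = source;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source only now, until a match is found or it runs dry.
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            inspected++;
            if (predicate.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T result = pending;
        pending = null;
        return result;
    }
}
```

So for meaningful numbers, take the finish time after the loop, as in your second run.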

On 09.02.2014 at 22:57, V <[email protected]> wrote:

> Hi,
> 
> I've spent a few hours today looking at the Neo4J docs and playing around. I 
> started to do something serious for evaluation and I'm a bit frustrated with 
> myself.
> 
> Using the BatchInserterIndex I have created a graph with:
> 
> 1,097,874 nodes
> 1,097,874 properties
> 8,104,479 relationships
> 
> The database size is 829 MB on disk.
> The indexes directory size is 515 MB (du -ch data/graph.db/index | grep total).
> 
> The graph has two node types, Customers and Products, and a single
> relationship type, Purchased. The only property on these nodes is an ID used
> to identify the entity in another datastore.
> 
> I have created indexes using the BatchInserterIndexProvider class. If 
> required I can post my full source code but essentially this is the importer 
> code:
> 
> // Create the db and indexes
> BatchInserter inserter = BatchInserters.inserter("target/graph.db");
> BatchInserterIndexProvider indexProvider =
>         new LuceneBatchInserterIndexProvider(inserter);
> BatchInserterIndex customersIndex = indexProvider.nodeIndex("customersIdx",
>         MapUtil.stringMap("type", "exact"));
> customersIndex.setCacheCapacity("customerId", 100000);
> // Indexes for Product nodes and Purchased Relationship created in the same way
> 
> // Create the customer node and add it to its index
> long cId = inserter.createNode(customerProperties, customerLabel);
> customersIndex.add(cId, customerProperties);
> 
> // Create the product node and add it to its index
> long pId = inserter.createNode(productProperties, productLabel);
> productsIndex.add(pId, productProperties);
> 
> // Connect them and index the relationship
> long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
> purchasesIndex.add(purchRelId, EMPTY_MAP);
> 
> // Flush indexes and shutdown batch inserter
> customersIndex.flush();
> productsIndex.flush();
> purchasesIndex.flush();
> indexProvider.shutdown();
> inserter.shutdown();
> 
> 
> Once the batch indexer completes I copy the files to the real location of the 
> database and start the Neo4J server.
> 
> 
> Attempt 1 with Cypher
> 
> When I run a cypher query such as:
> 
>     MATCH (c:Customer)
>     WHERE c.customerId = 7593729
>     RETURN c;
> 
> 
> The response returns in around 8 seconds the first time, and then around 900 
> ms the following times.
> 
> So I thought perhaps it was just Cypher. Since I had read that Cypher queries
> could be slow, I tried the Java API instead.
> 
> 
> 
> 
> Attempt 2 with JAVA API
> 
> This is how I did the same query via the Java API:
> 
>     DateTime startTime = new DateTime();
> 
>     // Transaction is AutoCloseable in 2.0, so it is always finished
>     try (Transaction tx = graphDb.beginTx()) {
>         ResourceIterator<Node> nodes = graphDb
>                 .findNodesByLabelAndProperty(customerLabel, "customerId", 7593729)
>                 .iterator();
> 
>         DateTime finishTime = new DateTime();
> 
>         while (nodes.hasNext()) {
>             Node node = nodes.next();
>             System.out.println(node.getProperty("customerId"));
>         }
>         nodes.close();
>         tx.success();
> 
>         Period period = new Period(startTime, finishTime);
>         System.out.println("Total time: " + HHMMSSFormater.print(period));
>     }
> 
> 
> The query was executed 4 times in a row and this is the result:
> 
> Total time: 00h 00m 00s 355
> Total time: 00h 00m 00s 55
> Total time: 00h 00m 00s 04
> Total time: 00h 00m 00s 04
> 
> Awesome! BUT...
> 
> If I change the code slightly, and put the finish time after the while loop 
> and run the same test the result is:
> 
> Total time: 00h 00m 06s 494
> Total time: 00h 00m 00s 416
> Total time: 00h 00m 00s 294
> Total time: 00h 00m 00s 302
> 
> 
> So it looks like iterating over the nodes took 6 seconds the first time. That
> seems like a long time, given that there's only a single node in the query
> result.
> 
> 
> Questions
> 
> 1. Why are my Cypher and Java queries slow?
> 2. Have I messed up and not understood how indexing works or is this normal 
> and expected?
> 3. How can I make the queries/result reading faster?
> 
> 
> Many thanks for any replies.
> 
> 
> 
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
