Re: [Neo4j] Trying to understanding query speed

V Mon, 10 Feb 2014 07:47:20 -0800

Thank you for the response Michael, very helpful.

I thought it was me doing something wrong :)


I've tried Option 1, using the deferred constraint. Query times in Cypher 
and the Java API have improved greatly.

*Results from Option 1:*
Cypher:
    MATCH (c:Customer)
    WHERE c.customerId = 279781
    RETURN c;
    // Quite variable ~100ms-150ms

    MATCH (c:Customer)-[]->(p:Product)
    WHERE c.customerId = 7593729
    RETURN c,p;
    // Quite variable ~150ms-200ms


Java API:
// Get customer node
Total time: 00h 00m 00s 118ms
Total time: 00h 00m 00s 02ms
Total time: 00h 00m 00s 01ms
Total time: 00h 00m 00s 01ms

// Get customer's purchased product nodes
Total time: 00h 00m 00s 191ms
Total time: 00h 00m 00s 09ms
Total time: 00h 00m 00s 06ms
Total time: 00h 00m 00s 08ms

Just a couple of questions from this:

1) I can't seem to find anything in either the documentation or the Java 
API javadoc about adding indexes to the relationships in this way. All I've 
found are notes about making indexes on *Node Labels*. Is this something 
that isn't currently available? Or have I overlooked something again?

2) I wanted to check whether the indexes were indeed created. Since I couldn't 
find a Cypher query to list the indexes, I tried the neo4j-sh command index 
--indexes, but no node indexes were listed. Is this because the 
indexes are now created and managed differently? If so, is 
the Java API currently the only way to check the indexes?


Many thanks,
V



On Monday, February 10, 2014 9:13:50 AM UTC, Michael Hunger wrote:
>
> I think you ran into some misunderstanding of Neo4j indexes. Sorry for the 
> confusion.
>
> What you created were effectively legacy indexes, which is how things were 
> done in 1.9 and before.
>
> With Neo4j 2.0 we have label-based indexes that work and are used differently.
>
> So what you can do (using 2.0.1):
>
> #1 rebuild your db without the legacy indexing and instead create a
> deferred schema index or unique constraint, using one of these:
> batchInserter.createDeferredSchemaIndex(label).on(property).create();
> or
>
> batchInserter.createDeferredConstraint(label).assertPropertyIsUnique(property).create();
>
>
> #2 keep your db but delete everything under graph.db/index
>
> and either create just an index like this (adapt your label and 
> property-name) in cypher:
>
> create index on :Customer(id)
>
> or even a unique constraint (for unique identifiers)
>
> create constraint on (c:Customer) assert c.id is unique
>
> the transactional Java API is: 
>
>                 
> db.schema().indexFor(DynamicLabel.label(label)).on(property).create();
> or
>
> db.schema().constraintFor(DynamicLabel.label(label)).assertPropertyIsUnique(property).create();
>
> On 09.02.2014 at 22:57, V <[email protected]> wrote:
>
> Hi,
>
> I've spent a few hours today looking at the Neo4j docs and playing around. 
> I started to do something serious for evaluation and I'm a bit frustrated 
> with myself.
>
> Using the BatchInserterIndex I have created a graph with:
>
> 1,097,874 nodes
> 1,097,874 properties
> 8,104,479 relationships
>
> The database size is 829 MB on disk.
> The indexes directory size is 515 MB. (du -ch data/graph.db/index | grep 
> total)
>
> The graph has two node types Customers and Products, the only property on 
> these nodes is an ID used to identify the entity in another datastore, and 
> a single relationship type of Purchased.
>
> I have created indexes using the BatchInserterIndexProvider class. If 
> required I can post my full source code but essentially this is the 
> importer code:
>
> // Create the db and indexes
> BatchInserter inserter = BatchInserters.inserter("target/graph.db");
> BatchInserterIndexProvider indexProvider =
>     new LuceneBatchInserterIndexProvider(inserter);
> BatchInserterIndex customersIndex =
>     indexProvider.nodeIndex("customersIdx", MapUtil.stringMap("type", "exact"));
> customersIndex.setCacheCapacity("customerId", 100000);
> // Indexes for Product nodes and Purchased Relationship created in the
> // same way
>
> // Create and add node to index
> long cId = inserter.createNode(customerProperties, customerLabel);
> customersIndex.add(cId, customerProperties);
>
> long pId = inserter.createNode(productProperties, productLabel);
> productsIndex.add(pId, productProperties);
>
> long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
> purchasesIndex.add(purchRelId, EMPTY_MAP);
>
> // Flush indexes and shutdown batch inserter
> customersIndex.flush();
> productsIndex.flush();
> purchasesIndex.flush();
> indexProvider.shutdown();
> inserter.shutdown();
>
>
> Once the batch inserter completes, I copy the files to the real location of 
> the database and start the Neo4j server.
>
>
> *Attempt 1 with Cypher*
>
> When I run a cypher query such as:
>
>     MATCH (c:Customer)
>     WHERE c.customerId = 7593729
>     RETURN c;
>
>
> The response returns in around 8 seconds the first time, and then around 
> 900 ms the following times.
>
> So I thought perhaps it was just Cypher. Since I had read that Cypher 
> queries could be slow, I tried the Java API.
>
>
>
>
> *Attempt 2 with Java API*
>
> This is how I did the same query via the Java API:
>
>     DateTime startTime = new DateTime();
>
>     Transaction tx = graphDb.beginTx();
>     ResourceIterator<Node> nodes = graphDb
>             .findNodesByLabelAndProperty(customerLabel, "customerId", 7593729)
>             .iterator();
>
>     DateTime finishTime = new DateTime();
>
>     while(nodes.hasNext()) {
>         Node node = nodes.next();
>         System.out.println(node.getProperty("customerId"));
>     }
>
>     Period period = new Period(startTime, finishTime);
>     System.out.println("Total time: " + HHMMSSFormater.print(period));
>
>
> The query was executed 4 times in a row and this is the result:
>
> Total time: 00h 00m 00s 355
> Total time: 00h 00m 00s 55
> Total time: 00h 00m 00s 04
> Total time: 00h 00m 00s 04
>
> Awesome! BUT...
>
> If I change the code slightly, and put the finish time after the while 
> loop and run the same test the result is:
>
> Total time: 00h 00m 0*6s* 494
> Total time: 00h 00m 00s 416
> Total time: 00h 00m 00s 294
> Total time: 00h 00m 00s 302
>
>
> So it looks like iterating over the nodes took 6 seconds the first time. 
> This seems like a long time given that there's only a single node in the 
> query result.
>
>
> *Questions*
>
> 1. Why are my Cypher and Java queries slow?
> 2. Have I messed up and not understood how indexing works or is this 
> normal and expected?
> 3. How can I make the queries/result reading faster?
>
>
> Many thanks for any replies.
>
>
>
>
>
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
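
A note on the two timing runs in the quoted message: the jump from a few
milliseconds to several hundred when the stop time moves after the while loop
is what one would expect if the returned iterator is evaluated lazily, so that
the lookup only runs once hasNext() or next() is called. The stand-alone Java
sketch below illustrates that effect without any Neo4j classes. LazyResult,
timeCreationOnly and timeIncludingIteration are hypothetical names invented
for this illustration, and the 200 ms sleep is an arbitrary stand-in for the
real lookup cost. Whether findNodesByLabelAndProperty behaves exactly this way
is an assumption, not something confirmed in the thread.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustration only: no Neo4j classes are used here. */
final class LazyTimingDemo {

    /** Hypothetical stand-in for a lazily evaluated single-row query result. */
    static final class LazyResult implements Iterator<Integer> {
        private final int value;
        private final long workMillis;
        private boolean fetched = false;   // has the simulated lookup run yet?
        private boolean returned = false;  // has the single row been handed out?

        LazyResult(int value, long workMillis) {
            this.value = value;
            this.workMillis = workMillis;
        }

        @Override
        public boolean hasNext() {
            if (!fetched) {
                simulateLookup();          // the cost is paid here, not at creation
                fetched = true;
            }
            return !returned;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            returned = true;
            return value;
        }

        private void simulateLookup() {
            try {
                Thread.sleep(workMillis);  // assumed cost, purely illustrative
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during simulated lookup", e);
            }
        }
    }

    /** Stop time taken BEFORE the loop: measures only creation of the iterator. */
    static long timeCreationOnly(Supplier<Iterator<Integer>> query) {
        long start = System.nanoTime();
        Iterator<Integer> rows = query.get();
        long stop = System.nanoTime();
        while (rows.hasNext()) {
            rows.next();
        }
        return TimeUnit.NANOSECONDS.toMillis(stop - start);
    }

    /** Stop time taken AFTER the loop: includes the deferred lookup. */
    static long timeIncludingIteration(Supplier<Iterator<Integer>> query) {
        long start = System.nanoTime();
        Iterator<Integer> rows = query.get();
        while (rows.hasNext()) {
            rows.next();
        }
        long stop = System.nanoTime();
        return TimeUnit.NANOSECONDS.toMillis(stop - start);
    }

    public static void main(String[] args) {
        Supplier<Iterator<Integer>> query = () -> new LazyResult(7593729, 200);
        System.out.println("stop before loop: " + timeCreationOnly(query) + " ms");
        System.out.println("stop after loop:  " + timeIncludingIteration(query) + " ms");
    }
}
```

With the stop time placed after the loop, the measurement covers the whole
read, which is usually the figure worth comparing between runs.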

