Re: [Neo4j] Trying to understand query speed

Michael Hunger Mon, 10 Feb 2014 08:37:04 -0800

On 10.02.2014 at 16:46, V <[email protected]> wrote:

> Thank you for the response Michael, very helpful.
> 
> I thought it was me doing something wrong :)
> 
> I've tried Option 1, using the deferred constraint. Query times in Cypher and
> the Java API have improved greatly.
> 
> Results from Option 1:
> Cypher:
>     MATCH (c:Customer)
>     WHERE c.customerId = 279781
>     RETURN c;
>     // Quite variable ~100ms-150ms
> 
>     MATCH (c:Customer)-[]->(p:Product)
>     WHERE c.customerId = 7593729
>     RETURN c,p;
>     // Quite variable ~150ms-200ms
> 
> 
> Java API:
> // Get customer node
> Total time: 00h 00m 00s 118ms
> Total time: 00h 00m 00s 02ms
> Total time: 00h 00m 00s 01ms
> Total time: 00h 00m 00s 01ms
> 
> // Get customer's purchased product nodes
> Total time: 00h 00m 00s 191ms
> Total time: 00h 00m 00s 09ms
> Total time: 00h 00m 00s 06ms
> Total time: 00h 00m 00s 08ms
> 
> Just a couple of questions about this:
> 
> 1) I can't seem to find anything in either the documentation or the Java API 
> javadoc about adding indexes to the relationships in this way. All I've found 
> are notes about making indexes on Node Labels. Is this something that isn't 
> currently available? Or have I overlooked something again?


Not available and not planned. What is your use case for those?
> 
> 2) I wanted to check whether the indexes were indeed created. Since I couldn't
> find a Cypher query to list the indexes, I tried the neo4j-sh command
> "index --indexes", but no node indexes were listed. Is this because the
> indexes are now created and managed differently? If so, is the Java API
> currently the only way to check the indexes?

"index --indexes" is for the legacy indexes.
The command is "schema" in the shell, ":schema" in the browser and 
db.schema().... for embedded.


Btw, the first query is slower because the data has to be loaded from disk first.
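A minimal sketch of a fairer measurement, in plain Java with no Neo4j dependency (the class QueryTimer and its methods are hypothetical helpers, not part of any Neo4j API). It assumes the lookup returns a lazily evaluated iterator, as the Java API timings in the original question suggest: the clock is stopped only after the results have been drained, and every run is recorded so the cold first run can be told apart from the warm ones. The lookup itself is simulated with a lazy stream filter standing in for an unindexed scan; replace it with your own query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class QueryTimer {

    // Consumes every element of the iterator and returns how many there were.
    // With lazily evaluated results, this is where the actual lookup work happens.
    public static <T> int drain(Iterator<T> results) {
        int count = 0;
        while (results.hasNext()) {
            results.next();
            count++;
        }
        return count;
    }

    // Runs the query `runs` times and returns the elapsed milliseconds of each run.
    // The clock stops only after the results have been fully drained.
    public static <T> List<Long> timeRuns(Supplier<Iterator<T>> query, int runs) {
        List<Long> timings = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            drain(query.get());
            timings.add((System.nanoTime() - start) / 1_000_000);
        }
        return timings;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for an unindexed lookup: a lazy scan over 5 million
        // ids looking for a single match. Nothing is scanned until iteration starts.
        Supplier<Iterator<Integer>> lookup = () -> IntStream.range(0, 5_000_000)
                .filter(id -> id == 4_593_729)
                .boxed()
                .iterator();

        List<Long> timings = timeRuns(lookup, 4);
        for (int i = 0; i < timings.size(); i++) {
            System.out.println("Run " + (i + 1) + ": " + timings.get(i) + " ms");
        }
    }
}
```

Reporting the first run separately from the later ones keeps cache warm-up from being mistaken for the steady-state cost of the query.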

Cheers

Michael

> 
> 
> Many thanks,
> V
> 
> 
> 
> On Monday, February 10, 2014 9:13:50 AM UTC, Michael Hunger wrote:
> I think you ran into some misunderstanding of Neo4j indexes. Sorry for the 
> confusion.
> 
> What you created were effectively legacy indexes, which is how things were
> done in 1.9 and before.
> 
> With Neo4j 2.0 we have label-based indexes, which work and are used differently.
> 
> So what you can do (using 2.0.1):
> 
> #1 rebuild your db without the legacy indexing, and instead create a schema
> index or a unique constraint with the batch inserter:
> 
> batchInserter.createDeferredSchemaIndex(label).on(property).create();
> or
> batchInserter.createDeferredConstraint(label).assertPropertyIsUnique(property).create();
> 
> 
> #2 keep your db but delete everything under graph.db/index
> 
> and either create just an index like this in Cypher (adapt the label and
> property name):
> 
> create index on :Customer(id)
> 
> or even a unique constraint (for unique identifiers):
> 
> create constraint on (c:Customer) assert c.id is unique
> 
> The transactional Java API is:
> 
>     db.schema().indexFor(DynamicLabel.label(label)).on(property).create();
> or
>     db.schema().constraintFor(DynamicLabel.label(label)).assertPropertyIsUnique(property).create();
> 
> On 09.02.2014 at 22:57, V <[email protected]> wrote:
> 
>> Hi,
>> 
>> I've spent a few hours today looking at the Neo4j docs and playing around. I
>> started on a more serious evaluation and I'm a bit frustrated with myself.
>> 
>> Using the BatchInserterIndex I have created a graph with:
>> 
>> 1,097,874 nodes
>> 1,097,874 properties
>> 8,104,479 relationships
>> 
>> The database size is 829 MB on disk.
>> The indexes directory size is 515 MB. (du -ch data/graph.db/index | grep 
>> total)
>> 
>> The graph has two node types, Customers and Products, and a single
>> relationship type, Purchased. The only property on these nodes is an ID
>> used to identify the entity in another datastore.
>> 
>> I have created indexes using the BatchInserterIndexProvider class. If 
>> required I can post my full source code but essentially this is the importer 
>> code:
>> 
>> // Create the db and indexes
>> BatchInserter inserter = BatchInserters.inserter("target/graph.db");
>> BatchInserterIndexProvider indexProvider =
>>         new LuceneBatchInserterIndexProvider(inserter);
>> BatchInserterIndex customersIndex = indexProvider.nodeIndex("customersIdx",
>>         MapUtil.stringMap("type", "exact"));
>> customersIndex.setCacheCapacity("customerId", 100000);
>> // Indexes for Product nodes and Purchased relationships are created the same way
>>
>> // Create each node and add it to its index, using the id returned by createNode
>> long cId = inserter.createNode(customerProperties, customerLabel);
>> customersIndex.add(cId, customerProperties);
>>
>> long pId = inserter.createNode(productProperties, productLabel);
>> productsIndex.add(pId, productProperties);
>>
>> long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
>> purchasesIndex.add(purchRelId, EMPTY_MAP);
>>
>> // Flush indexes and shut down the batch inserter
>> customersIndex.flush();
>> productsIndex.flush();
>> purchasesIndex.flush();
>> indexProvider.shutdown();
>> inserter.shutdown();
>> 
>> 
>> Once the batch inserter completes, I copy the files to the real location of
>> the database and start the Neo4j server.
>> 
>> 
>> Attempt 1 with Cypher
>> 
>> When I run a cypher query such as:
>> 
>>     MATCH (c:Customer)
>>     WHERE c.customerId = 7593729
>>     RETURN c;
>> 
>> 
>> The query returns in around 8 seconds the first time, and in around 900 ms
>> on subsequent runs.
>> 
>> So I thought perhaps it was just Cypher; since I had read that Cypher
>> queries could be slow, I tried the Java API.
>> 
>> 
>> 
>> 
>> Attempt 2 with Java API
>> 
>> This is how I did the same query via the Java API:
>> 
>>     DateTime startTime = new DateTime();
>>
>>     Transaction tx = graphDb.beginTx();
>>     ResourceIterator<Node> nodes = graphDb
>>             .findNodesByLabelAndProperty(customerLabel, "customerId", 7593729)
>>             .iterator();
>>
>>     // Finish time is taken here, before the results are iterated
>>     DateTime finishTime = new DateTime();
>>
>>     while (nodes.hasNext()) {
>>         Node node = nodes.next();
>>         System.out.println(node.getProperty("customerId"));
>>     }
>>     tx.success();
>>     tx.close();
>>
>>     Period period = new Period(startTime, finishTime);
>>     System.out.println("Total time: " + HHMMSSFormater.print(period));
>> 
>> 
>> The query was executed 4 times in a row and this is the result:
>> 
>> Total time: 00h 00m 00s 355
>> Total time: 00h 00m 00s 55
>> Total time: 00h 00m 00s 04
>> Total time: 00h 00m 00s 04
>> 
>> Awesome! BUT...
>> 
>> If I change the code slightly, moving the finish time after the while loop,
>> and run the same test, the result is:
>> 
>> Total time: 00h 00m 06s 494
>> Total time: 00h 00m 00s 416
>> Total time: 00h 00m 00s 294
>> Total time: 00h 00m 00s 302
>> 
>> 
>> So it looks like iterating over the nodes took 6 seconds the first time,
>> which seems like a long time given that there's only a single node in the
>> query result.
>> 
>> 
>> Questions
>> 
>> 1. Why are my Cypher and Java queries slow?
>> 2. Have I messed up and not understood how indexing works or is this normal 
>> and expected?
>> 3. How can I make the queries/result reading faster?
>> 
>> 
>> Many thanks for any replies.
>> 
>> 
>> 
>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 

