Re: [Neo4j] Trying to understanding query speed

Michael Hunger Mon, 03 Mar 2014 06:20:16 -0800

Unfortunately we don't support indexes on relationships for these kinds of 
queries yet.


Usually the "Order" is such an important domain concept (with the time 
information) that you would index it, and you could also create additional 
in-graph structures to handle time information on orders even more 
efficiently.

something like this: 
http://docs.neo4j.org/chunked/milestone/cypher-cookbook-path-tree.html
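
To make the in-graph time structure idea a little more concrete, here is a
minimal plain-Java sketch. It does not use the Neo4j API; the OrderTimeTree
class, its method names, and the year/month/day bucketing are assumptions made
up for illustration. It mimics a path tree in which every order hangs off a
day bucket, so a date-range query only visits the days in that range instead
of looking at every order.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a year -> month -> day path tree.
class OrderTimeTree {
    // year -> month -> day -> ids of the orders placed on that day
    private final NavigableMap<Integer, NavigableMap<Integer, NavigableMap<Integer, List<Long>>>> years =
            new TreeMap<>();

    // Attaches an order to its day bucket, creating missing levels on the way.
    void add(LocalDate date, long orderId) {
        years.computeIfAbsent(date.getYear(), y -> new TreeMap<>())
             .computeIfAbsent(date.getMonthValue(), m -> new TreeMap<>())
             .computeIfAbsent(date.getDayOfMonth(), d -> new ArrayList<>())
             .add(orderId);
    }

    // Collects orders from 'from' to 'to' (both inclusive), one day at a time.
    List<Long> between(LocalDate from, LocalDate to) {
        List<Long> result = new ArrayList<>();
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            NavigableMap<Integer, NavigableMap<Integer, List<Long>>> months = years.get(day.getYear());
            if (months == null) continue;
            NavigableMap<Integer, List<Long>> days = months.get(day.getMonthValue());
            if (days == null) continue;
            List<Long> orders = days.get(day.getDayOfMonth());
            if (orders != null) result.addAll(orders);
        }
        return result;
    }
}
```

A "last 7 days" query then becomes between(today.minusDays(6), today). In a
graph, the three map levels would presumably become year, month, and day nodes
connected by relationships, with each order node attached to its day node.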

HTH

Michael

On 03.03.2014 at 12:03, V <[email protected]> wrote:

> Hi Michael,
> 
> Apologies for the very slow reply.
> 
> The use case for having indexes on a relationship property would be to be 
> able to find purchases in a given time period, say 'last 7 days'.
> 
> Given a graph such as:
> 
> (Customer)-[:Purchased {date:<datetime as milliseconds>}]->(Product)
> 
> I have however changed the structure to be like this: 
> 
> (Customer)-->(Order {date})-->(Product)
> 
> I think this is a nicer model, so not having indexes on a Relationship seems 
> to be fine. I've not come across another case where I would need it.
> 
> Thank you,
> V
> 
> 
> On Monday, February 10, 2014 4:35:52 PM UTC, Michael Hunger wrote:
> 
> On 10.02.2014 at 16:46, V <[email protected]> wrote:
> 
>> Thank you for the response Michael, very helpful.
>> 
>> I thought it was me doing something wrong :)
>> 
>> I've tried Option 1, using the Deferred Constraint. Query times in Cypher 
>> and the Java API have improved greatly.
>> 
>> Results from Option 1:
>> Cypher:
>>     MATCH (c:Customer)
>>     WHERE c.customerId = 279781
>>     RETURN c;
>>     // Quite variable ~100ms-150ms
>> 
>>     MATCH (c:Customer)-[]->(p:Product)
>>     WHERE c.customerId = 7593729
>>     RETURN c,p;
>>     // Quite variable ~150ms-200ms
>> 
>> 
>> Java API:
>> // Get customer node
>> Total time: 00h 00m 00s 118ms
>> Total time: 00h 00m 00s 02ms
>> Total time: 00h 00m 00s 01ms
>> Total time: 00h 00m 00s 01ms
>> 
>> // Get customer's purchased product nodes
>> Total time: 00h 00m 00s 191ms
>> Total time: 00h 00m 00s 09ms
>> Total time: 00h 00m 00s 06ms
>> Total time: 00h 00m 00s 08ms
>> 
>> Just a couple of questions from this:
>> 
>> 1) I can't seem to find anything in either the documentation or the Java API 
>> javadoc about adding indexes to the relationships in this way. All I've 
>> found are notes about making indexes on Node Labels. Is this something that 
>> isn't currently available? Or have I overlooked something again?
> 
> Not available and not planned, what is your use-case for those?
>> 
>> 2) I wanted to check if the indexes were indeed created. Since I couldn't 
>> find a Cypher query to list the indexes, I tried the neo4j-sh command 
>> "index --indexes", but no node indexes were listed. Is this because the 
>> indexes were created differently and are managed differently now? If so, is 
>> the Java API currently the only way to check the indexes?
> 
> "index --indexes" is for the legacy indexes.
> The command is "schema" in the shell, ":schema" in the browser and 
> db.schema().... for embedded.
> 
> 
> Btw. the first query is slower as the data has to be loaded from disk first.
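
The warm-up effect can be separated from the steady-state cost by timing the
same call several times and reporting the first run apart from the rest. The
sketch below is plain Java with a hypothetical WarmupTimer helper; the
Supplier stands in for whatever query is being measured and is not a Neo4j
call.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical helper: times repeated runs so the cold first run can be set apart.
class WarmupTimer {
    // Runs the query 'runs' times and returns each elapsed time in nanoseconds.
    static long[] time(Supplier<?> query, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // Median of every run after the first (upper median for an even count).
    static long warmMedian(long[] elapsed) {
        if (elapsed.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        long[] warm = Arrays.copyOfRange(elapsed, 1, elapsed.length);
        Arrays.sort(warm);
        return warm[warm.length / 2];
    }
}
```

Reporting elapsed[0] next to warmMedian(elapsed) gives one number for the
cold case and one for the warm case, which is roughly how the timings quoted
above read.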
> 
> Cheers
> 
> Michael
> 
>> 
>> 
>> Many thanks,
>> V
>> 
>> 
>> 
>> On Monday, February 10, 2014 9:13:50 AM UTC, Michael Hunger wrote:
>> I think you ran into a misunderstanding about Neo4j indexes. Sorry for the 
>> confusion.
>> 
>> What you created were effectively legacy indexes, which is how things were 
>> done in 1.9 and before.
>> 
>> With Neo4j 2.0 we have label-based indexes that work and are used differently.
>> 
>> So what you can do (using 2.0.1):
>> 
>> #1 rebuild your db without the legacy indexing and instead create unique
>> 
>> But use this instead:
>> batchInserter.createDeferredSchemaIndex(label).on(property).create();
>> or
>> batchInserter.createDeferredConstraint(label).assertPropertyIsUnique(property).create();
>> 
>> 
>> #2 keep your db but delete everything under graph.db/index
>> 
>> and either create just an index like this (adapt your label and 
>> property-name) in cypher:
>> 
>> create index on :Customer(id)
>> 
>> or even a unique constraint (for unique identifiers)
>> 
>> create constraint on (c:Customer) assert c.id is unique
>> 
>> The transactional Java API is:
>> 
>> db.schema().indexFor(DynamicLabel.label(label)).on(property).create();
>> or
>> db.schema().constraintFor(DynamicLabel.label(label)).assertPropertyIsUnique(property).create();
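
Why a schema index or unique constraint helps can be illustrated with a
plain-Java analogy. No Neo4j API is involved here; CustomerLookup, findByScan,
and findByIndex are made-up names. Without a usable index, a lookup by
property has to check every node carrying the label; with one, it is a single
keyed lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical analogy: a list plays the label scan, a map plays the index.
class CustomerLookup {
    static final class Customer {
        final long customerId;
        Customer(long customerId) { this.customerId = customerId; }
    }

    private final List<Customer> all = new ArrayList<>();
    private final Map<Long, Customer> byId = new HashMap<>();

    // Rejects duplicate ids, similar in spirit to a uniqueness constraint.
    void add(Customer c) {
        if (byId.putIfAbsent(c.customerId, c) != null) {
            throw new IllegalStateException("duplicate customerId " + c.customerId);
        }
        all.add(c);
    }

    // No index: examine customers one by one; cost grows with their number.
    Customer findByScan(long id) {
        for (Customer c : all) {
            if (c.customerId == id) return c;
        }
        return null;
    }

    // Index: one hash lookup, independent of how many customers exist.
    Customer findByIndex(long id) {
        return byId.get(id);
    }
}
```

Both methods return the same customer; only the amount of work differs, which
is the difference the label-based index is meant to make for queries that
filter on customerId.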
>> 
>> On 09.02.2014 at 22:57, V <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I've spent a few hours today looking at the Neo4J docs and playing around. 
>>> I started to do something serious for evaluation and I'm a bit frustrated 
>>> with myself.
>>> 
>>> Using the BatchInserterIndex I have created a graph with:
>>> 
>>> 1,097,874 nodes
>>> 1,097,874 properties
>>> 8,104,479 relationships
>>> 
>>> The database size is 829 MB on disk.
>>> The indexes directory size is 515 MB (du -ch data/graph.db/index | grep total).
>>> 
>>> The graph has two node types Customers and Products, the only property on 
>>> these nodes is an ID used to identify the entity in another datastore, and 
>>> a single relationship type of Purchased.
>>> 
>>> I have created indexes using the BatchInserterIndexProvider class. If 
>>> required I can post my full source code but essentially this is the 
>>> importer code:
>>> 
>>> // Create the db and indexes
>>> BatchInserter inserter = BatchInserters.inserter("target/graph.db");
>>> BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(inserter);
>>> BatchInserterIndex customersIndex = indexProvider.nodeIndex("customersIdx", MapUtil.stringMap("type", "exact"));
>>> customersIndex.setCacheCapacity("customerId", 100000);
>>> // Indexes for Product nodes and Purchased Relationship created in the same way
>>> 
>>> // Create and add node to index
>>> long cId = inserter.createNode(customerProperties, customerLabel);
>>> customersIndex.add(cId, customerProperties);
>>> 
>>> long pId = inserter.createNode(productProperties, productLabel);
>>> productsIndex.add(pId, productProperties);
>>> 
>>> long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
>>> purchasesIndex.add(purchRelId, EMPTY_MAP);
>>> 
>>> // Flush indexes and shutdown batch inserter
>>> customersIndex.flush();
>>> productsIndex.flush();
>>> purchasesIndex.flush();
>>> indexProvider.shutdown();
>>> inserter.shutdown();
>>> 
>>> 
>>> Once the batch inserter completes I copy the files to the real location of 
>>> the database and start the Neo4j server.
>>> 
>>> 
>>> Attempt 1 with Cypher
>>> 
>>> When I run a cypher query such as:
>>> 
>>>     MATCH (c:Customer)
>>>     WHERE c.customerId = 7593729
>>>     RETURN c;
>>> 
>>> 
>>> The response returns in around 8 seconds the first time, and then around 
>>> 900 ms the following times.
>>> 
>>> So I thought perhaps it was just Cypher. Since I had read that Cypher 
>>> queries could be slow, I tried the Java API.
>>> 
>>> 
>>> 
>>> 
>>> Attempt 2 with Java API
>>> 
>>> This is how I did the same query via the Java API:
>>> 
>>>     DateTime startTime = new DateTime();
>>> 
>>>     Transaction tx = graphDb.beginTx();
>>>     ResourceIterator<Node> nodes = graphDb.findNodesByLabelAndProperty(customerLabel, "customerId", 7593729).iterator();
>>> 
>>>     DateTime finishTime = new DateTime();
>>> 
>>>     while(nodes.hasNext()) {
>>>         Node node = nodes.next();
>>>         System.out.println(node.getProperty("customerId"));
>>>     }
>>> 
>>>     Period period = new Period(startTime, finishTime);
>>>     System.out.println("Total time: " + HHMMSSFormater.print(period));
>>> 
>>> 
>>> The query was executed 4 times in a row and this is the result:
>>> 
>>> Total time: 00h 00m 00s 355
>>> Total time: 00h 00m 00s 55
>>> Total time: 00h 00m 00s 04
>>> Total time: 00h 00m 00s 04
>>> 
>>> Awesome! BUT...
>>> 
>>> If I change the code slightly, and put the finish time after the while loop 
>>> and run the same test the result is:
>>> 
>>> Total time: 00h 00m 06s 494
>>> Total time: 00h 00m 00s 416
>>> Total time: 00h 00m 00s 294
>>> Total time: 00h 00m 00s 302
>>> 
>>> 
>>> So it looks like iterating over the nodes took 6 seconds the first time. 
>>> This seems like a long time given that there's only a single Node in the 
>>> query result.
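
One possible explanation for the gap between the two measurements is lazy
evaluation: if a lookup hands back an iterator that only does its work when
hasNext() or next() is called, stopping the clock before the loop measures
almost nothing. The sketch below is plain Java and makes no claim about how
findNodesByLabelAndProperty is implemented; LazyScan and its examined counter
are hypothetical names used only to show the effect.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy lookup: filtering happens inside the iterator, not up front.
class LazyScan implements Iterable<Long> {
    private final long[] ids;
    private final long wanted;
    int examined = 0; // number of elements inspected so far

    LazyScan(long[] ids, long wanted) {
        this.ids = ids;
        this.wanted = wanted;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int pos = 0;
            private Long pending = null; // next match, if one has been found

            @Override
            public boolean hasNext() {
                // The scan only advances here, when a caller asks for results.
                while (pending == null && pos < ids.length) {
                    examined++;
                    if (ids[pos] == wanted) pending = ids[pos];
                    pos++;
                }
                return pending != null;
            }

            @Override
            public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                Long result = pending;
                pending = null;
                return result;
            }
        };
    }
}
```

Calling iterator() inspects nothing; the whole array is only walked once the
loop starts. A timing that is meant to reflect the real cost of a query
therefore needs to include consuming the results.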
>>> 
>>> 
>>> Questions
>>> 
>>> 1. Why are my Cypher and Java queries slow?
>>> 2. Have I messed up and not understood how indexing works or is this normal 
>>> and expected?
>>> 3. How can I make the queries/result reading faster?
>>> 
>>> 
>>> Many thanks for any replies.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
