Re: [Neo4j] Trying to understanding query speed

V Mon, 03 Mar 2014 03:04:22 -0800

Hi Michael,

Apologies for the very slow reply.


The use case for having indexes on a relationship property would be to be 
able to find purchases in a given time period, say 'last 7 days'.

Given a graph such as:

(Customer)-[:Purchased {date:<datetime as milliseconds>}]->(Product)

I have however changed the structure to be like this: 

(Customer)-[Order {date}]->(Product)

I think this is a nicer model, so not having indexes on a Relationship 
seems to be fine. I've not come across another case where I would need it.
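
For what it's worth, the 'last 7 days' lookup itself comes down to comparing the stored millisecond timestamp against a cutoff. Here is a minimal sketch in plain Java, assuming the date property holds epoch milliseconds. The class and method names are made up for illustration and are not part of the Neo4j API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Neo4j: filters purchase timestamps
// (epoch milliseconds) down to those within the last N days.
class RecentPurchases {

    // Returns the earliest timestamp that still counts as within the last `days` days.
    static long cutoffMillis(long nowMillis, int days) {
        return nowMillis - TimeUnit.DAYS.toMillis(days);
    }

    // Keeps only the timestamps at or after the cutoff.
    static List<Long> withinLastDays(List<Long> purchaseDates, long nowMillis, int days) {
        long cutoff = cutoffMillis(nowMillis, days);
        List<Long> recent = new ArrayList<Long>();
        for (long date : purchaseDates) {
            if (date >= cutoff) {
                recent.add(date);
            }
        }
        return recent;
    }
}
```

The same cutoff value could equally be passed as a parameter to a Cypher WHERE clause on the date property.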

Thank you,
V


On Monday, February 10, 2014 4:35:52 PM UTC, Michael Hunger wrote:
>
>
> On 10.02.2014 at 16:46, V <[email protected]> wrote:
>
> Thank you for the response Michael, very helpful.
>
> I thought it was me doing something wrong :)
>
> I've tried Option 1, using the Deferred Constraint. Query times in Cypher 
> and the Java API have improved greatly.
>
> *Results from Option 1:*
> Cypher:
>     MATCH (c:Customer)
>     WHERE c.customerId = 279781
>     RETURN c;
>     // Quite variable ~100ms-150ms
>
>     MATCH (c:Customer)-[]->(p:Product)
>     WHERE c.customerId = 7593729
>     RETURN c,p;
>     // Quite variable ~150ms-200ms
>
>
> Java API:
> // Get customer node
> Total time: 00h 00m 00s 118ms
> Total time: 00h 00m 00s 02ms
> Total time: 00h 00m 00s 01ms
> Total time: 00h 00m 00s 01ms
>
> // Get customer's purchased product nodes
> Total time: 00h 00m 00s 191ms
> Total time: 00h 00m 00s 09ms
> Total time: 00h 00m 00s 06ms
> Total time: 00h 00m 00s 08ms
>
> Just a couple of questions from this:
>
> 1) I can't seem to find anything in either the documentation or the Java 
> API javadoc about adding indexes to the relationships in this way. All I've 
> found are notes about making indexes on *Node Labels*. Is this something 
> that isn't currently available? Or have I overlooked something again?
>
>
> Not available and not planned, what is your use-case for those?
>
>
> 2) I wanted to check if the indexes were indeed created, since I couldn't 
> find a Cypher query to list the indexes I tried the neo4j-sh command index 
> --indexes but there were no listed node indexes. Is this because the 
> indexes were created differently and are managed differently now? If so, is 
> the Java API the only way currently to check the indexes?
>
>
> "index --indexes" is for the legacy indexes.
> The command is "schema" in the shell, ":schema" in the browser and 
> db.schema().... for embedded.
>
>
> Btw. the first query is slower as the data has to be loaded from disk 
> first.
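
To make the cold-versus-warm effect visible in a benchmark, one option is to time each run separately rather than averaging them. Here is a minimal sketch in plain Java. `ColdWarmTimer`, `timeRuns` and the stand-in workload are hypothetical, and in practice the Runnable would wrap the actual query:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper: runs the same task several times and records
// each run separately, so the first (cold) run is visible on its own.
class ColdWarmTimer {

    // Returns the elapsed time of each run in milliseconds, in execution order.
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsedMillis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMillis[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the query under test.
        Runnable task = new Runnable() {
            public void run() {
                long sum = 0;
                for (int i = 0; i < 1000000; i++) {
                    sum += i;
                }
                if (sum < 0) {
                    throw new IllegalStateException("unreachable");
                }
            }
        };
        long[] times = timeRuns(task, 4);
        for (int i = 0; i < times.length; i++) {
            System.out.println("Run " + (i + 1) + ": " + times[i] + "ms");
        }
    }
}
```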
>
> Cheers
>
> Michael
>
>
>
> Many thanks,
> V
>
>
>
> On Monday, February 10, 2014 9:13:50 AM UTC, Michael Hunger wrote:
>>
>> I think you ran into some misunderstanding of Neo4j indexes. Sorry for 
>> the confusion.
>>
>> What you created were effectively legacy indexes that were how things 
>> were done in 1.9 and before.
>>
>> With Neo4j 2.0 we have label-based indexes that work and are used differently.
>>
>> So what you can do (using 2.0.1):
>>
>> #1 rebuild your db without the legacy indexing and instead create a 
>> schema index or a unique constraint with the batch inserter:
>>
>> batchInserter.createDeferredSchemaIndex(label).on(property).create();
>> or
>>
>> batchInserter.createDeferredConstraint(label).assertPropertyIsUnique(property).create();
>>
>>
>> #2 keep your db but delete everything under graph.db/index
>>
>> and either create just an index like this (adapt your label and 
>> property-name) in cypher:
>>
>> create index on :Customer(id)
>>
>> or even a unique constraint (for unique identifiers)
>>
>> create constraint on (c:Customer) assert c.id is unique
>>
>> the transactional Java API is: 
>>
>>                 
>> db.schema().indexFor(DynamicLabel.label(label)).on(property).create();
>> or
>>
>> db.schema().constraintFor(DynamicLabel.label(label)).assertPropertyIsUnique(property).create();
>>
>> On 09.02.2014 at 22:57, V <[email protected]> wrote:
>>
>> Hi,
>>
>> I've spent a few hours today looking at the Neo4j docs and playing 
>> around. I started to do something serious for evaluation and I'm a bit 
>> frustrated with myself.
>>
>> Using the BatchInserterIndex I have created a graph with:
>>
>> 1,097,874 nodes
>> 1,097,874 properties
>> 8,104,479 relationships
>>
>> The database size is 829 MB on disk.
>> The indexes directory size is 515 MB. (du -ch data/graph.db/index | grep 
>> total)
>>
>> The graph has two node types Customers and Products, the only property on 
>> these nodes is an ID used to identify the entity in another datastore, and 
>> a single relationship type of Purchased.
>>
>> I have created indexes using the BatchInserterIndexProvider class. If 
>> required I can post my full source code but essentially this is the 
>> importer code:
>>
>> // Create the db and indexes
>> BatchInserter inserter = BatchInserters.inserter("target/graph.db");
>> BatchInserterIndexProvider indexProvider = new 
>> LuceneBatchInserterIndexProvider(inserter);
>> BatchInserterIndex customersIndex = 
>> indexProvider.nodeIndex("customersIdx", MapUtil.stringMap("type", "exact"));
>> customersIndex.setCacheCapacity("customerId", 100000);
>> // Indexes for Product nodes and Purchased Relationship created in the same way
>>
>> // Create and add node to index
>> long cId = inserter.createNode(customerProperties, customerLabel);
>> customersIndex.add(cId, customerProperties);
>>
>> long pId = inserter.createNode(productProperties, productLabel);
>> productsIndex.add(pId, productProperties);
>>
>> long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
>> purchasesIndex.add(purchRelId, EMPTY_MAP);
>>
>> // Flush indexes and shutdown batch inserter
>> customersIndex.flush();
>> productsIndex.flush();
>> purchasesIndex.flush();
>> indexProvider.shutdown();
>> inserter.shutdown();
>>
>>
>> Once the batch inserter completes I copy the files to the real location of 
>> the database and start the Neo4j server.
>>
>>
>> *Attempt 1 with Cypher*
>>
>> When I run a cypher query such as:
>>
>>     MATCH (c:Customer)
>>     WHERE c.customerId = 7593729
>>     RETURN c;
>>
>>
>> The response returns in around 8 seconds the first time, and then around 
>> 900 ms the following times.
>>
>> So I thought perhaps it was just Cypher. Since I had read that Cypher 
>> queries could be slow, I tried the Java API.
>>
>>
>>
>>
>> *Attempt 2 with Java API*
>>
>> This is how I did the same query via the Java API:
>>
>>     DateTime startTime = new DateTime();
>>
>>     Transaction tx = graphDb.beginTx();
>>     ResourceIterator<Node> nodes = 
>> graphDb.findNodesByLabelAndProperty(customerLabel, "customerId", 7593729
>> ).iterator();
>>
>>     DateTime finishTime = new DateTime();
>>
>>     while(nodes.hasNext()) {
>>         Node node = nodes.next();
>>         System.out.println(node.getProperty("customerId"));
>>     }
>>
>>     Period period = new Period(startTime, finishTime);
>>     System.out.println("Total time: " + HHMMSSFormater.print(period));
>>
>>
>> The query was executed 4 times in a row and this is the result:
>>
>> Total time: 00h 00m 00s 355
>> Total time: 00h 00m 00s 55
>> Total time: 00h 00m 00s 04
>> Total time: 00h 00m 00s 04
>>
>> Awesome! BUT...
>>
>> If I change the code slightly, and put the finish time after the while 
>> loop and run the same test the result is:
>>
>> Total time: 00h 00m 0*6s* 494
>> Total time: 00h 00m 00s 416
>> Total time: 00h 00m 00s 294
>> Total time: 00h 00m 00s 302
>>
>>
>> So it looks like iterating over the nodes took 6 seconds the first time. 
>> This seems like a long time given that there's only a single Node in the 
>> query result.
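
One possible explanation, offered as an assumption rather than something confirmed for this API: if the returned iterator is lazy, the lookup work only happens on the first hasNext()/next() call, so a finish time taken before the loop measures almost nothing. Here is a plain-Java sketch of that effect, with a made-up `LazyLookup` class standing in for the real query result:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Made-up stand-in for a lazily evaluated query result: nothing is computed
// until the first call to hasNext() or next().
class LazyLookup implements Iterator<Integer> {
    private final int value;
    private boolean loaded = false;
    private boolean consumed = false;
    int loadCount = 0; // how many times the expensive step actually ran

    LazyLookup(int value) {
        this.value = value;
    }

    // Simulates the expensive part (e.g. reading from disk) on first access only.
    private void loadIfNeeded() {
        if (!loaded) {
            loadCount++;
            loaded = true;
        }
    }

    public boolean hasNext() {
        loadIfNeeded();
        return !consumed;
    }

    public Integer next() {
        loadIfNeeded();
        if (consumed) {
            throw new NoSuchElementException();
        }
        consumed = true;
        return value;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        LazyLookup result = new LazyLookup(7593729);
        // A timestamp taken here would not include the load.
        System.out.println("Loads before iterating: " + result.loadCount);
        while (result.hasNext()) {
            System.out.println(result.next());
        }
        // A timestamp taken here includes it.
        System.out.println("Loads after iterating: " + result.loadCount);
    }
}
```

Under that assumption, the 6-second figure would reflect the cost of the lookup plus the first disk reads, not the cost of the loop itself.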
>>
>>
>> *Questions*
>>
>> 1. Why are my Cypher and Java queries slow?
>> 2. Have I messed up and not understood how indexing works or is this 
>> normal and expected?
>> 3. How can I make the queries/result reading faster?
>>
>>
>> Many thanks for any replies.
>>
>>
>>
>>
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
