On 10.02.2014 at 16:46, V <[email protected]> wrote:

> Thank you for the response Michael, very helpful.
>
> I thought it was me doing something wrong :)
>
> I've tried Option 1, using the Deferred Constraint. Query times in Cypher and
> the Java API have improved greatly.
>
> Results from Option 1:
> Cypher:
> MATCH (c:Customer)
> WHERE c.customerId = 279781
> RETURN c;
> // Quite variable ~100ms-150ms
>
> MATCH (c:Customer)-[]->(p:Product)
> WHERE c.customerId = 7593729
> RETURN c,p;
> // Quite variable ~150ms-200ms
>
>
> Java API:
> // Get customer node
> Total time: 00h 00m 00s 118ms
> Total time: 00h 00m 00s 02ms
> Total time: 00h 00m 00s 01ms
> Total time: 00h 00m 00s 01ms
>
> // Get customer's purchased product nodes
> Total time: 00h 00m 00s 191ms
> Total time: 00h 00m 00s 09ms
> Total time: 00h 00m 00s 06ms
> Total time: 00h 00m 00s 08ms
>
> Just a couple of questions from this:
>
> 1) I can't seem to find anything in either the documentation or the Java API
> javadoc about adding indexes to the relationships in this way. All I've found
> are notes about making indexes on Node Labels. Is this something that isn't
> currently available? Or have I overlooked something again?
Not available and not planned. What is your use case for those?

> 2) I wanted to check if the indexes were indeed created. Since I couldn't
> find a Cypher query to list the indexes, I tried the neo4j-sh command index
> --indexes but there were no listed node indexes. Is this because the indexes
> were created differently and are managed differently now? If so, is the Java
> API the only way currently to check the indexes?

"index --indexes" is for the legacy indexes.
The command is "schema" in the shell, ":schema" in the browser and db.schema().... for embedded.

Btw. the first query is slower because the data has to be loaded from disk first.

Cheers
Michael

>
> Many thanks,
> V
>
>
> On Monday, February 10, 2014 9:13:50 AM UTC, Michael Hunger wrote:
> I think you ran into some misunderstanding of Neo4j indexes. Sorry for the
> confusion.
>
> What you created were effectively legacy indexes, which is how things were
> done in 1.9 and before.
>
> With Neo4j 2.0 we have label-based indexes that work and are used differently.
>
> So what you can do (using 2.0.1):
>
> #1 rebuild your db without the legacy indexing and instead create unique
>
> But use this instead:
> batchInserter.createDeferredSchemaIndex(label).on(property).create();
> or
> batchInserter.createDeferredConstraint(label).assertPropertyIsUnique(property).create();
>
>
> #2 keep your db but delete everything under graph.db/index
>
> and either create just an index like this (adapt your label and
> property name) in Cypher:
>
> create index on :Customer(id)
>
> or even a unique constraint (for unique identifiers)
>
> create constraint on (c:Customer) assert c.id is unique
>
> the transactional Java API is:
>
> db.schema().indexFor(DynamicLabel.label(label)).on(property).create();
> or
> db.schema().constraintFor(DynamicLabel.label(label)).assertPropertyIsUnique(property).create();
>
> On 09.02.2014 at 22:57, V <[email protected]> wrote:
>
>> Hi,
>>
>> I've spent a few hours today looking at the Neo4j docs and playing around. I
>> started to do something serious for evaluation and I'm a bit frustrated with
>> myself.
>>
>> Using the BatchInserterIndex I have created a graph with:
>>
>> 1,097,874 nodes
>> 1,097,874 properties
>> 8,104,479 relationships
>>
>> The database size is 829 MB on disk.
>> The indexes directory size is 515 MB. (du -ch data/graph.db/index | grep
>> total)
>>
>> The graph has two node types, Customers and Products. The only property on
>> these nodes is an ID used to identify the entity in another datastore, and
>> there is a single relationship type, Purchased.
>>
>> I have created indexes using the BatchInserterIndexProvider class.
>> If required I can post my full source code, but essentially this is the
>> importer code:
>>
>> // Create the db and indexes
>> BatchInserter inserter = BatchInserters.inserter("target/graph.db");
>> BatchInserterIndexProvider indexProvider =
>>     new LuceneBatchInserterIndexProvider(inserter);
>> BatchInserterIndex customersIndex = indexProvider.nodeIndex("customersIdx",
>>     MapUtil.stringMap("type", "exact"));
>> customersIndex.setCacheCapacity("customerId", 100000);
>> // Indexes for Product nodes and Purchased Relationship created in the same
>> // way
>>
>> // Create each node and add it to its index, keyed by the id just returned
>> long cId = inserter.createNode(customerProperties, customerLabel);
>> customersIndex.add(cId, customerProperties);
>>
>> long pId = inserter.createNode(productProperties, productLabel);
>> productsIndex.add(pId, productProperties);
>>
>> long purchRelId = inserter.createRelationship(cId, pId, PURCHASED, null);
>> purchasesIndex.add(purchRelId, EMPTY_MAP);
>>
>> // Flush indexes and shutdown batch inserter
>> customersIndex.flush();
>> productsIndex.flush();
>> purchasesIndex.flush();
>> indexProvider.shutdown();
>> inserter.shutdown();
>>
>>
>> Once the batch indexer completes I copy the files to the real location of
>> the database and start the Neo4j server.
>>
>>
>> Attempt 1 with Cypher
>>
>> When I run a Cypher query such as:
>>
>> MATCH (c:Customer)
>> WHERE c.customerId = 7593729
>> RETURN c;
>>
>>
>> The response returns in around 8 seconds the first time, and then around 900
>> ms the following times.
>>
>> So I thought perhaps it was just Cypher. Since I had read that Cypher
>> queries could be slow, I tried with the Java API.
>>
>>
>> Attempt 2 with JAVA API
>>
>> This is how I did the same query via the Java API:
>>
>> DateTime startTime = new DateTime();
>>
>> Transaction tx = graphDb.beginTx();
>> ResourceIterator<Node> nodes =
>>     graphDb.findNodesByLabelAndProperty(customerLabel, "customerId",
>>         7593729).iterator();
>>
>> DateTime finishTime = new DateTime();
>>
>> while(nodes.hasNext()) {
>>     Node node = nodes.next();
>>     System.out.println(node.getProperty("customerId"));
>> }
>>
>> Period period = new Period(startTime, finishTime);
>> System.out.println("Total time: " + HHMMSSFormater.print(period));
>>
>>
>> The query was executed 4 times in a row and this is the result:
>>
>> Total time: 00h 00m 00s 355
>> Total time: 00h 00m 00s 55
>> Total time: 00h 00m 00s 04
>> Total time: 00h 00m 00s 04
>>
>> Awesome! BUT...
>>
>> If I change the code slightly, and put the finish time after the while loop
>> and run the same test the result is:
>>
>> Total time: 00h 00m 06s 494
>> Total time: 00h 00m 00s 416
>> Total time: 00h 00m 00s 294
>> Total time: 00h 00m 00s 302
>>
>>
>> So it looks like iterating over the nodes took 6 seconds the first time,
>> this seems like a long time given that there's only a single Node in the
>> query result.
>>
>>
>> Questions
>>
>> 1. Why are my Cypher and Java queries slow?
>> 2. Have I messed up and not understood how indexing works or is this normal
>> and expected?
>> 3. How can I make the queries/result reading faster?
>>
>>
>> Many thanks for any replies.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
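A note on the two Java timings in the original post: the gap between them is what one would expect if the result iterator is lazy, so that stopping the clock before the while loop times only the creation of the iterator and not the lookup itself. That laziness is inferred from the numbers in the thread, not confirmed by it. Below is a minimal sketch of the effect in plain Java with no Neo4j dependency. LazyResult, LazyTimingDemo, buildStore and recordsScanned are hypothetical names made up for illustration and are not part of any Neo4j API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a lazily evaluated query result; NOT a Neo4j class. */
final class LazyResult implements Iterator<Integer> {
    private final int[] store;   // pretend records that must be scanned
    private final int wanted;    // the value being looked up
    private int position = 0;
    private Integer nextMatch = null;
    int recordsScanned = 0;      // how much work has actually been done so far

    LazyResult(int[] store, int wanted) {
        // Construction only remembers the request; nothing is scanned yet.
        this.store = store;
        this.wanted = wanted;
    }

    @Override
    public boolean hasNext() {
        // The scan happens here, on demand, until a match or the end is reached.
        while (nextMatch == null && position < store.length) {
            recordsScanned++;
            int candidate = store[position++];
            if (candidate == wanted) {
                nextMatch = candidate;
            }
        }
        return nextMatch != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = nextMatch;
        nextMatch = null;
        return result;
    }
}

final class LazyTimingDemo {
    /** Builds a store holding the values 0..size-1 in order. */
    static int[] buildStore(int size) {
        int[] store = new int[size];
        for (int i = 0; i < size; i++) {
            store[i] = i;
        }
        return store;
    }

    public static void main(String[] args) {
        int[] store = buildStore(1_000_000);

        long start = System.nanoTime();
        LazyResult result = new LazyResult(store, 999_999);
        long afterCreate = System.nanoTime();   // clock stopped before the loop
        int scannedBeforeLoop = result.recordsScanned;

        int found = 0;
        while (result.hasNext()) {
            result.next();
            found++;
        }
        long afterDrain = System.nanoTime();    // clock stopped after the loop

        System.out.println("scanned before loop: " + scannedBeforeLoop);
        System.out.println("scanned after loop:  " + result.recordsScanned);
        System.out.println("matches found:       " + found);
        System.out.println("create only (ns):    " + (afterCreate - start));
        System.out.println("create + drain (ns): " + (afterDrain - start));
    }
}
```

In this sketch recordsScanned is 0 right after construction and equals the store size once the loop has finished, so a fair measurement has to stop the clock after the results have been consumed.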
