Actually, MATCH should work as well. Could you raise a GitHub issue about it? http://github.com/neo4j/neo4j/issues
The difference is: MERGE is a get-or-create, whereas MATCH is a lookup only. So MERGE does more, but at least it uses the index/constraint as it should.

See the Cypher docs: http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html

Cheers,
Michael

On Thu, Mar 27, 2014 at 3:11 PM, Michel Ávila <[email protected]> wrote:

> Yes Michael, that definitely did the trick! Works like a charm now.
> Can you explain the differences between MERGE and MATCH, in this case, so
> I can choose between them consciously next time?
>
> Thank you again!
>
> On Thursday, March 27, 2014 at 10:11:00 UTC-3, Michael Hunger wrote:
>>
>> Can you try to use MERGE instead of MATCH in your relationship statement?
>> That should definitely use the index.
>>
>>
>> On Wed, Mar 26, 2014 at 10:13 PM, Michel Ávila <[email protected]> wrote:
>>
>>> I have 3 files, containing a set of companies, persons and the
>>> relationships between these entities, respectively.
>>> I managed to load the companies and the persons files in no time, but
>>> I'm having some performance issues when loading the last one (the
>>> relationships).
>>> It took more than 1 hour and I killed it, because I knew something was
>>> not right.
>>> This sample has the following:
>>>
>>> - ~100k companies;
>>> - ~100k persons;
>>> - ~250k relationships;
>>>
>>> I needed to be sure that the file was being read correctly, so I left
>>> only one data row in the "rels" file and ran the following Cypher:
>>>
>>> LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f
>>> MATCH (c:company {document: f.company_document})
>>> RETURN c
>>>
>>> The result took about 20 seconds to bring me back the company, so it was
>>> not a problem reading the file, but finding the company.
>>> Then I asked the prompt to profile the Cypher, and the result was:
>>>
>>> ColumnFilter(symKeys=["f", "c"], returnItemNames=["c"], _rows=1, _db_hits=0)
>>> Filter(pred="Property(c,document(3)) == Property(f,company_document)", _rows=1, _db_hits=112865)
>>> NodeByLabel(identifier="c", _db_hits=0, _rows=112865, label="company", identifiers=["c"], producer="NodeByLabel")
>>> LoadCSV(_rows=1, _db_hits=0)
>>>
>>> The way I see it, the loader is reading the entire node set under the
>>> label "company" and applying the document filter later.
>>> When I run the same MATCH outside the LOAD command, the
>>> profile is this:
>>>
>>> profile MATCH (c:company { document: "76875897000169" } ) RETURN c;
>>> SchemaIndex(identifier="c", _db_hits=0, _rows=1, label="company", query="Literal(76875897000169)", identifiers=["c"], property="document", producer="SchemaIndex")
>>>
>>> It's clear to me that it's querying the "company" label index as it was
>>> designed to do.
>>> So, why does LOAD CSV use another query plan to do the same lookup?
>>>
>>> Thanks in advance!
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
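
For anyone landing on this thread later, here is a minimal sketch of what the suggested switch from MATCH to MERGE might look like for the relationship load. Only the `company` label, its `document` property, and the `company_document` column come from the thread; the `person` label, the `person_document` column, and the `WORKS_AT` relationship type are hypothetical names chosen for illustration, and the query assumes an index (or constraint) already exists on the looked-up property of each label.

```cypher
// Sketch only: person, person_document and WORKS_AT are assumed names,
// not taken from the original files.
LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f
// MERGE is a get-or-create: it looks the node up (via the index in this case)
// and creates it only if no match is found.
MERGE (c:company {document: f.company_document})
MERGE (p:person {document: f.person_document})
// Get-or-create the relationship between the two nodes.
MERGE (p)-[:WORKS_AT]->(c)
```

Because MERGE is a get-or-create, a row whose document value matches no existing node will create a new node rather than being skipped, which a plain MATCH would not do. Prefixing the statement with `profile`, as in the original post, is one way to check whether the plan shows a SchemaIndex lookup instead of NodeByLabel followed by Filter.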
