Actually MATCH should work as well. Could you raise a GitHub issue about
it? http://github.com/neo4j/neo4j/issues

The difference is: MERGE is a get-or-create whereas MATCH is a lookup only.
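
As a rough sketch of that difference, reusing the label, property and
document value from your own query (nothing else is assumed):

```cypher
// MATCH is a lookup only: if no such company exists, no rows come back
// and nothing is written.
MATCH (c:company {document: "76875897000169"})
RETURN c;

// MERGE is get-or-create: it returns the existing company, or creates
// a node with this label and property if none is found.
MERGE (c:company {document: "76875897000169"})
RETURN c;
```

Keep in mind that MERGE will create a company for any document value that
is not already in the graph, which MATCH would never do.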

So MERGE does more, but at least it uses the index/constraint as it should.
See the Cypher docs:
http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
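
For the relationship file, a load along these lines might be a starting
point. This is only a sketch: the person label, the person_document column
and the WORKS_AT relationship type are hypothetical names, since the thread
does not show them, and the constraint syntax is the Neo4j 2.x form.

```cypher
// Run once, as a separate statement, so lookups on document can use the
// unique constraint (and its backing index).
CREATE CONSTRAINT ON (c:company) ASSERT c.document IS UNIQUE;

// Hypothetical names: person, person_document, WORKS_AT.
LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f
MERGE (c:company {document: f.company_document})
MERGE (p:person {document: f.person_document})
MERGE (p)-[:WORKS_AT]->(c);
```

Prefixing the statement with profile, as you did before, should show
whether the plan now uses the index instead of scanning the whole label.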

Cheers,

Michael



On Thu, Mar 27, 2014 at 3:11 PM, Michel Ávila <
[email protected]> wrote:

> Yes Michael, that definitely did the trick! Works like a charm now.
> Can you explain the differences between MERGE and MATCH in this case, so
> I can choose between them consciously next time?
>
> Thank you again!
>
> On Thursday, March 27, 2014 at 10:11:00 UTC-3, Michael Hunger wrote:
>>
>> Can you try using MERGE instead of MATCH in your relationship statement?
>> That should definitely use the index.
>>
>>
>> On Wed, Mar 26, 2014 at 10:13 PM, Michel Ávila
>> <[email protected]> wrote:
>>
>>> I have 3 files, containing a set of companies, persons and the
>>> relationships between these entities, respectively.
>>> I managed to load the companies and persons files in no time, but
>>> I'm having performance issues when loading the last one (the
>>> relationships).
>>> It ran for more than 1 hour before I killed it, because I knew
>>> something was not right.
>>> This sample has the following:
>>>
>>>    - ~100k companies;
>>>    - ~100k persons;
>>>    - ~250k relationships;
>>>
>>> I needed to be sure that the file was being read correctly, so I left
>>> only one data row in the "rels" file and ran the following Cypher:
>>>
>>> LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f
>>> MATCH (c:company {document: f.company_document})
>>> RETURN c
>>>
>>> It took about 20 seconds to return the company, so the problem was
>>> not reading the file but finding the company.
>>> Then I profiled the query, and the result was:
>>>
>>> ColumnFilter(symKeys=["f", "c"], returnItemNames=["c"], _rows=1, _db_hits=0)
>>> Filter(pred="Property(c,document(3)) == Property(f,company_document)", _rows=1, _db_hits=112865)
>>>   NodeByLabel(identifier="c", _db_hits=0, _rows=112865, label="company", identifiers=["c"], producer="NodeByLabel")
>>>     LoadCSV(_rows=1, _db_hits=0)
>>>
>>> The way I see it, the loader is reading the entire node set under the
>>> label "company" and applying the document filter afterwards.
>>> When I run the same MATCH outside the LOAD CSV command, the
>>> profile is this:
>>>
>>> profile MATCH (c:company { document: "76875897000169" } ) RETURN c;
>>> SchemaIndex(identifier="c", _db_hits=0, _rows=1, label="company", query="Literal(76875897000169)", identifiers=["c"], property="document", producer="SchemaIndex")
>>>
>>> It's clear to me that it's querying the "company" label index as it was
>>> designed to do.
>>> So why does LOAD CSV use a different query plan for the same lookup?
>>>
>>> Thanks in advance!
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
