Re: [Neo4j] Version 2.1.0-M01 CSV Import Index Lookup

Michel Ávila Thu, 27 Mar 2014 11:44:41 -0700

This is why I was using MATCH instead of MERGE: I had created all the nodes
beforehand, and only then loaded the relationships file.
I tested both commands outside LOAD CSV, and they used the index.
It's only inside the LOAD CSV context that MATCH doesn't behave as it's
supposed to.
Weird stuff indeed.
I'll raise a GitHub issue about it, as you suggested.
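
For anyone finding this thread later, the MERGE-based relationship load could
look roughly like the sketch below. It assumes the company and person nodes
were already created with an indexed "document" property; the "person" label,
the "person_document" column and the RELATED_TO relationship type are
placeholders that do not come from the original files.

```cypher
// Sketch only: "person", "person_document" and RELATED_TO are assumed names.
LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f
// MERGE on a node that already exists behaves as a lookup
MERGE (c:company { document: f.company_document })
MERGE (p:person { document: f.person_document })
// Create the relationship only if it is not there yet
MERGE (p)-[:RELATED_TO]->(c)
```

Because MERGE is a get-or-create, a row that references a document missing
from the node files would create a new node rather than being skipped.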


Thanks.


2014-03-27 11:31 GMT-03:00 Michael Hunger <[email protected]>:

> Actually MATCH should work as well, could you raise a GitHub issue about
> it? http://github.com/neo4j/neo4j/issues
>
> The difference is: MERGE is a get-or-create whereas MATCH is a lookup only.
>
> So MERGE does more, but at least it uses the index/constraint as it should.
> See the cypher docs:
> http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
>
> Cheers,
>
> Michael
>
>
>
> On Thu, Mar 27, 2014 at 3:11 PM, Michel Ávila <
> [email protected]> wrote:
>
>> Yes Michael, that definitely did the trick! Works like a charm now.
>> Can you explain the differences between MERGE and MATCH in this case, so
>> I can choose between them consciously next time?
>>
>> Thank you again!
>>
>> On Thursday, 27 March 2014 at 10:11:00 UTC-3, Michael Hunger
>> wrote:
>>>
>>> Can you try using MERGE instead of MATCH in your relationship statement?
>>> That should definitely use the index.
>>>
>>>
>>> On Wed, Mar 26, 2014 at 10:13 PM, Michel Ávila
>>> <[email protected]> wrote:
>>>
>>>> I have 3 files containing a set of companies, a set of persons, and the
>>>> relationships between these entities, respectively.
>>>> I managed to load the companies and persons files in no time, but
>>>> I'm having performance issues when loading the last one (the
>>>> relationships).
>>>> It ran for more than 1 hour before I killed it, because I knew something
>>>> was not right.
>>>> The sample contains the following:
>>>>
>>>>    - ~100k companies;
>>>>    - ~100k persons;
>>>>    - ~250k relationships;
>>>>
>>>> I needed to be sure that the file was being read correctly, so I left
>>>> only one data row in the "rels" file and ran the following Cypher:
>>>>
>>>> LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f
>>>> MATCH (c:company { document: f.company_document })
>>>> RETURN c
>>>>
>>>> It took about 20 seconds to return the company, so the problem was not
>>>> reading the file but finding the company.
>>>> Then I profiled the query, and the result was:
>>>>
>>>> ColumnFilter(symKeys=["f", "c"], returnItemNames=["c"], _rows=1,_db_hits=0)
>>>> Filter(pred="Property(c,document(3)) == Property(f,company_document)",_rows=1, _db_hits=112865)
>>>>   NodeByLabel(identifier="c", _db_hits=0, _rows=112865, label="company", identifiers=["c"], producer="NodeByLabel")
>>>>     LoadCSV(_rows=1, _db_hits=0)
>>>>
>>>> The way I see it, the loader reads the entire node set under the
>>>> "company" label and only applies the document filter afterwards.
>>>> When I run the same MATCH outside the LOAD CSV command, the
>>>> profile is this:
>>>>
>>>> profile MATCH (c:company { document: "76875897000169" } ) RETURN c;
>>>> SchemaIndex(identifier="c", _db_hits=0, _rows=1, label="company", query="Literal(76875897000169)", identifiers=["c"], property="document",producer="SchemaIndex")
>>>>
>>>> Here it is clearly querying the index on the "company" label, as it was
>>>> designed to do.
>>>> So why does LOAD CSV use a different query plan for the same lookup?
>>>>
>>>> Thanks in advance!
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>



-- 
Michel Leite de Ávila
