[Neo4j] Version 2.1.0-M01 CSV Import Index Lookup

Michel Ávila Wed, 26 Mar 2014 14:14:22 -0700

I have 3 files, containing a set of companies, persons and the 
relationships between these entities, respectively.
I managed to load the companies and the persons files in no time, but and 
i'm having some performance issues when loading the last one (the 
relationships).
It took more than 1 hour and i killed it, because i knew something was not 
right.
This sample has following:


   - ~100k companies;
   - ~100k persons;
   - ~250k relationships;

I needed to be sure that the file was being read correctly, so i left only 
one data row in the "rels" file and ran the following cypher:

LOAD CSV WITH HEADERS FROM "file:D:\\rels.csv" AS f MATCH (c:company {document
: f.company_document } ) RETURN c

The result took about 20 seconds to bring me back the company, so it was 
not a problem reading the file, but finding the company.
Then i asked the prompt to profile the cypher, and the result was:

ColumnFilter(symKeys=["f", "c"], returnItemNames=["c"], _rows=1, _db_hits=0)
Filter(pred="Property(c,document(3)) == Property(f,company_document)", _rows
=1, _db_hits=112865)
  NodeByLabel(identifier="c", _db_hits=0, _rows=112865, 
label="company",identifiers
=["c"], producer="NodeByLabel")
    LoadCSV(_rows=1, _db_hits=0)

The way i see it, the loader is reading the entire node set under the label 
"company" and applying the document filter later.
When i make the same "MATCH" cypher outside the "LOAD" command, the profile 
is this:

profile MATCH (c:company { document: "76875897000169" } ) RETURN c;
SchemaIndex(identifier="c", _db_hits=0, _rows=1, label="company", query=
"Literal(76875897000169)", identifiers=["c"], property="document", producer=
"SchemaIndex")

It's clear to me that it's querying the "company" label index as it was 
designed to do.
So, why the "LOAD CSV" uses another query plan to do the same lookup?

Thanks in advance!

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[Neo4j] Version 2.1.0-M01 CSV Import Index Lookup

Reply via email to