Re: [Neo4j] Neo4j performace

Michael Hunger Sun, 21 Jun 2015 12:27:06 -0700

0. I would recommend using neo4j-shell instead of the web UI.
1. You should change your index on :URL(url_original) to a unique
constraint (see the statement below).
2. If you have really long URLs, the index lookup might not be as fast as
it could be; perhaps there is a shorter unique id you can use.
3. The CREATE UNIQUE will slow down, especially if you have a lot of
relationships on a single page, and even more so with the additional
property checks.
-> Suggestions:
- change it to CREATE
- ignore the properties and only look at the rel-type
- use shortest path to check for existing rels instead


You can also send me your CSV files privately so I can have a look.

create constraint on (u:URL) assert u.url_original is unique;

Constraints added: 1
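
A minimal sketch of the switch from index to constraint, assuming the plain index on :URL(url_original) mentioned in the original post is still in place. As far as I know, Neo4j 2.x will not create a uniqueness constraint on a label/property pair that already has a regular index, so the index may have to be dropped first; skip the DROP if no such index exists.

```cypher
// Assumption: a regular index on :URL(url_original) already exists.
DROP INDEX ON :URL(url_original);

// The uniqueness constraint is backed by its own index,
// which the MERGE and MATCH lookups on url_original can then use.
CREATE CONSTRAINT ON (u:URL) ASSERT u.url_original IS UNIQUE;
```

The neo4j-shell `schema` command lists the indexes and constraints afterwards, which is a quick way to confirm the constraint is online.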


explain LOAD CSV WITH HEADERS FROM "file:/home/test/nodes.csv" AS csv
> MERGE (m:URL {url_original: csv.url_original})
> ON CREATE set m.scheme=csv.scheme, m.netloc=csv.netloc, m.path=csv.path, m.ext=csv.ext
> ;

Compiler CYPHER 2.2
Planner RULE

EmptyResult
  |
  +UpdateGraph
    |
    +LoadCSV

+-------------+-------------+-------------------------------------------------+
|    Operator | Identifiers |                                           Other |
+-------------+-------------+-------------------------------------------------+
| EmptyResult |             |                                                 |
| UpdateGraph |      csv, m | MergeNode; csv.url_original; :URL(url_original) |
|     LoadCSV |         csv |                                                 |
+-------------+-------------+-------------------------------------------------+



explain LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
> MATCH (s:URL {url_original: csv.source})
> MATCH (t:URL {url_original: csv.dest})
> CREATE UNIQUE (s)-[r:VISITED {no_requests:toInt(csv.no_requests),
response_code:toInt(csv.response_code), origin:csv.origin}]->(t);

Compiler CYPHER 2.2
Planner RULE

EmptyResult
  |
  +UpdateGraph
    |
    +SchemaIndex(0)
      |
      +SchemaIndex(1)
        |
        +LoadCSV

+----------------+--------------+--------------------------------+
|       Operator |  Identifiers |                          Other |
+----------------+--------------+--------------------------------+
|    EmptyResult |              |                                |
|    UpdateGraph | csv, r, s, t |                   CreateUnique |
| SchemaIndex(0) |    csv, s, t |   csv.dest; :URL(url_original) |
| SchemaIndex(1) |       csv, s | csv.source; :URL(url_original) |
|        LoadCSV |          csv |                                |
+----------------+--------------+--------------------------------+
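
Applied to this statement, the first two suggestions under point 3 might look like the sketch below. Two assumptions are baked in: the target column is read as csv.target, because the sample header quoted further down names it "target" (the statements in this thread use csv.dest, so use whichever name the real file has), and the batch size of 1000 is only illustrative. Variant A is safe only if each source/target pair occurs once in relations.csv and the file is loaded a single time. Variant B keeps CREATE UNIQUE but matches on the rel-type alone; the SET then overwrites the properties of an existing relationship with the values from the current row.

```cypher
// Variant A: plain CREATE, no existence check at all.
// Assumes no duplicate source/target rows and a single load of the file.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
MATCH (s:URL {url_original: csv.source})
MATCH (t:URL {url_original: csv.target})
CREATE (s)-[:VISITED {no_requests: toInt(csv.no_requests),
                      response_code: toInt(csv.response_code),
                      origin: csv.origin}]->(t);

// Variant B: uniqueness is checked on the rel-type only,
// properties are set afterwards instead of being part of the pattern.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
MATCH (s:URL {url_original: csv.source})
MATCH (t:URL {url_original: csv.target})
CREATE UNIQUE (s)-[r:VISITED]->(t)
SET r.no_requests = toInt(csv.no_requests),
    r.response_code = toInt(csv.response_code),
    r.origin = csv.origin;

// Quick check that the import did something: count the relationships.
MATCH (:URL)-[r:VISITED]->(:URL) RETURN count(r);
```

Run either variant A or variant B, not both; the final count query works after either one.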


On Sun, Jun 21, 2015 at 7:08 PM, Ibrahim El-sayed <[email protected]>
wrote:

> I don't have access to the server right now, but here is almost
> everything you need:
> OS: ubuntu 14.04 server
> RAM: 16 GB
> DISK: SSD 1TB
> quad core CPU 3.0
>
> I have *two* CSV files: one to create the nodes and the other to
> create the relationships.
> The one I used to create the nodes is similar to the following:
>
> url_original,scheme,ext,path,netloc
> http://www.test.com/test.php?id=1, http, php, test.php, www.test.php
> http://www.test2.com/test2.php?id=1, http, php, test2.php, www.test2.php
> http://www.test3.com/test3.php?id=1, http, php, test3.php, www.test3.php
> http://www.test4.com/test4.php?id=1, http, php, test4.php, www.test4.php
> http://www.test5.com/test5.php?id=1, http, php, test5.php, www.test5.php
>
>
>
> and my query to insert the nodes into the graph is
> USING PERIODIC COMMIT
> LOAD CSV WITH HEADERS FROM "file:/home/test/nodes.csv" AS csv
> MERGE (m:URL {url_original: csv.url_original})
> ON CREATE set m.scheme=csv.scheme, m.netloc=csv.netloc, m.path=csv.path,
> m.ext=csv.ext
>
>
> The other CSV file contains the relations. it looks similar to the
> following file
>
> source,no_requests,response_code,origin,target
> http://www.test.com/index.php,1,200,embedded,http://www.test.com/style.css
> http://www.test.com/index.php,1,200,embedded,http://www.test.com/logo.jpg
> http://www.test.com/index.php,1,200,embedded,http://www.test.com/arrow.jpw
> http://www.test.com/index.php,1,200,embedded,http://www.test.com/jquery.js
>
>
> and the query I use is the following
>
> USING PERIODIC COMMIT
> LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
> MATCH (s:URL {url_original: csv.source})
> MATCH (t:URL {url_original: csv.dest})
> CREATE UNIQUE (s)-[r:VISITED {no_requests:toInt(csv.no_requests),
> response_code:toInt(csv.response_code), origin:csv.origin}]->(t)
>
>
>
> I have an index on URL:url_original
>
> As I said, I don't have access to the server right now, so I won't be
> able to provide messages.log yet, but I will do that ASAP.
>
>
>
> On Sunday, June 21, 2015 at 6:33:38 PM UTC+3, Michael Hunger wrote:
>>
>> Please share the structure of your csv, your query, your configuration
>> (OS, RAM, DISK etc) and your graph.db/messages.log
>> And how you run your query.
>>
>> On Sun, Jun 21, 2015 at 3:24 PM, Ibrahim El-sayed <[email protected]>
>> wrote:
>>
>>> I have a large CSV file that I want to insert into Neo4j.
>>> I use periodic commit to load the CSV into the server, since this is
>>> supposed to be the ideal way to deal with big data.
>>> I have created small test CSV files of around 7 MB each. I tried to run
>>> a periodic commit on these files, but when I send the query from the Neo4j
>>> web interface it keeps showing "processing" and never returns. What
>>> might be the problem? And how can I make sure that my data has been
>>> processed?
>>>
>>>  I would also like to know the fastest way to insert and query data in
>>> Neo4j given that my data set is large. I would like to insert around
>>> 5 million nodes and 5 million relationships. I don't see that as
>>> feasible with the current performance, though.
>>>
>>> Regards
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

