Re: [Neo4j] Neo4j performance

Ibrahim El-sayed Sun, 21 Jun 2015 12:42:06 -0700

I will send you a sample of my CSV tomorrow.

On Sunday, June 21, 2015 at 10:26:42 PM UTC+3, Michael Hunger wrote:
>
> 0. I would recommend using neo4j-shell instead of the web UI.
> 1. You should change your index on :URL(url_original) to a unique 
> constraint (see the statement below).
> 2. If you have really long URLs, the index lookup might not be as 
> performant as possible; perhaps there is a shorter unique id you could use.
> 3. CREATE UNIQUE will slow down, especially if you have very many 
> relationships on a single page, and even more so with the additional 
> property checks.
> Suggestions: 
> - change it to CREATE
> - ignore the properties and only look at the rel-type
> - use shortest path to check for existing rels instead
>
> You can also send me your CSV files privately so I can have a look.
>
> create constraint on (u:URL) assert u.url_original is unique;
>
> Constraints added: 1
>
>
> explain LOAD CSV WITH HEADERS FROM "file:/home/test/nodes.csv" AS csv
> > MERGE (m:URL {url_original: csv.url_original})
> > ON CREATE set m.scheme=csv.scheme, m.netloc=csv.netloc, m.path=csv.path, m.ext=csv.ext
> > ;
>
> Compiler CYPHER 2.2
> Planner RULE
>
> EmptyResult
>   |
>   +UpdateGraph
>     |
>     +LoadCSV
>
> +-------------+-------------+-------------------------------------------------+
> |    Operator | Identifiers |                                           Other |
> +-------------+-------------+-------------------------------------------------+
> | EmptyResult |             |                                                 |
> | UpdateGraph |      csv, m | MergeNode; csv.url_original; :URL(url_original) |
> |     LoadCSV |         csv |                                                 |
> +-------------+-------------+-------------------------------------------------+
>
> explain LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
> > MATCH (s:URL {url_original: csv.source})
> > MATCH (t:URL {url_original: csv.dest})
> > CREATE UNIQUE (s)-[r:VISITED {no_requests:toInt(csv.no_requests), response_code:toInt(csv.response_code), origin:csv.origin}]->(t);
>
> Compiler CYPHER 2.2
> Planner RULE
>
> EmptyResult
>   |
>   +UpdateGraph
>     |
>     +SchemaIndex(0)
>       |
>       +SchemaIndex(1)
>         |
>         +LoadCSV
>
> +----------------+--------------+--------------------------------+
> |       Operator |  Identifiers |                          Other |
> +----------------+--------------+--------------------------------+
> |    EmptyResult |              |                                |
> |    UpdateGraph | csv, r, s, t |                   CreateUnique |
> | SchemaIndex(0) |    csv, s, t |   csv.dest; :URL(url_original) |
> | SchemaIndex(1) |       csv, s | csv.source; :URL(url_original) |
> |        LoadCSV |          csv |                                |
> +----------------+--------------+--------------------------------+
>
> On Sun, Jun 21, 2015 at 7:08 PM, Ibrahim El-sayed <[email protected]> wrote:
>
>> I don't have access to the server right now, but here is almost 
>> everything you need:
>> OS: ubuntu 14.04 server
>> RAM: 16 GB
>> DISK: SSD 1TB
>> quad core CPU 3.0
>>
>> I have *two* CSV files: one to create the nodes and the other to 
>> create the relationships.
>> The one I used to create the nodes is similar to the following:
>>
>> url_original,scheme,ext,path,netloc
>> http://www.test.com/test.php?id=1, http, php, test.php, www.test.php
>> http://www.test2.com/test2.php?id=1, http, php, test2.php, www.test2.php
>> http://www.test3.com/test3.php?id=1, http, php, test3.php, www.test3.php
>> http://www.test4.com/test4.php?id=1, http, php, test4.php, www.test4.php
>> http://www.test5.com/test5.php?id=1, http, php, test5.php, www.test5.php
>>
>>
>>
>> and my query to insert the nodes into the graph is
>> USING PERIODIC COMMIT
>> LOAD CSV WITH HEADERS FROM "file:/home/test/nodes.csv" AS csv
>> MERGE (m:URL {url_original: csv.url_original})
>> ON CREATE set m.scheme=csv.scheme, m.netloc=csv.netloc, m.path=csv.path, 
>> m.ext=csv.ext
>>
>>
>> The other CSV file contains the relationships. It looks similar to the 
>> following:
>>
>> source,no_requests,response_code,origin,target
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/style.css
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/logo.jpg
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/arrow.jpw
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/jquery.js
>>
>>
>> and the query I use is the following
>>
>> USING PERIODIC COMMIT
>> LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
>> MATCH (s:URL {url_original: csv.source})
>> MATCH (t:URL {url_original: csv.dest})
>> CREATE UNIQUE (s)-[r:VISITED {no_requests:toInt(csv.no_requests), 
>> response_code:toInt(csv.response_code), origin:csv.origin}]->(t)
>>
>>
>>
>> I have an index on :URL(url_original).
>>
>> As I said, I don't have access to the server right now, so I won't be 
>> able to provide messages.log yet, but I will do that ASAP.
>>
>>
>>
>> On Sunday, June 21, 2015 at 6:33:38 PM UTC+3, Michael Hunger wrote:
>>>
>>> Please share the structure of your csv, your query, your configuration 
>>> (OS, RAM, DISK etc) and your graph.db/messages.log
>>> And how you run your query.
>>>
>>> On Sun, Jun 21, 2015 at 3:24 PM, Ibrahim El-sayed <[email protected]> wrote:
>>>
>>>> I have a large CSV file that I want to insert into Neo4j. 
>>>> I use the periodic commit method to load my CSV into the server, 
>>>> since this is supposed to be the ideal way to deal with big data. 
>>>> I created small test CSV files of around 7 MB each. I tried to run 
>>>> periodic commit on these files; however, when I send the query from the 
>>>> Neo4j web interface it keeps showing "processing" and never returns. 
>>>> What might be the problem? And how can I make sure that my data has 
>>>> been processed? 
>>>>
>>>> I would also like to know the fastest way to insert and query data 
>>>> in Neo4j given that my data set is large. I would like to insert around 
>>>> 5 million nodes and 5 million relationships. I don't see that as 
>>>> feasible with the current performance, though. 
>>>>
>>>> Regards
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>
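
A minimal sketch of the "ignore the properties and only look at the rel-type" suggestion quoted above, written with MERGE instead of the CREATE or shortest-path variants in the list (that swap is an assumption, not something stated in the thread). The batch size of 1000 is also an assumption. The column name csv.dest is taken from the queries in the thread; the sample header quoted above names that column target, so use whichever matches the real file. The two count queries at the end are one simple way to check that the data actually arrived.

```cypher
// Assumes the unique constraint on :URL(url_original) already exists,
// so both MATCH lookups are served by the index.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
MATCH (s:URL {url_original: csv.source})
MATCH (t:URL {url_original: csv.dest})
// Match on the relationship type only; properties are set
// only when the relationship is newly created.
MERGE (s)-[r:VISITED]->(t)
ON CREATE SET r.no_requests   = toInt(csv.no_requests),
              r.response_code = toInt(csv.response_code),
              r.origin        = csv.origin;

// Sanity checks after the load has finished.
MATCH (n:URL) RETURN count(n) AS urls;
MATCH (:URL)-[r:VISITED]->(:URL) RETURN count(r) AS visits;
```

Note that this changes the semantics slightly: two CSV rows with the same source and destination but different property values end up as a single VISITED relationship, carrying the properties of the first row.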

