I will send you a sample of my CSV tomorrow.
On Sunday, June 21, 2015 at 10:26:42 PM UTC+3, Michael Hunger wrote:
>
> 0. I would recommend using neo4j-shell instead of the web UI.
> 1. You should change your index on :URL(url_original) to a uniqueness
> constraint (see the statement below).
> 2. If you have really long URLs, the index lookup might not be as fast as
> it could be; perhaps there is a shorter unique ID you could use.
> 3. CREATE UNIQUE will slow down, especially if you have very many
> relationships on a single page, and even more so with the additional
> property checks. Suggestions:
> - change it to CREATE
> - ignore the properties and only check the relationship type
> - use shortestPath to check for existing relationships instead
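If you switch from CREATE UNIQUE to plain CREATE, duplicate rows have to be removed before the import instead of during it. A minimal Python sketch, assuming the relationships CSV has the `source` and `dest` columns the query reads; the helper name `dedupe_relationships` and the choice of key columns are hypothetical:

```python
import csv
import io
from typing import Iterable, TextIO


def dedupe_relationships(src: TextIO, dst: TextIO,
                         key_fields: Iterable[str] = ("source", "dest")) -> int:
    """Copy the relationships CSV from src to dst, keeping the first row per key.

    key_fields=("source", "dest") treats two rows as the same relationship
    whenever they connect the same pages (type only, no property check);
    adding the property columns to the key roughly mirrors the original
    CREATE UNIQUE with a property map. Returns the number of data rows written.
    """
    key_fields = tuple(key_fields)
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    seen = set()  # every key is held in memory
    written = 0
    for row in reader:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            continue  # plain CREATE would otherwise add a duplicate relationship
        seen.add(key)
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    sample = io.StringIO(
        "source,no_requests,response_code,origin,dest\n"
        "http://www.test.com/index.php,1,200,embedded,http://www.test.com/style.css\n"
        "http://www.test.com/index.php,1,200,embedded,http://www.test.com/style.css\n"
        "http://www.test.com/index.php,1,200,embedded,http://www.test.com/logo.jpg\n"
    )
    out = io.StringIO()
    print(dedupe_relationships(sample, out))  # 2
```

The set of seen keys grows with the number of distinct relationships, so for millions of rows this needs a correspondingly large amount of RAM; sorting the file first and comparing neighbouring rows would be the low-memory alternative.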
>
> You can also send me your CSV files privately and I'll have a look.
>
> create constraint on (u:URL) assert u.url_original is unique;
>
> Constraints added: 1
>
>
> explain LOAD CSV WITH HEADERS FROM "file:/home/test/nodes.csv" AS csv
> > MERGE (m:URL {url_original: csv.url_original})
> > ON CREATE set m.scheme=csv.scheme, m.netloc=csv.netloc, m.path=csv.path, m.ext=csv.ext
> > ;
>
> Compiler CYPHER 2.2
>
> Planner RULE
>
> EmptyResult
> |
> +UpdateGraph
> |
> +LoadCSV
>
> +-------------+-------------+-------------------------------------------------+
> | Operator    | Identifiers | Other                                           |
> +-------------+-------------+-------------------------------------------------+
> | EmptyResult |             |                                                 |
> | UpdateGraph | csv, m      | MergeNode; csv.url_original; :URL(url_original) |
> | LoadCSV     | csv         |                                                 |
> +-------------+-------------+-------------------------------------------------+
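The plan looks every row up through the :URL(url_original) index, which is where point 2 (long URLs) comes in. One way to get a shorter unique ID is to assign each distinct URL a compact integer before importing. A minimal Python sketch, assuming you are free to add a hypothetical `id` column to nodes.csv and to replace `source`/`dest` in relations.csv with hypothetical `source_id`/`dest_id` columns; you would then put the constraint on that `id` property and match with `toInt(...)` instead of by URL:

```python
import csv
import io
from typing import Dict, TextIO


def assign_ids(nodes_in: TextIO, nodes_out: TextIO) -> Dict[str, int]:
    """Copy nodes.csv with a new integer "id" column (hypothetical name) in front.

    Each distinct url_original gets the next integer; later rows with the same
    URL are dropped (the MERGE would collapse them anyway). Returns url -> id.
    """
    reader = csv.DictReader(nodes_in)
    writer = csv.DictWriter(nodes_out, fieldnames=["id"] + list(reader.fieldnames))
    writer.writeheader()
    ids: Dict[str, int] = {}  # held in memory: one entry per distinct URL
    for row in reader:
        url = row["url_original"]
        if url not in ids:
            ids[url] = len(ids)
            writer.writerow({"id": ids[url], **row})
    return ids


def map_relationships(rels_in: TextIO, rels_out: TextIO, ids: Dict[str, int]) -> int:
    """Copy relations.csv, replacing the source/dest URLs with their ids.

    Rows whose source or dest is unknown are skipped, just as the two MATCH
    clauses would find no node for them. Returns the number of rows written.
    """
    reader = csv.DictReader(rels_in)
    rest_fields = [f for f in reader.fieldnames if f not in ("source", "dest")]
    writer = csv.DictWriter(rels_out, fieldnames=["source_id", "dest_id"] + rest_fields)
    writer.writeheader()
    written = 0
    for row in reader:
        s, d = ids.get(row["source"]), ids.get(row["dest"])
        if s is None or d is None:
            continue
        writer.writerow({"source_id": s, "dest_id": d,
                         **{f: row[f] for f in rest_fields}})
        written += 1
    return written


if __name__ == "__main__":
    nodes_out, rels_out = io.StringIO(), io.StringIO()
    ids = assign_ids(io.StringIO("url_original,scheme\n"
                                 "http://www.test.com/index.php,http\n"
                                 "http://www.test.com/style.css,http\n"), nodes_out)
    map_relationships(io.StringIO("source,no_requests,response_code,origin,dest\n"
                                  "http://www.test.com/index.php,1,200,embedded,"
                                  "http://www.test.com/style.css\n"), rels_out, ids)
    print(rels_out.getvalue())
```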
>
>
>
> explain LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
> > MATCH (s:URL {url_original: csv.source})
> > MATCH (t:URL {url_original: csv.dest})
> > CREATE UNIQUE (s)-[r:VISITED {no_requests:toInt(csv.no_requests), response_code:toInt(csv.response_code), origin:csv.origin}]->(t);
>
> Compiler CYPHER 2.2
>
> Planner RULE
>
> EmptyResult
> |
> +UpdateGraph
> |
> +SchemaIndex(0)
> |
> +SchemaIndex(1)
> |
> +LoadCSV
>
> +----------------+--------------+--------------------------------+
> | Operator       | Identifiers  | Other                          |
> +----------------+--------------+--------------------------------+
> | EmptyResult    |              |                                |
> | UpdateGraph    | csv, r, s, t | CreateUnique                   |
> | SchemaIndex(0) | csv, s, t    | csv.dest; :URL(url_original)   |
> | SchemaIndex(1) | csv, s       | csv.source; :URL(url_original) |
> | LoadCSV        | csv          |                                |
> +----------------+--------------+--------------------------------+
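In this plan every CSV row costs two index lookups (SchemaIndex(0) and SchemaIndex(1)) plus a CreateUnique step, and point 3 above warns that CREATE UNIQUE gets slower on pages with very many relationships. To see whether a few such pages dominate your file, you could count rows per source page. A small Python sketch; `top_fan_out` is a hypothetical helper and assumes the `source` column name used by the query:

```python
import csv
import io
from collections import Counter
from typing import List, TextIO, Tuple


def top_fan_out(rels: TextIO, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n source pages with the most rows in the relationships CSV."""
    counts = Counter(row["source"] for row in csv.DictReader(rels))
    return counts.most_common(n)


if __name__ == "__main__":
    sample = io.StringIO(
        "source,no_requests,response_code,origin,dest\n"
        "http://www.test.com/index.php,1,200,embedded,http://www.test.com/style.css\n"
        "http://www.test.com/index.php,1,200,embedded,http://www.test.com/logo.jpg\n"
    )
    print(top_fan_out(sample, 1))  # [('http://www.test.com/index.php', 2)]
```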
>
>
> On Sun, Jun 21, 2015 at 7:08 PM, Ibrahim El-sayed <[email protected]> wrote:
>
>> I don't have access to the server right now, but here is almost everything
>> you need:
>> OS: Ubuntu 14.04 Server
>> RAM: 16 GB
>> Disk: 1 TB SSD
>> CPU: quad-core, 3.0 GHz
>>
>> I have *two* CSV files: one to create the nodes and the other to create
>> the relationships. The one I used to create the nodes is similar to the
>> following:
>>
>> url_original,scheme,ext,path,netloc
>> http://www.test.com/test.php?id=1, http, php, test.php, www.test.php
>> http://www.test2.com/test2.php?id=1, http, php, test2.php, www.test2.php
>> http://www.test3.com/test3.php?id=1, http, php, test3.php, www.test3.php
>> http://www.test4.com/test4.php?id=1, http, php, test4.php, www.test4.php
>> http://www.test5.com/test5.php?id=1, http, php, test5.php, www.test5.php
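The sample rows have a space after each comma. If LOAD CSV keeps that space as part of the value (Python's csv module does by default), properties such as `scheme` would be stored as " http" rather than "http". A hedged Python sketch that strips the fields before importing; `strip_fields` is a hypothetical helper:

```python
import csv
import io
from typing import TextIO


def strip_fields(src: TextIO, dst: TextIO) -> int:
    """Rewrite a CSV with surrounding whitespace removed from every field.

    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty input
    writer.writerow([name.strip() for name in header])
    rows = 0
    for record in reader:
        writer.writerow([field.strip() for field in record])
        rows += 1
    return rows


if __name__ == "__main__":
    src = io.StringIO(
        "url_original,scheme,ext,path,netloc\n"
        "http://www.test.com/test.php?id=1, http, php, test.php, www.test.php\n"
    )
    dst = io.StringIO()
    strip_fields(src, dst)
    print(dst.getvalue())
```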
>>
>>
>>
>> My query to insert the nodes into the graph is:
>> USING PERIODIC COMMIT
>> LOAD CSV WITH HEADERS FROM "file:/home/test/nodes.csv" AS csv
>> MERGE (m:URL {url_original: csv.url_original})
>> ON CREATE SET m.scheme = csv.scheme, m.netloc = csv.netloc,
>>   m.path = csv.path, m.ext = csv.ext
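USING PERIODIC COMMIT without an explicit number commits after every 1000 rows. The Python sketch below only illustrates that batching arithmetic, not how Neo4j implements it; `batches` and `commit_count` are hypothetical helpers. At the default size, the 5 million nodes mentioned in the first message would mean 5,000 commits.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` rows (one group per commit)."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def commit_count(n_rows: int, size: int = 1000) -> int:
    """Number of commits needed for n_rows at `size` rows per commit (ceiling)."""
    return (n_rows + size - 1) // size


if __name__ == "__main__":
    print([len(b) for b in batches(range(2500))])  # [1000, 1000, 500]
    print(commit_count(5_000_000))  # 5000
```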
>>
>>
>> The other CSV file contains the relationships. It looks similar to the
>> following:
>>
>> source,no_requests,response_code,origin,dest
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/style.css
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/logo.jpg
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/arrow.jpw
>> http://www.test.com/index.php,1,200,embedded,http://www.test.com/jquery.js
>>
>>
>> The query I use is:
>>
>> USING PERIODIC COMMIT
>> LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
>> MATCH (s:URL {url_original: csv.source})
>> MATCH (t:URL {url_original: csv.dest})
>> CREATE UNIQUE (s)-[r:VISITED {no_requests: toInt(csv.no_requests),
>>   response_code: toInt(csv.response_code), origin: csv.origin}]->(t)
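A `csv.<column>` that is missing from the header evaluates to null, and a MATCH on a null property finds no node, so a misnamed column makes the import silently create nothing. A quick Python preflight check, assuming the row alias is `csv` as in the queries above; `missing_columns` is a hypothetical helper:

```python
import csv
import re
from typing import Set


def missing_columns(query: str, header_line: str, alias: str = "csv") -> Set[str]:
    """Return the columns the query reads as `<alias>.<name>` that the header lacks."""
    pattern = r"\b" + re.escape(alias) + r"\.(\w+)"
    referenced = set(re.findall(pattern, query))
    header = next(csv.reader([header_line]))
    return referenced - {name.strip() for name in header}


if __name__ == "__main__":
    query = """
    LOAD CSV WITH HEADERS FROM "file:/home/test/relations.csv" AS csv
    MATCH (s:URL {url_original: csv.source})
    MATCH (t:URL {url_original: csv.dest})
    CREATE UNIQUE (s)-[r:VISITED {no_requests: toInt(csv.no_requests),
      response_code: toInt(csv.response_code), origin: csv.origin}]->(t)
    """
    print(missing_columns(query, "source,no_requests,response_code,origin,target"))
    # {'dest'}
```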
>>
>>
>>
>> I have an index on :URL(url_original).
>>
>> As I said, I don't have access to the server right now, so I won't be able
>> to provide messages.log yet, but I will send it ASAP.
>>
>>
>>
>> On Sunday, June 21, 2015 at 6:33:38 PM UTC+3, Michael Hunger wrote:
>>>
>>> Please share the structure of your CSV, your query, your configuration
>>> (OS, RAM, disk, etc.) and your graph.db/messages.log, and tell me how
>>> you run your query.
>>>
>>> On Sun, Jun 21, 2015 at 3:24 PM, Ibrahim El-sayed <[email protected]> wrote:
>>>
>>>> I have a large CSV file that I want to insert into Neo4j.
>>>> I use the periodic commit method to commit from my CSV to the server,
>>>> since this is supposed to be the ideal approach for large data sets.
>>>> I have created small test CSV files of around 7 MB each and tried a
>>>> periodic commit on them; however, when I send the query from the Neo4j
>>>> web interface it keeps showing "processing" and never returns. What
>>>> might be the problem, and how can I make sure that my data has been
>>>> processed?
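One way to check whether the node import went through is to compare the number of distinct `url_original` values in nodes.csv with the result of `MATCH (n:URL) RETURN count(n);` run from neo4j-shell, since the MERGE creates one node per distinct value. This assumes the database held no :URL nodes before the import; `expected_node_count` is a hypothetical helper:

```python
import csv
import io
from typing import TextIO


def expected_node_count(nodes: TextIO, key: str = "url_original") -> int:
    """Count distinct key values; the MERGE import creates one :URL node per value."""
    return len({row[key] for row in csv.DictReader(nodes)})


if __name__ == "__main__":
    sample = io.StringIO(
        "url_original,scheme,ext,path,netloc\n"
        "http://www.test.com/test.php?id=1,http,php,test.php,www.test.php\n"
        "http://www.test2.com/test2.php?id=1,http,php,test2.php,www.test2.php\n"
        "http://www.test.com/test.php?id=1,http,php,test.php,www.test.php\n"
    )
    print(expected_node_count(sample))  # 2
```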
>>>>
>>>> I would also like to know the fastest way to insert and query data in
>>>> Neo4j, given that my data set is large: I would like to insert around
>>>> 5 million nodes and 5 million relationships. With the current
>>>> performance I don't see that as feasible.
>>>>
>>>> Regards
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>