Hello Michael, 

Thank you for your reply.

I've re-implemented the DB structure using unique word nodes; the number of 
nodes dropped from 47,108,544 to 1,934,049.

I still have a huge number of relationships (45,442,034) that now point to 
the unique nodes, and the queries are slow.

My end goal is to find specific patterns in sentence structures, like in the 
following example:

(John)-[ACTION]->(eat)-[SUBJECT]->(apple)
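
In Cypher, the query I have in mind looks roughly like this (just a sketch: 
the ACTION and SUBJECT relationship types and the text property are the ones 
from the example above; my real model has many more relationship types):

// hypothetical pattern query, following the example above
MATCH (subj:token {text: "John"})-[:ACTION]->(verb:token {text: "eat"}),
      (verb)-[:SUBJECT]->(obj:token {text: "apple"})
RETURN subj.text, verb.text, obj.text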

Any suggestion would be appreciated.

Thank you very much.



On Wednesday, 10 October 2018 at 00:50:22 UTC+2, Michael Hunger wrote:
>
> Yes, I would only create every word node once, and then link the sentence 
> structures.
> In general, just finding all the word nodes is probably not your end goal, 
> is it?
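>
> As a minimal Cypher sketch of that idea (the sentence_id value is only 
> illustrative; use whatever identifies a sentence in your data):
>
> // MERGE keeps one node per distinct word; the dependency edge
> // carries the per-sentence information
> MERGE (subj:token {text: "John"})
> MERGE (verb:token {text: "eat"})
> CREATE (subj)-[:ACTION {sentence_id: 42}]->(verb)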
>
> It's best to ask on the Community Site & Forum <https://community.neo4j.com>, 
> in the Modeling and Cypher categories.
>
>
> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <johncar...@gmail.com> wrote:
>
>> Hello all, 
>>
>> I've been using Neo4j for some weeks and I think it's awesome. 
>>
>> I'm building an NLP application, and basically, I'm using Neo4j for 
>> storing the dependency graph generated by a semantic parser, something like 
>> this:
>>
>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>
>> In the nodes, I store the individual words contained in the sentences, and 
>> I connect them through relationships of a number of different types.
>>
>> For my application, I need to find all the nodes that contain a given word, 
>> so basically I have to search through all the nodes for those whose text 
>> matches the input word. Of course, I've already created an index on the 
>> text field of the token nodes.
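>> The index is a plain schema index, created more or less like this:
>>
>> CREATE INDEX ON :token(text)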
>>
>> I'm working on a very big dataset (by the way, the CSV importer is a 
>> great thing). 
>>
>> On my laptop, the following query takes about 20 ms:
>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>
>> Here are the details of the graph.db:
>> 47,108,544 nodes
>> 45,442,034 relationships
>> 13.39 GiB db size
>> Index created on token.text field
>>
>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>>
>> NodeIndexSeek      251,679 db hits
>> Projection         251,678 db hits
>> ProduceResults     251,678 db hits
>>
>> I wonder if I'm doing something wrong in indexing such a large number of 
>> nodes. At the moment, I create a new node for each word I encounter in the 
>> text, even if its text is the same as that of existing nodes.
>>
>> Should I create a new node only when a new word is encountered, managing 
>> the sentence structures through relationships?
>>
>> Could you please help me with a suggestion or best practice to adopt for 
>> this specific case? I think that Neo4j is a great piece of software and I'd 
>> like to make the most out of it :-)
>>
>> Thank you very much 
>>
>
