Re: [Neo4j] best practices for storing 40 millions of nodes

'Michael Hunger' via Neo4j Wed, 10 Oct 2018 12:48:01 -0700

John, can you post this on the Community Site & Forum <https://community.neo4j.com>?
It's easier for me to answer there.
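
As a starting point, here is a minimal sketch of how a chain like the (John)-[ACTION]->(eat)-[SUBJECT]->(apple) pattern quoted below could be turned into a parameterized Cypher query from Python. It only builds the query text and parameters and has not been run against a real database. The `token` label and `text` property come from the quoted queries; the helper name `build_pattern_query` and the `sentence` property on relationships are assumptions for illustration. With unique word nodes, some per-sentence marker of this kind is probably needed so that hops from different sentences through a shared word are not combined.

```python
import re

# Labels and relationship types are interpolated into the query text (this
# sketch assumes they cannot be passed as parameters), so they are validated
# first. Word texts are passed as ordinary query parameters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_pattern_query(words, rel_types, label="token"):
    """Build a Cypher query for words[0]-[rel_types[0]]->words[1]-...

    Assumes (hypothetically) that every relationship carries a `sentence`
    property identifying the sentence it was parsed from.
    """
    if not rel_types or len(words) != len(rel_types) + 1:
        raise ValueError("need at least one hop and one more word than hops")
    for name in [label, *rel_types]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Start node, then one relationship and node per hop.
    pattern = f"(n0:{label} {{text: $w0}})"
    for i, rel in enumerate(rel_types):
        pattern += f"-[r{i}:{rel}]->(n{i + 1}:{label} {{text: $w{i + 1}}})"

    # Require every hop to belong to the same sentence as the first hop.
    same_sentence = " AND ".join(
        f"r{i}.sentence = r0.sentence" for i in range(1, len(rel_types))
    )
    query = f"MATCH {pattern}"
    if same_sentence:
        query += f" WHERE {same_sentence}"
    query += " RETURN r0.sentence AS sentence"

    params = {f"w{i}": word for i, word in enumerate(words)}
    return query, params


query, params = build_pattern_query(
    ["John", "eat", "apple"], ["ACTION", "SUBJECT"]
)
print(query)
print(params)
```

Because each node in the chain is looked up by its `text` value, the existing index on token.text can anchor the match; whether the planner actually does so is something PROFILE on the generated query would show.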



On Wed, Oct 10, 2018 at 3:41 PM John Carlo <johncarlof1...@gmail.com> wrote:

> Hello Michael,
>
> Thank you for your reply.
>
> I've re-implemented the db structure using unique word nodes, and the
> number of nodes dropped from 47,108,544 to 1,934,049.
>
> I still have a huge number of relationships (45,442,034) that now point to
> the unique nodes, and the queries are slow.
>
> My end goal is to find specific patterns in sentence structures, like the
> following example
>
> (John)-[ACTION]->(eat)-[SUBJECT]->(apple)
>
> Any suggestions would be appreciated.
>
> Thank you very much.
>
>
>
> On Wednesday, 10 October 2018 at 00:50:22 UTC+2, Michael Hunger wrote:
>>
>> Yes, I would only create every word node once. And then link the sentence
>> structures.
>> In general, just finding all the word nodes is probably not your end goal,
>> is it?
>>
>> It's best to ask on the Community Site & Forum <https://community.neo4j.com>
>> in the Modeling and Cypher categories.
>>
>>
>> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <johncar...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I've been using Neo4j for some weeks and I think it's awesome.
>>>
>>> I'm building an NLP application, and basically, I'm using Neo4j for
>>> storing the dependency graph generated by a semantic parser, something like
>>> this:
>>>
>>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>>
>>> In the nodes, I store the single words contained in the sentences, and I
>>> connect them through relations with a number of different types.
>>>
>>> For my application, I have the requirement to find all the nodes that
>>> contain a given word, so basically I have to search through all the nodes,
>>> finding those that contain the input word.  Of course, I've already created
>>> an index on the word text field.
>>>
>>> I'm working on a very big dataset (by the way, the CSV importer is a
>>> great thing).
>>>
>>> On my laptop, the following query takes about 20 ms:
>>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>>
>>> Here are the details of the graph.db:
>>> 47,108,544 nodes
>>> 45,442,034 relationships
>>> 13.39 GiB db size
>>> Index created on the token.text field
>>>
>>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>>> ------------------------
>>> NodeIndexSeek
>>> 251,679 db hits
>>> ---------------
>>> Projection
>>> 251,678 db hits
>>> --------------
>>> ProduceResults
>>> 251,678 db hits
>>>
>>> I wonder if I'm doing something wrong in indexing such a large number of
>>> nodes. At the moment, I create a new node for each word I encounter in the
>>> text, even if its text is the same as that of other nodes.
>>>
>>> Should I create a new node only when a new word is encountered, managing
>>> the sentence structures through relationships?
>>>
>>> Could you please help me with a suggestion or best practice to adopt for
>>> this specific case? I think that Neo4j is a great piece of software and I'd
>>> like to make the most out of it :-)
>>>
>>> Thank you very much
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to neo4j+un...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
