Re: [Neo4j] best practices for storing 40 millions of nodes

Sakshi Srivastva Wed, 10 Oct 2018 00:46:54 -0700

Can you please tell me which data set you are using?

On Tue, Oct 9, 2018 at 3:50 PM 'Michael Hunger' via Neo4j <
[email protected]> wrote:


> Yes, I would create each word node only once and then link the sentence
> structures.
> In general, just finding all the word nodes is probably not your end goal,
> is it?
>
> It's best to ask on the Community Site & Forum <https://community.neo4j.com>
> in the Modeling and Cypher categories.
>
>
> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <[email protected]>
> wrote:
>
>> Hello all,
>>
>> I've been using Neo4j for some weeks and I think it's awesome.
>>
>> I'm building an NLP application, and basically, I'm using Neo4j for
>> storing the dependency graph generated by a semantic parser, something like
>> this:
>>
>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>
>> In the nodes, I store the single words contained in the sentences, and I
>> connect them through relations with a number of different types.
>>
>> For my application, I have the requirement to find all the nodes that
>> contain a given word, so basically I have to search through all the nodes,
>> finding those that contain the input word.  Of course, I've already created
>> an index on the word text field.
>>
>> I'm working on a very big dataset (by the way, the CSV importer is a
>> great thing).
>>
>> On my laptop, the following query takes about 20 ms:
>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>
>> Here are the details of the graph.db:
>> 47,108,544 nodes
>> 45,442,034 relationships
>> 13.39 GiB db size
>> Index created on token.text field
>>
>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>> ------------------------
>> NodeIndexSeek
>> 251,679 db hits
>> ---------------
>> Projection
>> 251,678 db hits
>> --------------
>> ProduceResults
>> 251,678 db hits
>>
>> I wonder if I'm doing something wrong in indexing such a large number of
>> nodes. At the moment, I create a new node for each word I encounter in the
>> text, even if its text is the same as that of other nodes.
>>
>> Should I create a new node only when a new word is encountered, managing
>> the sentence structures through relationships?
>>
>> Could you please help me with a suggestion or best practice to adopt for
>> this specific case? I think that Neo4j is a great piece of software and I'd
>> like to make the most out of it :-)
>>
>> Thank you very much
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
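
To make the "create every word node once" suggestion from the thread concrete, here is a minimal Python sketch of how the import data could be prepared before it goes to the CSV importer. It is only an illustration under stated assumptions: the function names (build_rows, to_csv) and the column names (wordId, text, sentenceId, position) are hypothetical and do not come from the thread or from any Neo4j tool. Each distinct word text gets a single id, and every token occurrence becomes a row that points at that id.

```python
import csv
import io


def build_rows(sentences):
    """Return (word_rows, occurrence_rows) for a list of tokenized sentences.

    word_rows holds one (word_id, text) pair per distinct word text.
    occurrence_rows holds one (sentence_id, position, word_id) triple per token.
    """
    word_ids = {}      # word text -> id of its single word node
    occurrences = []   # one row per token occurrence
    for sentence_id, tokens in enumerate(sentences):
        for position, text in enumerate(tokens):
            # Reuse the id if this text was seen before, otherwise assign the next one.
            word_id = word_ids.setdefault(text, len(word_ids))
            occurrences.append((sentence_id, position, word_id))
    word_rows = [(word_id, text) for text, word_id in word_ids.items()]
    return word_rows, occurrences


def to_csv(header, rows):
    """Serialize rows to CSV text with the given (hypothetical) header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Two toy sentences that share the words "what" and "is".
sentences = [
    ["hi", "dear", "what", "is", "your", "name"],
    ["what", "is", "a", "switch"],
]
word_rows, occurrence_rows = build_rows(sentences)
words_csv = to_csv(("wordId", "text"), word_rows)
occurrences_csv = to_csv(("sentenceId", "position", "wordId"), occurrence_rows)
```

With this layout, looking up a word touches one node instead of one node per occurrence, and the occurrences are reached through relationships. Whether that fits depends on what the dependency relations need to attach to, which the thread leaves open.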
