The link: https://tbgraph.wordpress.com/2018/09/09/article-recommendation-system-on-a-citation-network-using-personalized-pagerank-and-neo4j/
has some good info on working with NLP graphs.

On Wed, Oct 10, 2018 at 2:41 PM John Carlo <johncarlof1...@gmail.com> wrote:

> You could start with the 20 newsgroups dataset
> http://qwone.com/~jason/20Newsgroups/
>
> On Wednesday, October 10, 2018 at 17:42:37 UTC+2, Sakshi Srivastva wrote:
>>
>> Sir, I am in search of a data set in which I can find hidden facts, like
>> the Panama leak. Please suggest a similar big data set.
>>
>> On Wed, Oct 10, 2018 at 7:34 PM John Carlo <johncar...@gmail.com> wrote:
>>
>>> Hello Michael,
>>>
>>> thank you for your reply.
>>>
>>> I've re-implemented the db structure using unique words/nodes; the
>>> number of nodes dropped from 47.108.544 to 1.934.049.
>>>
>>> I still have a huge number of relationships, 45.442.034, that now point
>>> to the unique nodes, and the queries are slow.
>>>
>>> My end goal is to find specific patterns in sentence structures, like
>>> the following example:
>>>
>>> (John)-[ACTION]->(eat)-[SUBJECT]->(apple)
>>>
>>> Any suggestion will be appreciated.
>>>
>>> Thank you very much
>>>
>>> On Wednesday, October 10, 2018 at 00:50:22 UTC+2, Michael Hunger wrote:
>>>>
>>>> Yes, I would only create every word node once, and then link the
>>>> sentence structures.
>>>> In general, just finding all the word nodes is probably not your
>>>> end goal, is it?
>>>>
>>>> Best to ask on the Community Site & Forum <https://community.neo4j.com>,
>>>> in the Modeling and Cypher categories.
>>>>
>>>> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <johncar...@gmail.com> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I've been using Neo4j for some weeks and I think it's awesome.
>>>>>
>>>>> I'm building an NLP application, and basically, I'm using Neo4j for
>>>>> storing the dependency graph generated by a semantic parser, something
>>>>> like this:
>>>>>
>>>>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>>>>
>>>>> In the nodes, I store the single words contained in the sentences, and
>>>>> I connect them through relations with a number of different types.
>>>>>
>>>>> For my application, I have the requirement to find all the nodes that
>>>>> contain a given word, so basically I have to search through all the
>>>>> nodes, finding those that contain the input word. Of course, I've
>>>>> already created an index on the word text field.
>>>>>
>>>>> I'm working on a very big dataset (by the way, the CSV importer is a
>>>>> great thing).
>>>>>
>>>>> On my laptop, the following query takes about 20 ms:
>>>>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>>>>
>>>>> Here are the details of the graph.db:
>>>>> 47.108.544 nodes
>>>>> 45.442.034 relationships
>>>>> 13.39 GiB db size
>>>>> Index created on token.text field
>>>>>
>>>>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>>>>> ------------------------
>>>>> NodeIndexSeek
>>>>> 251,679 db hits
>>>>> ---------------
>>>>> Projection
>>>>> 251,678 db hits
>>>>> --------------
>>>>> ProduceResults
>>>>> 251,678 db hits
>>>>>
>>>>> I wonder if I'm doing something wrong in indexing such a large number
>>>>> of nodes. At the moment, I create a new node for each word I encounter
>>>>> in the text, even if the text is the same as in other nodes.
>>>>>
>>>>> Should I create a new node only when a new word is encountered,
>>>>> managing the sentence structures through relationships?
>>>>>
>>>>> Could you please help me with a suggestion or best practice to adopt
>>>>> for this specific case?
>>>>> I think that Neo4j is a great piece of software and
>>>>> I'd like to make the most out of it :-)
>>>>>
>>>>> Thank you very much
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to neo4j+un...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
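
The modelling discussed in this thread (one node per unique word, with the sentence structure carried by relationships) can be sketched in plain Python as below. This is only an in-memory illustration, not Neo4j code and not anyone's actual implementation: the names `WordGraph`, `add_edge`, and `find_chain` are hypothetical, and storing a `sentence_id` on each relationship is an assumption added here so that a pattern such as (John)-[ACTION]->(eat)-[SUBJECT]->(apple) only matches when both hops come from the same sentence.

```python
class WordGraph:
    """In-memory stand-in for a graph with one node per unique word (hypothetical)."""

    def __init__(self):
        # word text -> node id; guarantees a single node per distinct word
        self.node_ids = {}
        # (source node id, relationship type) -> list of (target node id, sentence id)
        self.edges = {}

    def node(self, text):
        """Return the node id for `text`, creating the node only if it is new."""
        if text not in self.node_ids:
            self.node_ids[text] = len(self.node_ids)
        return self.node_ids[text]

    def add_edge(self, source, rel_type, target, sentence_id):
        """Link two words; sentence_id (an assumption) records the originating sentence."""
        key = (self.node(source), rel_type)
        self.edges.setdefault(key, []).append((self.node(target), sentence_id))

    def find_chain(self, first, rel1, second, rel2, third):
        """Sorted ids of sentences containing (first)-[rel1]->(second)-[rel2]->(third)."""
        ids = self.node_ids
        if first not in ids or second not in ids or third not in ids:
            return []
        a, b, c = ids[first], ids[second], ids[third]
        # Sentences in which each hop occurs; the pattern needs both in one sentence.
        first_hop = {s for t, s in self.edges.get((a, rel1), []) if t == b}
        second_hop = {s for t, s in self.edges.get((b, rel2), []) if t == c}
        return sorted(first_hop & second_hop)


graph = WordGraph()
graph.add_edge("John", "ACTION", "eat", sentence_id=1)
graph.add_edge("eat", "SUBJECT", "apple", sentence_id=1)
graph.add_edge("Mary", "ACTION", "eat", sentence_id=2)
graph.add_edge("eat", "SUBJECT", "pear", sentence_id=2)

# "eat" is stored once although it occurs in both sentences.
print(len(graph.node_ids))
print(graph.find_chain("John", "ACTION", "eat", "SUBJECT", "apple"))
print(graph.find_chain("John", "ACTION", "eat", "SUBJECT", "pear"))
```

Without some per-sentence marker on the relationships, the last lookup would also report a match, because the shared "eat" node joins John's sentence to the one about the pear. Whether that marker is a relationship property, a separate sentence node, or something else is a design choice the thread does not settle.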