It's a custom dataset generated from a series of documents in XML format, converted to CSV and then imported into Neo4j via the built-in CSV loader.
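To give an idea of the XML-to-CSV step, here is a minimal sketch. The XML layout (`document`, `sentence`, `token` elements with `id`, `head` and `dep` attributes), the column names and the function names are all assumptions made up for illustration; the real documents are structured differently and would need their own parsing.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input layout, invented for this sketch.
SAMPLE_XML = """
<document id="d1">
  <sentence id="s1">
    <token id="1" head="2" dep="nsubj">I</token>
    <token id="2" head="0" dep="ROOT">avoid</token>
    <token id="3" head="2" dep="dobj">noise</token>
  </sentence>
</document>
"""

# Hypothetical CSV columns: one row per token occurrence.
FIELDS = ["token_id", "text", "head_id", "dep"]


def xml_to_rows(xml_text):
    """Flatten every <token> element into one dict per CSV row."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sentence in root.iter("sentence"):
        sid = sentence.get("id")
        for tok in sentence.iter("token"):
            head = tok.get("head")
            rows.append({
                # Prefix with the sentence id so ids stay unique across sentences.
                "token_id": f"{sid}-{tok.get('id')}",
                "text": (tok.text or "").strip(),
                # head="0" marks the root of the sentence: no head to point at.
                "head_id": "" if head == "0" else f"{sid}-{head}",
                "dep": tok.get("dep"),
            })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

A file produced this way has a header row, which is the shape the CSV loader expects when it is told to read headers.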
On Wednesday, October 10, 2018 at 09:46:39 UTC+2, Sakshi Srivastva wrote:
>
> Can you please tell me which dataset you are using?
>
> On Tue, Oct 9, 2018 at 3:50 PM 'Michael Hunger' via Neo4j <ne...@googlegroups.com> wrote:
>
>> Yes, I would only create every word node once, and then link the sentence structures.
>> In general, just finding all the word nodes is probably not your end goal, is it?
>>
>> It's best to ask on the Community Site & Forum <https://community.neo4j.com>, in the Modeling and Cypher categories.
>>
>> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <johncar...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I've been using Neo4j for some weeks and I think it's awesome.
>>>
>>> I'm building an NLP application, and basically I'm using Neo4j to store the dependency graph generated by a semantic parser, something like this:
>>>
>>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>>
>>> In the nodes, I store the single words contained in the sentences, and I connect them through relationships of a number of different types.
>>>
>>> For my application, I need to find all the nodes that contain a given word, so basically I have to search through all the nodes to find those that contain the input word. Of course, I've already created an index on the word text field.
>>>
>>> I'm working on a very big dataset (by the way, the CSV importer is a great thing).
>>>
>>> On my laptop, the following query takes about 20 ms:
>>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>>
>>> Here are the details of the graph.db:
>>> 47.108.544 nodes
>>> 45.442.034 relationships
>>> 13.39 GiB db size
>>> Index created on token.text field
>>>
>>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>>> ------------------------
>>> NodeIndexSeek
>>> 251,679 db hits
>>> ---------------
>>> Projection
>>> 251,678 db hits
>>> --------------
>>> ProduceResults
>>> 251,678 db hits
>>>
>>> I wonder if I'm doing something wrong in indexing such a large number of nodes. At the moment, I create a new node for each word I encounter in the text, even if its text is the same as that of other nodes.
>>>
>>> Should I create a new node only when a new word is encountered, managing the sentence structures through relationships?
>>>
>>> Could you please help me with a suggestion or best practice to adopt for this specific case? I think that Neo4j is a great piece of software and I'd like to make the most out of it :-)
>>>
>>> Thank you very much
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
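The suggestion quoted above (create every word node once, then link the sentence structures) could be applied before the import, while the CSV files are being generated. The following is only a sketch of one possible reading of that advice: the occurrence tuples, the `build_graph_rows` helper and the choice to store the sentence id on each relationship are assumptions for illustration, not something Michael or the Neo4j documentation prescribes.

```python
# Hypothetical token occurrences:
# (sentence_id, position, text, head_position, dependency_label)
# A head_position of 0 marks the root of the sentence.
OCCURRENCES = [
    ("s1", 1, "I", 2, "nsubj"),
    ("s1", 2, "avoid", 0, "ROOT"),
    ("s1", 3, "noise", 2, "dobj"),
    ("s2", 1, "You", 2, "nsubj"),
    ("s2", 2, "avoid", 0, "ROOT"),
    ("s2", 3, "noise", 2, "dobj"),
]


def build_graph_rows(occurrences):
    """Return (words, rels): one word row per distinct text, plus
    one relationship row per dependency edge between word ids."""
    # Assign an id the first time a text is seen; repeats reuse it.
    word_ids = {}
    for _sid, _pos, text, _head, _dep in occurrences:
        word_ids.setdefault(text, len(word_ids) + 1)

    # Lookup from (sentence, position) to the text at that position,
    # needed to resolve each token's head.
    text_at = {(sid, pos): text for sid, pos, text, _head, _dep in occurrences}

    rels = []
    for sid, _pos, text, head, dep in occurrences:
        if head == 0:
            continue  # the root has no incoming dependency edge
        head_text = text_at[(sid, head)]
        # Keep the sentence id on the edge so a sentence can still be
        # reconstructed after its words have been merged with others.
        rels.append((word_ids[head_text], word_ids[text], dep, sid))

    words = [(wid, text) for text, wid in word_ids.items()]
    return words, rels


words, rels = build_graph_rows(OCCURRENCES)
print(len(OCCURRENCES), "occurrences ->", len(words), "word rows,", len(rels), "relationship rows")
```

With this layout, looking up a word touches a single node instead of one node per occurrence, and the per-sentence structure lives on the relationships. Whether that trade-off suits the application depends on the queries that come after the lookup.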