Hello all, I've been using Neo4j for some weeks and I think it's awesome.
I'm building an NLP application, and I'm using Neo4j to store the dependency graphs generated by a semantic parser, something like this: https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0

In the nodes I store the individual words of the sentences, and I connect them through relationships of a number of different types.

My application needs to find all the nodes that contain a given word, so I basically have to search through all the nodes for those whose text matches the input word. Of course, I've already created an index on the word text property. I'm working with a very big dataset (by the way, the CSV importer is a great thing). On my laptop, the following query takes about 20 ms:

    MATCH (t:token) WHERE t.text = "avoid" RETURN t.text

Here are the details of the graph.db:

- 47,108,544 nodes
- 45,442,034 relationships
- 13.39 GiB db size
- index created on the token.text property

    PROFILE MATCH (t:token) WHERE t.text = "switch" RETURN t.text

    NodeIndexSeek     251,679 db hits
    Projection        251,678 db hits
    ProduceResults    251,678 db hits

I wonder if I'm doing something wrong in indexing this many nodes. At the moment, I create a new node for each word I encounter in the text, even when its text is the same as that of existing nodes. Should I instead create a new node only when a previously unseen word is encountered, and manage the sentence structure through relationships?

Could you please help me with a suggestion or best practice to adopt for this specific case? I think Neo4j is a great piece of software and I'd like to make the most of it :-)

Thank you very much
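To make the second option concrete, here is a rough sketch of what I have in mind. The `token.text` property is from my current model; the `word` and `sentence` labels, the `OCCURS_IN` relationship type, and its `position` property are just made-up names for illustration:

    // One node per distinct word, shared across all sentences.
    // MERGE only creates the node if it does not already exist.
    MERGE (w:word {text: "avoid"})

    // The sentence structure would then live entirely in the
    // relationships, e.g. recording where the word occurs:
    MATCH (w:word {text: "avoid"}), (s:sentence {id: 42})
    CREATE (w)-[:OCCURS_IN {position: 3}]->(s)

With this model, looking up a word would touch a single node instead of one node per occurrence, and the occurrences would be reachable by expanding its OCCURS_IN relationships.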