You could start with the 20 Newsgroups dataset: http://qwone.com/~jason/20Newsgroups/
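On the modeling question quoted below (one node per distinct word, with the sentence structure kept in relationships), here is a minimal sketch of the idea in plain Python. It is not Neo4j code and makes no claim about how Neo4j executes such a query; the `WordGraph` class, its method names, and the `sentence_id` stored on each relationship are all assumptions made up for illustration. The sentence id is there because, once word nodes are shared, something has to say which sentence a relationship came from.

```python
from collections import defaultdict


class WordGraph:
    """Toy in-memory graph: one node per distinct word (hypothetical helper)."""

    def __init__(self):
        # word text -> node id; plays the role of the index on the text field
        self.node_ids = {}
        # node id -> list of (relationship type, target node id, sentence id)
        self.out_edges = defaultdict(list)

    def get_or_create(self, word):
        # Reuse the node for a word seen before, otherwise allocate a new id.
        if word not in self.node_ids:
            self.node_ids[word] = len(self.node_ids)
        return self.node_ids[word]

    def add_relation(self, head, rel_type, dependent, sentence_id):
        # sentence_id is an assumption: it records which sentence the edge belongs to.
        h = self.get_or_create(head)
        d = self.get_or_create(dependent)
        self.out_edges[h].append((rel_type, d, sentence_id))

    def match_chain(self, first, rel1, middle, rel2, last):
        """Return ids of sentences containing (first)-[rel1]->(middle)-[rel2]->(last)."""
        ids = self.node_ids
        if not all(w in ids for w in (first, middle, last)):
            return set()
        a, b, c = ids[first], ids[middle], ids[last]
        step1 = {s for (r, t, s) in self.out_edges[a] if r == rel1 and t == b}
        step2 = {s for (r, t, s) in self.out_edges[b] if r == rel2 and t == c}
        # Both hops must come from the same sentence.
        return step1 & step2


graph = WordGraph()
graph.add_relation("John", "ACTION", "eat", sentence_id=1)
graph.add_relation("eat", "SUBJECT", "apple", sentence_id=1)
graph.add_relation("Mary", "ACTION", "eat", sentence_id=2)
graph.add_relation("eat", "SUBJECT", "pear", sentence_id=2)

matches = graph.match_chain("John", "ACTION", "eat", "SUBJECT", "apple")
node_count = len(graph.node_ids)
print(matches, node_count)
```

The word "eat" appears in both sentences but is stored once, so the node count grows with the vocabulary rather than with the text length. The trade-off is that frequent words end up with very many relationships, which may be part of why pattern queries that pass through such words are slow.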
On Wednesday, October 10, 2018 at 17:42:37 UTC+2, Sakshi Srivastva wrote:
>
> Sir, I am in search of a data set in which I can find hidden facts, like
> the Panama leak. Please suggest a similar big data set.
>
> On Wed, Oct 10, 2018 at 7:34 PM John Carlo <johncar...@gmail.com> wrote:
>
>> Hello Michael,
>>
>> thank you for your reply.
>>
>> I've re-implemented the db structure using unique words/nodes; the
>> number of nodes has now dropped from 47.108.544 to 1.934.049.
>>
>> I still have a huge number of relationships, 45.442.034, that now point to
>> the unique nodes, and the queries are slow.
>>
>> My end goal is to find specific patterns in sentence structures, like the
>> following example:
>>
>> (John)-[ACTION]->(eat)-[SUBJECT]->(apple)
>>
>> Any suggestion will be appreciated.
>>
>> Thank you very much
>>
>> On Wednesday, October 10, 2018 at 00:50:22 UTC+2, Michael Hunger wrote:
>>>
>>> Yes, I would only create every word node once, and then link the
>>> sentence structures.
>>> In general, just finding all the word nodes is probably not your
>>> end goal, is it?
>>>
>>> Best ask on the Community Site & Forum <https://community.neo4j.com>, in
>>> the Modeling and Cypher categories.
>>>
>>> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <johncar...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I've been using Neo4j for some weeks and I think it's awesome.
>>>>
>>>> I'm building an NLP application, and basically I'm using Neo4j for
>>>> storing the dependency graph generated by a semantic parser, something
>>>> like this:
>>>>
>>>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>>>
>>>> In the nodes, I store the single words contained in the sentences, and
>>>> I connect them through relationships of a number of different types.
>>>>
>>>> For my application, I have the requirement to find all the nodes that
>>>> contain a given word, so basically I have to search through all the
>>>> nodes, finding those that contain the input word. Of course, I've
>>>> already created an index on the word text field.
>>>>
>>>> I'm working on a very big dataset (by the way, the CSV importer is a
>>>> great thing).
>>>>
>>>> On my laptop, the following query takes about 20 ms:
>>>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>>>
>>>> Here are the details of the graph.db:
>>>> 47.108.544 nodes
>>>> 45.442.034 relationships
>>>> 13.39 GiB db size
>>>> Index created on token.text field
>>>>
>>>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>>>> ------------------------
>>>> NodeIndexSeek
>>>> 251,679 db hits
>>>> ---------------
>>>> Projection
>>>> 251,678 db hits
>>>> --------------
>>>> ProduceResults
>>>> 251,678 db hits
>>>>
>>>> I wonder if I'm doing something wrong in indexing such a large number
>>>> of nodes. At the moment, I create a new node for each word I encounter
>>>> in the text, even if the text is the same as that of other nodes.
>>>>
>>>> Should I create a new node only when a new word is encountered,
>>>> managing the sentence structures through relationships?
>>>>
>>>> Could you please help me with a suggestion or best practice to adopt
>>>> for this specific case? I think that Neo4j is a great piece of software
>>>> and I'd like to make the most out of it :-)
>>>>
>>>> Thank you very much
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to neo4j+un...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.