You could start with the 20 newsgroups dataset
http://qwone.com/~jason/20Newsgroups/

On Wednesday, 10 October 2018 at 17:42:37 UTC+2, Sakshi Srivastva wrote:
>
> Sir, I am looking for a data set in which I can find hidden facts, like 
> the Panama leak. Please suggest a similar big data set.
>
> On Wed, Oct 10, 2018 at 7:34 PM John Carlo <johncar...@gmail.com> wrote:
>
>> Hello Michael, 
>>
>> Thank you for your reply. 
>>
>> I've re-implemented the db structure using unique words/nodes, now the 
>> number of nodes dropped from 47.108.544 to 1.934.049
>>
>> I still have a huge number of relationships, 45.442.034, that now point to 
>> the unique nodes, and the queries are slow.
>>
>> My end goal is to find specific patterns in sentence structures, like the 
>> following example
>>
>> (John)-[ACTION]->(eat)-[SUBJECT]->(apple)
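A pattern like the one above could be written in Cypher roughly as follows. This is only a sketch: it assumes the words are `token` nodes with a `text` property (as in the original post), that `ACTION` and `SUBJECT` are relationship types, and that each relationship carries a hypothetical `sentence` property, so that a match is not stitched together from different sentences once word nodes are shared.

```cypher
// Sketch only: the `sentence` property is an assumption, not something stated in the thread.
MATCH (s:token {text: "John"})-[a:ACTION]->(v:token {text: "eat"}),
      (v)-[o:SUBJECT]->(obj:token {text: "apple"})
WHERE a.sentence = o.sentence   // keep both relationships within the same sentence
RETURN a.sentence AS sentence
```

Anchoring the pattern on indexed `text` values should let the planner start from a handful of nodes rather than from the full set of relationships.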
>>
>> Any suggestions will be appreciated.
>>
>> Thank you very much.
>>
>> On Wednesday, 10 October 2018 at 00:50:22 UTC+2, Michael Hunger wrote:
>>>
>>> Yes, I would only create every word node once. And then link the 
>>> sentence structures.
>>> In general, just finding all the word nodes is probably not your 
>>> end goal, is it?
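One way to create each word node only once is `MERGE`, sketched below. The parameters `$head`, `$dep` and `$sentenceId`, the `ACTION` relationship type and the `sentence` property are illustrative assumptions, not details taken from the thread.

```cypher
// Run once per dependency edge; $head, $dep and $sentenceId are hypothetical
// parameters supplied by the importing application.
MERGE (h:token {text: $head})   // reuse the word node if it already exists
MERGE (d:token {text: $dep})
CREATE (h)-[:ACTION {sentence: $sentenceId}]->(d)   // one relationship per occurrence
```

With an index on `token.text`, each `MERGE` lookup should be an index seek, and the sentence structure lives entirely in the relationships.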
>>>
>>> It is best to ask on the Community Site & Forum 
>>> <https://community.neo4j.com>, in the Modeling and Cypher categories.
>>>
>>>
>>> On Tue, Oct 9, 2018 at 11:00 PM John Carlo <johncar...@gmail.com> wrote:
>>>
>>>> Hello all, 
>>>>
>>>> I've been using Neo4j for some weeks and I think it's awesome. 
>>>>
>>>> I'm building an NLP application, and basically, I'm using Neo4j for 
>>>> storing the dependency graph generated by a semantic parser, something 
>>>> like this:
>>>>
>>>> https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
>>>>
>>>> In the nodes, I store the single words contained in the sentences, and 
>>>> I connect them through relations with a number of different types.
>>>>
>>>> For my application, I have the requirement to find all the nodes that 
>>>> contain a given word, so basically I have to search through all the nodes, 
>>>> finding those that contain the input word. Of course, I've already 
>>>> created an index on the word text field.
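For reference, an index like the one mentioned above would typically have been created with the Neo4j 3.x statement below, the syntax current in 2018. The exact statement used is not shown in the thread, so treat this as an assumption. Later Neo4j releases use the form `CREATE INDEX FOR (t:token) ON (t.text)` instead.

```cypher
// Neo4j 3.x syntax: index the `text` property of nodes labelled `token`
CREATE INDEX ON :token(text)
```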
>>>>
>>>> I'm working on a very big dataset (by the way, the CSV importer is a 
>>>> great thing). 
>>>>
>>>> On my laptop, the following query takes about 20 ms
>>>> MATCH (t:token) WHERE t.text="avoid" RETURN t.text
>>>>
>>>> Here are the details of the graph.db:
>>>> 47.108.544 nodes
>>>> 45.442.034 relationships
>>>> 13.39 GiB db size
>>>> Index created on token.text field
>>>>
>>>> PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
>>>> ------------------------
>>>> NodeIndexSeek
>>>> 251,679 db hits
>>>> ---------------
>>>> Projection
>>>> 251,678 db hits
>>>> --------------
>>>> ProduceResults
>>>> 251,678 db hits
>>>>
>>>> I wonder if I'm doing something wrong in indexing such a large number of 
>>>> nodes. At the moment, I create a new node for each word I encounter in the 
>>>> text, even if its text is the same as that of other nodes.
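To see how much duplication this produces, a query along these lines could list the most repeated words. It is a sketch that uses only the `token` label and `text` property from the post. It touches every token node, so on a graph of this size it may take a while.

```cypher
// Count how many nodes share the same text, most duplicated first
MATCH (t:token)
RETURN t.text AS word, count(*) AS copies
ORDER BY copies DESC
LIMIT 10
```

The roughly 251,678 db hits in the PROFILE output above suggest that about that many separate nodes hold the word "switch".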
>>>>
>>>> Should I create a new node only when a new word is encountered, 
>>>> managing the sentence structures through relationships?
>>>>
>>>> Could you please help me with a suggestion or best practice to adopt 
>>>> for this specific case? I think that Neo4j is a great piece of software 
>>>> and I'd like to make the most out of it :-)
>>>>
>>>> Thank you very much 
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to neo4j+un...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>
>

