Re: [Neo4j] Very Slow Simple Queries (System Details Inside)

Eric Gade Mon, 17 Nov 2014 11:10:14 -0800

Excuse me, sorry. What I meant to say was that I accidentally left the ID 
restriction out in that last query and that this was a mistake on my part.


Now that I've run the original query (the one I created this post about) in 
the shell, it returns in a reasonable amount of time (about 1.2 seconds). 
Odd: there seems to be a crucial difference between the shell and the web 
console...
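
For anyone reading this later, here is a rough sketch of why the unrestricted query (the one quoted below without the `source_id` filter) gets so large. It assumes only the ~300k :MENTIONS relationships exist and that each runs from a document to a country or person, so every 2-step path passes through a mentioned node. The two degree distributions are hypothetical, since the real min/max/avg were never posted, and the function name is mine.

```python
def two_hop_path_count(middle_degrees):
    """Count 2-step paths that pass through each middle node.

    A middle node with k relationships yields k * (k - 1) ordered paths,
    because a path may not reuse the relationship it arrived on.
    """
    return sum(k * (k - 1) for k in middle_degrees)


TOTAL_MENTIONS = 300_000  # ~300k :MENTIONS relationships, from the original post

# Hypothetical extreme: every mention points at one single node.
worst_case = two_hop_path_count([TOTAL_MENTIONS])

# Hypothetical even spread over 300 mentioned nodes (1,000 mentions each).
even_spread = two_hop_path_count([TOTAL_MENTIONS // 300] * 300)

print(f"worst case:  {worst_case:,} paths")
print(f"even spread: {even_spread:,} paths")
```

Under these assumptions the worst case is roughly 9 x 10^10 paths, in line with the "up to 300k^2" estimate, and even the even spread gives about 3 x 10^8 rows. Either is plausibly enough to exhaust a 4 GB heap, whereas restricting `d` to one `source_id` keeps the result small.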

On Monday, November 17, 2014 1:47:18 PM UTC-5, Eric Gade wrote:
>
> Ahh yes, I've just noticed this now. That was an error in the query I let 
> run over the weekend. Normally I restrict the document on `source_id` by
>
> On Monday, November 17, 2014 9:30:00 AM UTC-5, Michael Hunger wrote:
>>
>> How many documents do you have, and how many connections do they have 
>> (min, max, avg)?
>>
>> As you search across all docs and their mentions and then further out, 
>> you have to multiply the number of rels.
>>
>> In total that's up to 300k^2 paths you find
>>
>> Sent from my iPhone
>>
>> On 17.11.2014 at 15:07, Eric Gade <[email protected]> wrote:
>>
>> *UPDATE*:
>>
>> So I left this running on my digital ocean server over the weekend and 
>> I've just now checked it. Here's the result:
>>
>> neo4j-sh (?)$ profile MATCH (d:Document)-[*2]-(something) RETURN d, 
>> something;
>> Error occurred in server thread; nested exception is:
>>         java.lang.OutOfMemoryError: Java heap space
>>
>> This seems really odd to me, as I thought I was using a more than 
>> reasonable amount of heap space:
>>
>> wrapper.java.initmemory=2048
>> wrapper.java.maxmemory=4096
>>
>> Of course maybe that's not enough, and I have no idea what I'm talking 
>> about. It just seems odd that a query that searches for a path with only 2 
>> degrees of separation would be this much of a hassle.
>>
>> On Friday, November 14, 2014 4:39:26 PM UTC-5, Eric Gade wrote:
>>>
>>> Hey Michael,
>>>
>>> Because the query never actually finishes, I'm not sure I'm getting the 
>>> results you want.
>>>
>>> For vanilla Cypher:
>>>
>>> ==> GuardTimeoutException: timeout occurred (overtime=1)
>>>
>>> For the experimental profile:
>>>
>>> ==> GuardTimeoutException: timeout occurred (overtime=1075)
>>>
>>>
>>> BTW, if I remove the timeout limit, the query will not return...or at 
>>> least not in some reasonable amount of time that I've been able to measure. 
>>> Go ahead and let me know what you think. I'm going to connect to my remote 
>>> server and let this command run for a while and see what happens.
>>>
>>>
>>> On Thursday, November 13, 2014 11:25:30 AM UTC-5, Michael Hunger wrote:
>>>>
>>>> Would you be able to run neo4j-shell (or the old webui 
>>>> http://localhost:7474/webadmin -> console) and prefix your query with 
>>>> the profile keyword and send the output?
>>>>
>>>> also prefix it with profile cypher 2.1.experimental
>>>> and do the same.
>>>>
>>>> Thanks  a lot,
>>>>
>>>> Michael
>>>>
>>>> On Thu, Nov 13, 2014 at 3:18 PM, Eric Gade <[email protected]> wrote:
>>>>
>>>>> Hi Michael.
>>>>>
>>>>> Yes, I indexed the `source_id` properties for all nodes using the exact 
>>>>> syntax you described. I did it after the fact though, meaning after I had 
>>>>> migrated the data into the graph. I then went through and did MATCH(d) SET 
>>>>> d.source_id=d.source_id just to be safe.
>>>>>
>>>>> I'm not sure what the terminology is for relationships exactly, but 
>>>>> mine are definitely vectors in that :MENTIONS and :CONTAINS have arrows 
>>>>> and only go in one direction. For example, a document -[:MENTIONS]-> a 
>>>>> country, but not the other way around.
>>>>>
>>>>> On Wednesday, November 12, 2014 8:47:59 PM UTC-5, Michael Hunger wrote:
>>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> did you do:
>>>>>>
>>>>>> create index on :Document(source_id);
>>>>>>
>>>>>> Also, are your relationships bi-directional between the same two 
>>>>>> nodes?
>>>>>>
>>>>>> On Wed, Nov 12, 2014 at 11:06 PM, Eric Gade <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> Hello. I have created what I believe is a not-terribly-complex Neo 
>>>>>>> database. If you want to cut to the chase, just scroll down to the 
>>>>>>> section called "*The Problem*"
>>>>>>>
>>>>>>> Here is the structure:
>>>>>>>
>>>>>>> *Nodes*
>>>>>>>
>>>>>>> (:Document) ~75k
>>>>>>> (:Country) ~300
>>>>>>> (:Person) ~8k
>>>>>>>
>>>>>>> *Relationships*
>>>>>>>
>>>>>>> -[:MENTIONS]-> ~300k
>>>>>>>
>>>>>>> *System Information*
>>>>>>>
>>>>>>> 16 Cores
>>>>>>> 480gb HD
>>>>>>> 48GB RAM
>>>>>>> Ubuntu Server 14.04 LTS
>>>>>>> Neo4j Version 2.1.5
>>>>>>>
>>>>>>> *Config*
>>>>>>>
>>>>>>> All I've adjusted in the config is the min and max heap size 
>>>>>>> (disabled by default):
>>>>>>> Min: 2048
>>>>>>> Max: 4096
>>>>>>>
>>>>>>> I set the max open files to 60000 from the default 1024 for my 
>>>>>>> system (Linux users know what I'm talking about)
>>>>>>>
>>>>>>> I set a max query time of two minutes via the 
>>>>>>> `org.neo4j.server.webserver.limit.executiontimeout` param, though I 
>>>>>>> only did this recently because many queries were taking longer than two 
>>>>>>> minutes. Prior to this, certain queries which I would guess should be 
>>>>>>> fast would never finish (see below)
>>>>>>>
>>>>>>> I have also indexed a property on all nodes called `source_id`, 
>>>>>>> which is the `id` value for these things in the database from which I 
>>>>>>> imported them.
>>>>>>>
>>>>>>>
>>>>>>> *Weird Observations*
>>>>>>>
>>>>>>> Before I altered the max and min heap sizes in the config file, 
>>>>>>> `htop` was showing me some (alarming??) stats -- VIRT was 17.5GB for 
>>>>>>> the server process.
>>>>>>> Now, with the new settings, it's at a much lower 10100M, but I still 
>>>>>>> don't understand why.
>>>>>>>
>>>>>>>
>>>>>>> *The Problem*
>>>>>>>
>>>>>>> Here's a simple query that never returns. I've waited as long as 5 
>>>>>>> minutes and still nothing:
>>>>>>> MATCH(d:Document)-[*2]-(something)
>>>>>>> WHERE d.source_id='SOMEIDHERE'
>>>>>>> RETURN d,something;
>>>>>>>
>>>>>>> Based on some of the queries I've seen other people talk about, with 
>>>>>>> variable relations in the dozens, and for datasets that have millions 
>>>>>>> of nodes using laptop hardware, something seems very wrong to me here.
>>>>>>>
>>>>>>> I've read all of the articles I could find on configurations and 
>>>>>>> ways to improve performance. Any ideas?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "Neo4j" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to [email protected].
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>

