Hi Michael.

Yes, I indexed the `source_id` property for all nodes using the exact 
syntax you described. I did it after the fact though, meaning after I had 
migrated the data into the graph. I then went through and ran 
`MATCH (d) SET d.source_id = d.source_id` just to be safe.
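
For reference, here is a sketch of what that amounts to. The `:Country` and 
`:Person` lines assume the same syntax is repeated once per label, since as 
far as I understand an index is tied to a single label:

```cypher
// One schema index per label (Neo4j 2.x syntax).
CREATE INDEX ON :Document(source_id);
CREATE INDEX ON :Country(source_id);
CREATE INDEX ON :Person(source_id);

// Rewrite each property with its own value, just to be safe.
MATCH (d) SET d.source_id = d.source_id;
```

Each statement was run on its own, because schema changes and data writes 
cannot share a transaction.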

I'm not sure what the terminology is for relationships exactly, but mine 
are definitely vectors in that :MENTIONS and :CONTAINS have arrows and only 
go in one direction. For example, a document -[:MENTIONS]-> a country, but 
not the other way around.
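
To make the direction concrete, here is a sketch of a two-hop query with the 
relationship type and direction spelled out, instead of the untyped, 
undirected `[*2]` from my first message. The `c` and `other` identifiers, the 
assumption that the middle node is a `:Country`, and the `LIMIT` are mine, 
for illustration only:

```cypher
// Start from one document, follow :MENTIONS out to a country,
// then follow :MENTIONS back in to other documents that mention it.
MATCH (d:Document {source_id: 'SOMEIDHERE'})-[:MENTIONS]->(c:Country)<-[:MENTIONS]-(other:Document)
RETURN d, c, other
LIMIT 100;
```

Without a type or direction, every relationship on every intermediate node 
is a candidate for both hops, so I would expect the number of paths to grow 
quickly when a country is mentioned by many documents.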

On Wednesday, November 12, 2014 8:47:59 PM UTC-5, Michael Hunger wrote:
>
> Hi Eric,
>
> did you do:
>
> create index on :Document(source_id);
>
> Also, are your relationships bi-directional between the same two nodes?
>
> On Wed, Nov 12, 2014 at 11:06 PM, Eric Gade <[email protected]> wrote:
>
>> Hello. I have created what I believe is a not-terribly-complex Neo 
>> database. If you want to cut to the chase, just scroll down to the section 
>> called "*The Problem*"
>>
>> Here is the structure:
>>
>> *Nodes*
>>
>> (:Document) ~75k
>> (:Country) ~300
>> (:Person) ~8k
>>
>> *Relationships*
>>
>> -[:MENTIONS]-> ~300k
>>
>> *System Information*
>>
>> 16 Cores
>> 480gb HD
>> 48GB RAM
>> Ubuntu Server 14.04 LTS
>> Neo4j Version 2.1.5
>>
>> *Config*
>>
>> All I've adjusted in the config is the min and max heap size (disabled by 
>> default):
>> Min: 2048
>> Max: 4096
>>
>> I set the max open files to 60000 from the default 1024 for my system 
>> (Linux users know what I'm talking about)
>>
>> I set a max query time of two minutes via the 
>> `org.neo4j.server.webserver.limit.executiontimeout` param, though I only 
>> did this recently because many queries were taking longer than two minutes. 
>> Prior to this, certain queries which I would guess should be fast would 
>> never finish (see below)
>>
>> I have also indexed a property on all nodes called `source_id`, which is 
>> the `id` value for these things in the database from which I imported them.
>>
>>
>> *Weird Observations*
>>
>> Before I altered the max and min heap sizes in the config file, `htop` 
>> was showing me some (alarming??) stats -- VIRT was 17.5GB for the server 
>> process.
>> Now, with the new settings, it's at a much lower 10100M, but I still 
>> don't understand why.
>>
>>
>> *The Problem*
>>
>> Here's a simple query that never returns. I've waited as long as 5 
>> minutes and still nothing:
>> MATCH(d:Document)-[*2]-(something)
>> WHERE d.source_id='SOMEIDHERE'
>> RETURN d,something;
>>
>> Based on some of the queries I've seen other people talk about, with 
>> variable relations in the dozens, and for datasets that have millions of 
>> nodes using laptop hardware, something seems very wrong to me here.
>>
>> I've read all of the articles I could find on configurations and ways to 
>> improve performance. Any ideas?
>>
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
