Hi Michael. Yes, I indexed the `source_id` property for all nodes using the exact syntax you described. I did it after the fact though, meaning after I had migrated the data into the graph. I then went through and ran `MATCH (d) SET d.source_id = d.source_id` just to be safe.
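For reference, the steps described above correspond roughly to the following Cypher. This is only a sketch: the `:Document` statement is the syntax from Michael's message, while the `:Country` and `:Person` statements are an assumption based on "all nodes" and the labels listed in the original post.

```cypher
// Index syntax quoted from Michael's message
CREATE INDEX ON :Document(source_id);

// Assumed equivalents for the other labels ("all nodes")
CREATE INDEX ON :Country(source_id);
CREATE INDEX ON :Person(source_id);

// Re-set the property on every node after the import, "just to be safe"
MATCH (d) SET d.source_id = d.source_id;
```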
I'm not sure what the exact terminology is for relationships, but mine are definitely directed, in that :MENTIONS and :CONTAINS have arrows and only go in one direction. For example, a document -[:MENTIONS]-> a country, but not the other way around.

On Wednesday, November 12, 2014 8:47:59 PM UTC-5, Michael Hunger wrote:
>
> Hi Eric,
>
> did you do:
>
> create index on :Document(source_id);
>
> Also your relationships are they bi-directional between the same two nodes?
>
> On Wed, Nov 12, 2014 at 11:06 PM, Eric Gade <[email protected]> wrote:
>
>> Hello. I have created what I believe is a not-terribly-complex Neo database. If you want to cut to the chase, just scroll down to the section called "*The Problem*"
>>
>> Here is the structure:
>>
>> *Nodes*
>>
>> (:Document) ~75k
>> (:Country) ~300
>> (:Person) ~8k
>>
>> *Relationships*
>>
>> -[:MENTIONS]-> ~300k
>>
>> *System Information*
>>
>> 16 Cores
>> 480GB HD
>> 48GB RAM
>> Ubuntu Server 14.04 LTS
>> Neo4j Version 2.1.5
>>
>> *Config*
>>
>> All I've adjusted in the config is the min and max heap size (disabled by default)
>> Min: 2048
>> Max: 4096
>>
>> I set the max open files to 60000 from the default 1024 for my system (Linux users know what I'm talking about)
>>
>> I set a max query time of two minutes via the `org.neo4j.server.webserver.limit.executiontimeout` param, though I only did this recently because many queries were taking longer than two minutes. Prior to this, certain queries which I would guess should be fast would never finish (see below)
>>
>> I have also indexed a parameter on all nodes called `source_id`, which is the `id` value for these things in the database from which I imported them.
>>
>> *Weird Observations*
>>
>> Before I altered the max and min heap sizes in the config file, `htop` was showing me some (alarming??) stats -- VIRT was 17.5GB for the server process.
>> Now, with the new settings, it's at a much lower 10100M, but I still don't understand why.
>>
>> *The Problem*
>>
>> Here's a simple query that never returns. I've waited as long as 5 minutes and still nothing:
>>
>> MATCH (d:Document)-[*2]-(something)
>> WHERE d.source_id='SOMEIDHERE'
>> RETURN d,something;
>>
>> Based on some of the queries I've seen other people talk about, with variable relations in the dozens, and for datasets that have millions of nodes using laptop hardware, something seems very wrong to me here.
>>
>> I've read all of the articles I could find on configurations and ways to improve performance. Any ideas?
>>
>> --
>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
