Tobias,

Thanks very much for the response. I'll follow up with you privately, and then I can summarize here.
--Jamie

On Thu, Apr 14, 2011 at 6:24 PM, Tobias Ivarsson <tobias.ivars...@neotechnology.com> wrote:
> Hi Jamie,
>
> Very interesting use case you have there.
>
> If you could just provide a few more bits of information about your data,
> I'd be able to provide a better analysis.
>
> * Is the distribution of relationships uniform across the nodes? If not, how
> much does it vary?
>
> * What kind of operation do you want to do within the 50ms latency that
> you've specified? You just mentioned "relatively shallow traversals";
> approximately how deep is that? The key figure to get at is how many
> relationships you need to traverse, and that is a function of the depth of
> the traversal and the distribution of the relationships.
>
> * Is the 50ms latency for each single request? What is the estimated number
> of concurrent requests?
>
> * What kind of properties are those 32 byte values? ASCII strings of length
> 32?
>
> * The bulk loading phase you describe, is it a one time import of initial
> data, or a regularly recurring thing?
>
> * In the bulk load phase, what form does the data have? Would it be possible
> to have that data in a format where each node is uniquely identified by a
> number (in both the node data listing and the edge-list)?
>
>
> Finally, do you have test data for these sizes? If possible I'd love to work
> with you on this, to get a good use case to work on for improving the large
> data story in Neo4j.
>
> Cheers,
> Tobias
>
> On Thu, Apr 14, 2011 at 3:12 PM, Jamie Stephens <j...@morphism.com> wrote:
>
>> Folks,
>>
>> I've got an application that has (will have) about 2 billion vertexes
>> and maybe 8 billion edges (?). Maybe an avg of 4 properties per
>> vertex -- with maybe an avg of 32 bytes/value. So I guess that's 16
>> billion primitives. Let's round to 20 billion. My edges estimate is
>> a relatively uninformed guess. Just starting to dig into the data.
>>
>> Traversals will be relatively shallow. Concurrent access. Throughput
>> is more important than latency. But latency should be better than
>> maybe 50ms 99% of the time (allowing for some cache warming and some
>> GC). I don't know much yet about locality. I'm not sure yet how
>> sensitive the app will be to long GCs.
>>
>> We will need to do a big batch load, and writes will need to be fast
>> in that phase. After that, we'll see more reads than writes. So I
>> imagine a config for the batch load and another config for production.
>>
>> I understand cache sharding, application-level partitioning, and so
>> forth. I'm wondering what I can do on a single machine -- and what
>> that machine should look like.
>>
>> http://docs.neo4j.org/chunked/stable/configuration-jvm.html and
>> http://wiki.neo4j.org/content/Neo4j_Performance_Guide are encouraging.
>> And having knobs as documented at
>> http://wiki.neo4j.org/content/Configuration_Settings is great. Nice
>> work!
>>
>> I'm hoping I might be able to get away with 128GB RAM on 12 cores with
>> data striped over a handful of disks (SSDs if required). We'll
>> probably also need a cluster for both traffic and availability, but
>> that's another topic.
>>
>> Does anybody have experience with a data set like this on a similar
>> machine? How much RAM and how much disk -- and what kinds and in what
>> configuration? Latency, throughput, general experience? Any
>> production deployments?
>>
>> I'd appreciate any guidance or feedback. I'm happy to summarize later
>> if that'd be helpful.
>>
>> BTW, my testbed uses Clojure with clojure.contrib.server-socket and
>> https://github.com/wagjo/borneo. Very convenient!
>>
>> --Jamie
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
> --
> Tobias Ivarsson <tobias.ivars...@neotechnology.com>
> Hacker, Neo Technology
> www.neotechnology.com
> Cellphone: +46 706 534857
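
For reference, the sizing figures in the original post can be checked with a quick back-of-envelope script. This is only a rough sketch built from the poster's own estimates: the function name `estimate` and the reading of "128GB" as decimal gigabytes are assumptions made for illustration, and the numbers say nothing about Neo4j's actual store record sizes or cache overhead.

```python
# Back-of-envelope sizing using only the estimates from the original post.
# Assumption: these are raw counts and raw value bytes, NOT Neo4j's on-disk
# or in-memory footprint, which depends on its store format.

VERTICES = 2_000_000_000      # "about 2 billion vertexes"
EDGES = 8_000_000_000         # "maybe 8 billion edges" (an uninformed guess)
PROPS_PER_VERTEX = 4          # "avg of 4 properties per vertex"
BYTES_PER_VALUE = 32          # "avg of 32 bytes/value"
RAM_BYTES = 128 * 10**9       # "128GB RAM", read here as decimal GB


def estimate(vertices, edges, props_per_vertex, bytes_per_value):
    """Return primitive counts and the raw property payload in bytes."""
    property_values = vertices * props_per_vertex
    return {
        "property_values": property_values,
        "primitives": vertices + edges + property_values,
        "raw_property_bytes": property_values * bytes_per_value,
    }


result = estimate(VERTICES, EDGES, PROPS_PER_VERTEX, BYTES_PER_VALUE)
payload_to_ram = result["raw_property_bytes"] / RAM_BYTES

for name, value in result.items():
    print(f"{name}: {value:,}")
print(f"raw property payload / RAM: {payload_to_ram:.1f}x")
```

Edges plus property values give the 16 billion quoted in the post; counting the vertices as well gives 18 billion, so rounding to 20 billion is reasonable either way. The raw property values alone come to about 256 GB, roughly twice the proposed 128GB of RAM, which is why the questions about value types, relationship distribution, and locality matter for the latency target.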