Tobias,

Thanks very much for the response.  I'll follow up with you privately,
and then I can summarize here.

--Jamie

On Thu, Apr 14, 2011 at 6:24 PM, Tobias Ivarsson
<tobias.ivars...@neotechnology.com> wrote:
> Hi Jamie,
>
> Very interesting use case you have there.
>
> If you could just provide a few more bits of information about your data,
> I'd be able to provide a better analysis.
>
> * Is the distribution of relationships uniform across the nodes? If not, how
> much does it vary?
>
> * What kind of operation do you want to do within the 50ms latency that you've
> specified? You just mentioned "relatively shallow traversals"; approximately
> how deep is that? The key figure to get at is how many relationships you
> need to traverse, and that is a function of the depth of the traversal and
> the distribution of the relationships.
>
> * Is the 50ms latency for each single request? What is the estimated number
> of concurrent requests?
>
> * What kind of properties are those 32 byte values? ASCII strings of length
> 32?
>
> * The bulk loading phase you describe, is it a one time import of initial
> data, or a regularly recurring thing?
>
> * In the bulk load phase, what form does the data have? Would it be possible
> to have that data in a format where each node is uniquely identified by a
> number (in both the node data listing and the edge list)?
>
>
> Finally, do you have test data for these sizes? If possible I'd love to work
> with you on this, to get a good use case to work on for improving the large
> data story in Neo4j.
>
> Cheers,
> Tobias
>
> On Thu, Apr 14, 2011 at 3:12 PM, Jamie Stephens <j...@morphism.com> wrote:
>
>> Folks,
>>
>> I've got an application that has (will have) about 2 billion vertexes
>> and maybe 8 billion edges (?).  Maybe an avg of 4 properties per
>> vertex -- with maybe an avg of 32 bytes/value.  So I guess that's 16
>> billion primitives.  Let's round to 20 billion.  My edges estimate is
>> a relatively uninformed guess.  Just starting to dig into the data.
>>
>> Traversals will be relatively shallow.  Concurrent access.  Throughput
>> is more important than latency.  But latency should be better than
>> maybe 50ms 99% of the time (allowing for some cache warming and some
>> GC).  I don't know much yet about locality.  I'm not sure yet how
>> sensitive the app will be to long GCs.
>>
>> We will need to do a big batch load, and writes will need to be fast
>> in that phase.  After that, we'll see more reads than writes.  So I
>> imagine a config for the batch load and another config for production.
>>
>> I understand cache sharding, application-level partitioning, and so
>> forth.  I'm wondering what I can do on a single machine -- and what
>> that machine should look like.
>>
>> http://docs.neo4j.org/chunked/stable/configuration-jvm.html and
>> http://wiki.neo4j.org/content/Neo4j_Performance_Guide are encouraging.
>>  And having knobs as documented at
>> http://wiki.neo4j.org/content/Configuration_Settings is great.  Nice
>> work!
>>
>> I'm hoping I might be able to get away with 128GB RAM on 12 cores with
>> data striped over a handful of disks (SSDs if required).  We'll
>> probably also need a cluster for both traffic and availability, but
>> that's another topic.
>>
>> Does anybody have experience with a data set like this on a similar
>> machine?  How much RAM and how much disk -- and what kinds and in what
>> configuration?  Latency, throughput, general experience?  Any
>> production deployments?
>>
>> I'd appreciate any guidance or feedback.  I'm happy to summarize later
>> if that'd be helpful.
>>
>> BTW, my testbed uses Clojure with clojure.contrib.server-socket and
>> https://github.com/wagjo/borneo. Very convenient!
>>
>> --Jamie
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
>
>
> --
> Tobias Ivarsson <tobias.ivars...@neotechnology.com>
> Hacker, Neo Technology
> www.neotechnology.com
> Cellphone: +46 706 534857
>
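Tobias's point that the number of relationships traversed depends on both depth and degree distribution can be made concrete with a rough sketch. It assumes a uniform average degree (8 billion edges counted at both endpoints over 2 billion vertexes gives about 8) and ignores revisited nodes, so it is an upper bound. The `hops_per_second` rate is a purely hypothetical placeholder, not a measured Neo4j figure; substitute a number from your own benchmark.

```python
def relationships_touched(avg_degree: float, depth: int) -> int:
    """Upper bound on relationships expanded by a breadth-first traversal,
    assuming a uniform degree and no revisited nodes."""
    return int(sum(avg_degree ** level for level in range(1, depth + 1)))


def max_depth_within_budget(avg_degree: float, budget_ms: float,
                            hops_per_second: float) -> int:
    """Deepest traversal whose worst-case expansion fits the latency budget."""
    if avg_degree <= 0:
        raise ValueError("avg_degree must be positive")
    budget = hops_per_second * budget_ms / 1000.0
    depth = 0
    while relationships_touched(avg_degree, depth + 1) <= budget:
        depth += 1
    return depth


# Figures from the thread: 8 billion edges, 2 billion vertexes, 50 ms budget.
avg_degree = 2 * 8_000_000_000 / 2_000_000_000   # each edge has two endpoints
assumed_rate = 1_000_000                          # HYPOTHETICAL hops per second

for d in range(1, 7):
    print(d, relationships_touched(avg_degree, d))
print("max depth:", max_depth_within_budget(avg_degree, 50, assumed_rate))
```

With a skewed distribution the bound for a traversal that passes through a high-degree node can be far larger than the average suggests, which is why the first question in the list matters.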
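The sizing arithmetic in the original question can be checked with a few lines. This sketch uses only the figures from the message (2 billion vertexes, 8 billion edges, 4 properties per vertex, 32 bytes per value) and treats "128GB" as 128 GiB, which is an assumption. It counts raw property payload only; on-disk record overhead in the store files is deliberately left out because it depends on the Neo4j version.

```python
VERTICES = 2_000_000_000
EDGES = 8_000_000_000
PROPS_PER_VERTEX = 4
BYTES_PER_VALUE = 32
RAM_BYTES = 128 * 1024**3   # assumes "128GB" means 128 GiB


def estimate(vertices: int, edges: int, props_per_vertex: int,
             bytes_per_value: int) -> dict:
    """Count primitives and raw property payload from the stated averages."""
    properties = vertices * props_per_vertex
    return {
        "properties": properties,
        "primitives": vertices + edges + properties,
        "raw_property_bytes": properties * bytes_per_value,
    }


est = estimate(VERTICES, EDGES, PROPS_PER_VERTEX, BYTES_PER_VALUE)
print("primitives:", est["primitives"])
print("raw property bytes:", est["raw_property_bytes"])
print("payload / RAM:", round(est["raw_property_bytes"] / RAM_BYTES, 2))
```

Edges plus properties give the 16 billion mentioned in the message; adding the vertexes gives 18 billion, still inside the 20 billion rounding. The raw property values alone come to 256 billion bytes, more than the proposed RAM, so the working set rather than the whole data set would have to fit in cache.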
