Sharding a graph is often reasonably easy to do for a specific domain (based on knowledge of that domain), but very hard to do generically for all domains. For this reason, while it has been considered for a long time, Neo4j has not supported built-in partitioning of the graph.
Unsurprisingly, this problem is analogous to the partitioning performance issue with an RDBMS. In an RDBMS it might be simple to pick a consistent hash for any specific table and partition based on that. However, it becomes hard to pick consistent hashes for multiple tables such that likely join operations do not cross partitions too often. This means that simple, less connected data (few joins) partitions easily in an RDBMS, but for highly connected data, performance tanks because joins cross partitions.

If you have a nicely designed, domain-specific graph in Neo4j, you should consider partitioning it in the application layer (a rough sketch of what that could look like follows at the end of this message). Since the partitioning is domain specific, this is the natural place to do it anyway (even with an RDBMS for highly connected models). I can also comment that the 'big data' models under consideration for partitioning often turn out to be small enough to fit inside one Neo4j instance; everyone has a different idea about what 'big data' is. Make sure that your decision to partition is based on a real need to split the data, not on the perception that you might need it. As Michael implied, perhaps the built-in HA mode is sufficient for your needs.

On Wed, Jun 4, 2014 at 7:22 AM, Michael Hunger <[email protected]> wrote:

> Each instance holds the _full_ graph. That way you achieve zero-copy
> failover and high-performance traversals which never have to cross the
> network.
>
> Michael
>
>
> On Wed, Jun 4, 2014 at 1:06 AM, Bernardo Hermont <[email protected]> wrote:
>
>> Hi Stefan,
>>
>> Thank you for your e-mail.
>> So there is no way of having each cluster member store only part of the
>> graph, I mean, to have more control over which part is stored on each
>> cluster node?
>>
>> I ask this just to see how exactly Neo4j fits my requirements right now.
>>
>> Thank you again,
>>
>> Bernardo
>>
>>
>> On Monday, June 2, 2014 3:11:42 AM UTC-4, Stefan Armbruster wrote:
>>
>>> Hi,
>>>
>>> Neo4j's clustering model is master-slave replication. Each cluster
>>> member has a copy of the full graph, enabling read operations without
>>> cluster intercommunication - and therefore reads scale almost linearly.
>>>
>>> Cheers,
>>> Stefan
>>>
>>> 2014-06-02 2:35 GMT+02:00 Bernardo Hermont <[email protected]>:
>>> > Hi all,
>>> >
>>> > I have the following questions about Neo4j:
>>> >
>>> > Is it possible to choose which slave node the data inserted via the
>>> > PUT REST interface is stored on?
>>> > Or is the data automatically sharded across the slave servers that
>>> > are part of the cluster?
>>> >
>>> > Is there any way to do this in a clustered configuration?
>>> >
>>> > Regards,
>>> >
>>> > Bernardo
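P.S. To make the application-layer idea concrete, here is a minimal sketch of a consistent-hash router that maps a domain key (say, a customer id) to one of several independent Neo4j instances, so that each customer's subgraph lives entirely on one instance and traversals never cross the network. To be clear, none of this is a Neo4j API; it would live entirely in your own application code, and the instance URLs, the "customer-42" key, the use of MD5, and the virtual-node count are all illustrative assumptions.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical application-layer router, not a Neo4j feature.
 * Each domain key is hashed onto a ring; the instance owning the
 * next ring position stores that key's entire subgraph.
 */
public class GraphPartitionRouter {

    // Ring position -> instance URL. Each instance appears at many
    // positions ("virtual nodes") to even out the key distribution.
    private final SortedMap<Long, String> ring = new TreeMap<>();

    public GraphPartitionRouter(Iterable<String> instanceUrls, int virtualNodes) {
        for (String url : instanceUrls) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(url + "#" + i), url);
            }
        }
    }

    /** All reads and writes for this key go to the returned instance. */
    public String instanceFor(String domainKey) {
        long h = hash(domainKey);
        // First virtual node at or after the key's hash, wrapping around.
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey())
                              : tail.get(tail.firstKey());
    }

    // Stable hash: first 8 bytes of MD5, so routing does not depend on
    // JVM-specific hashCode() behaviour.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        GraphPartitionRouter router = new GraphPartitionRouter(
                Arrays.asList("http://neo4j-a:7474", "http://neo4j-b:7474"), 128);
        // The application then sends the REST request for this customer's
        // subgraph to the chosen instance.
        System.out.println(router.instanceFor("customer-42"));
    }
}

The point of the ring over plain modulo hashing is exactly the concern raised above about picking hashes well: adding a third instance only remaps the keys falling on the ring segments taken over by the new instance's virtual nodes, instead of reshuffling nearly every key. The hard part remains choosing a domain key such that traversals you care about stay within one partition.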
