[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

Yonik Seeley (JIRA) Wed, 28 Nov 2012 07:41:05 -0800

    [ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505564#comment-13505564 ]


Yonik Seeley commented on SOLR-4114:
------------------------------------

bq. I would expect "replication-factor" to say something about how many times 
the data is REPLICATED.

I would too, but we would still disagree on what that meant since I would 
interpret the "number of times the data is replicated" to mean the total number 
of copies that exist after a write operation to the cluster.  That seems to be 
the much more common interpretation in this context since there is no 
"original"... everyone has stored/indexed a copy.

$ echo hello > file1.txt
$ cp file1.txt file2.txt

How many copies of the file are there? If you look at the state (and not the 
mechanism by which you arrived there), most would say there are 2 copies.
In one interpretation, there is only one "copy", but that's too literal and 
assigns some special category to the original.


http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html
"The number of copies of a file is called the replication factor of that file."

http://www.datastax.com/docs/1.0/cluster_architecture/replication
"The total number of replicas across the cluster is referred to as the 
replication factor. A replication factor of 1 means that there is only one copy 
of each row on one node."

Oracle NoSQL store:
http://docs.oracle.com/cd/NOSQL/html/AdminGuide/introduction.html#replicationfactor
http://docs.oracle.com/cd/NOSQL/html/AdminGuide/store-config.html
"A Replication Factor of 3 gives you shards with one master plus two replicas."

Riak:
http://wiki.basho.com/What-is-Riak%3F.html
"An n value of 3 (default) means that each object is replicated 3 times. When 
an object’s key is mapped onto a given partition, Riak won’t stop there – it 
automatically replicates the data onto the next two partitions as well."

Splunk:
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Thereplicationfactor
"The number of data/bucket copies is called the cluster's replication factor."
"The cluster can tolerate a failure of (replication factor - 1) peer nodes. So, 
for example, to ensure that your system can tolerate a failure of two peers, 
you must configure a replication factor of 3, which means that the cluster 
stores three identical copies of each bucket on separate nodes. With a 
replication factor of 3, you can be certain that all your data will be 
available if no more than two peer nodes in the cluster fail. With two nodes 
down, you still have one complete copy of your data available on the remaining 
peer(s)."

It's clear that "3 copies" means 3 total instances of the same data, not 4 (an 
"original" plus 3 more copies of it.)

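To make the "total copies" reading concrete, here is a minimal Python sketch. It assumes a simplified ring placement loosely in the spirit of the Riak quote (a key is hashed with SHA-1 onto one of N nodes and also copied onto the next nodes), and then checks Splunk's stated rule that a cluster tolerates (replication factor - 1) failed nodes. The function names, node names and hashing scheme are hypothetical illustrations, not the actual code of HDFS, Cassandra, Riak, Splunk or Solr.

```python
import hashlib
from itertools import combinations


def place_copies(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Hypothetical ring-style placement (a simplified stand-in, not Riak's real ring):
    the node the key hashes to, plus the next replication_factor - 1 nodes.
    replication_factor counts ALL copies; there is no extra "original"."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    first = int.from_bytes(digest, "big") % len(nodes)
    return [nodes[(first + i) % len(nodes)] for i in range(replication_factor)]


def survives(placement: list[str], failed: set[str]) -> bool:
    """True if at least one copy sits on a node that has not failed."""
    return any(node not in failed for node in placement)


def worst_case_tolerance(placement: list[str], nodes: list[str]) -> int:
    """Largest k such that EVERY set of k failed nodes still leaves a copy."""
    k = 0
    while k < len(nodes) and all(
        survives(placement, set(failed)) for failed in combinations(nodes, k + 1)
    ):
        k += 1
    return k


nodes = ["node1", "node2", "node3", "node4"]
placement = place_copies("row-42", nodes, replication_factor=3)
print(len(placement), worst_case_tolerance(placement, nodes))  # 3 2
```

With a replication factor of 3 the placement holds 3 copies in total, and exhaustively failing nodes shows that any 2 failures leave a copy while some set of 3 failures loses the row. That matches the Splunk wording, not an "original plus 3 copies" reading.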
                
> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-4114
>                 URL: https://issues.apache.org/jira/browse/SOLR-4114
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore, SolrCloud
>    Affects Versions: 4.0
>         Environment: Solr 4.0.0 release
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: collection-api, multicore, shard, shard-allocation
>         Attachments: SOLR-4114.patch, SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server, e.g. running a collection with 8 shards on a 4-server Solr 
> cluster (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on: it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard between the Solr server 
> that used to run it and the new Solr server.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"
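The layout described in the issue (8 shards on 4 Solr servers, 2 per server) and the idea of moving whole shards onto a newly joined server can be sketched as follows. This is a minimal Python illustration under stated assumptions: round-robin assignment, shard names like "shard1", and the helpers assign_shards and add_server are all hypothetical, not Solr's Collection API or its actual shard-allocation logic.

```python
def assign_shards(num_shards: int, servers: list[str]) -> dict[str, list[str]]:
    """Hypothetical round-robin assignment; a server may host several shards."""
    if not servers:
        raise ValueError("need at least one server")
    layout: dict[str, list[str]] = {server: [] for server in servers}
    for i in range(num_shards):
        layout[servers[i % len(servers)]].append(f"shard{i + 1}")
    return layout


def add_server(layout: dict[str, list[str]], new_server: str) -> dict[str, list[str]]:
    """Hypothetical rebalance: move whole shards (never splitting one) from the
    most loaded servers onto new_server until no server has more than one
    shard more than new_server. The input layout is not modified."""
    if new_server in layout:
        raise ValueError("server already in the cluster")
    result = {server: list(shards) for server, shards in layout.items()}
    result[new_server] = []
    while True:
        donor = max(result, key=lambda server: len(result[server]))
        if len(result[donor]) - len(result[new_server]) <= 1:
            break
        result[new_server].append(result[donor].pop())
    return result


layout = assign_shards(8, ["solr1", "solr2", "solr3", "solr4"])
print(layout["solr1"])  # ['shard1', 'shard5']
grown = add_server(layout, "solr5")
print(grown["solr5"])  # ['shard5']
```

Shards move as indivisible units here, which is the elasticity argument made in the issue. A real cluster would also have to copy index data and update cluster state, which this sketch ignores.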
