dsmiley commented on code in PR #4246: URL: https://github.com/apache/solr/pull/4246#discussion_r3006238656
########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. 
+ +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. Review Comment: ```suggestion In both cluster modes, a logical collection of documents can be divided into logical _shards_. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. 
+A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. Review Comment: ```suggestion It's comprised of a Lucene "index" on disk, and likely an "UpdateLog" accompanying it. A Solr node runs a `SolrCore` for each of its replicas that are solely responsible for operating that index including handling requests to it for indexing & searching. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. 
See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. 
+If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. + +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. Review Comment: ```suggestion Multiple cores can be hosted on any one node, although it doesn't make sense to do this for the same shard. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. 
The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. 
+ +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. Review Comment: ```suggestion All replicas of the same shard contain the same subset of documents. All replicas of the same collection use the same configuration / Configset. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. 
+The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. + +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. 
+ +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. Review Comment: ```suggestion ZooKeeper holds a list of each live node in the cluster and the state of each collection (and thus shard and replica). ``` ########## solr/solr-ref-guide/modules/configuration-guide/pages/configuration-files.adoc: ########## @@ -96,7 +96,7 @@ The Files screen in the Admin UI lets you browse & view configuration files (suc .The Files Screen image::configuration-files/files-screen.png[Files screen,height=400] -If you are using xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper. +If you are using xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper. In user-managed clusters or single-node installations, all files in the `conf` directory are displayed. Review Comment: Let's use "standalone" instead of "single-node" installations.
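Aside on the hierarchy quoted in the cluster-types.adoc hunk above (collection, then shards, then replicas/cores, each with one Lucene index): the relationships could be sketched roughly as below. This is only an illustrative model, not Solr code. The class names, the `build_example` helper, and the node names `node-a`/`node-b` are all hypothetical, and the placement simply assumes the two replicas of a shard live on different nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    # Per the quoted text: each replica is represented by one core,
    # and each core maintains exactly one Lucene index on disk.
    core_name: str
    node: str  # hypothetical node name, for illustration only
    is_leader: bool = False


@dataclass
class Shard:
    # A shard is purely logical; it exists physically only via its replicas.
    name: str
    replicas: List[Replica] = field(default_factory=list)

    def leader(self) -> Replica:
        leaders = [r for r in self.replicas if r.is_leader]
        if len(leaders) != 1:
            raise ValueError(f"{self.name} must have exactly one leader")
        return leaders[0]


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def lucene_index_count(self) -> int:
        # One Lucene index per replica (core), summed over all shards.
        return sum(len(s.replicas) for s in self.shards)


def build_example() -> Collection:
    # Mirrors the quoted diagram: 2 shards, 2 replicas each, cores 1 to 4.
    return Collection(
        name="example",
        shards=[
            Shard("shard1", [Replica("core1", "node-a", is_leader=True),
                             Replica("core2", "node-b")]),
            Shard("shard2", [Replica("core3", "node-a"),
                             Replica("core4", "node-b", is_leader=True)]),
        ],
    )


if __name__ == "__main__":
    collection = build_example()
    print(collection.lucene_index_count())
    for shard in collection.shards:
        print(shard.name, "leader:", shard.leader().core_name)
```

The model has no "original shard" object holding documents: a `Shard` carries nothing but its replicas, which matches the IMPORTANT note in the quoted text.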
########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. 
+ +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. + +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. 
+The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. + +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. 
+ +In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. Review Comment: ```suggestion In this mode, configuration files (a "Configset") are stored in ZooKeeper and not on the file system of each node. Like most things in Solr, this choice is configurable. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. 
+ +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. Review Comment: ```suggestion Although servers may run multiple nodes, it makes more sense to avoid that. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. 
+ +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. 
+ +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. 
+ +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. + +In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. +When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made. + +SolrCloud manages collections as first-class entities. +A collection represents the entire group of shards and replicas that together provide access to a corpus of documents. +Collections share the same configurations (schema, `solrconfig.xml`, etc.). +This centralization of cluster management means that operations can be performed on the entire collection at one time. + +When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection. + +Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have. +Document updates are then generally balanced between each shard automatically. 
+Some degree of control over what documents are stored in which shards is also available, if needed. + +ZooKeeper also handles load balancing and failover. +Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard. Review Comment: ZooKeeper doesn't route shit and generally doesn't "do" anything nor make any decisions. This was a very early misunderstanding in my exploration of SolrCloud forever ago. ZooKeeper just holds information. SolrCloud "routes", and makes decisions (aided by ZK). ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). 
+Some degree of control over what documents are stored in which shards is also available, if needed. + +ZooKeeper also handles load balancing and failover. +Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard. + +In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails. +This means another replica can become the leader, and from that point forward it is the source-of-truth for all other replicas of that shard. + +As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode. + +== User-Managed Mode + +Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts. + +If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user. +There are no automated or programmatic ways for Solr to create shards during indexing. + +Routing documents to shards is handled manually, either with a simple hashing system or a simple round-robin list of shards that sends each document to a different shard. +Document updates must be sent to the right shard or duplicate documents could result. + +In user-managed mode, the distinction between leader and follower replicas becomes critical. +Identifying which node will host the leader replica and which host(s) will have follower replicas dictates how each node is configured. +In this mode, all document updates are sent to the leader replica only. +Once the leader has completed indexing, each follower replica will request the index updates and copy them from the leader. 
+ +Load balancing is achieved with an external tool or process, unless request traffic can be managed by the leader or one of its follower replicas alone. + +If the leader replica goes down, there is no built-in failover mechanism. +A follower replica could continue to serve queries if the queries were specifically directed to it. +Promoting a follower replica to serve as the leader would require changing `solrconfig.xml` configurations on all replicas and reloading each core. + +User-managed mode has no concept of a collection as a managed entity, so for all intents and purposes each Solr core is configured and managed independently. Review Comment: This sentence is absolutely key, and the next good too. Let's elevate them to the top of "user managed clusters". ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. 
+Some degree of control over what documents are stored in which shards is also available, if needed. + +ZooKeeper also handles load balancing and failover. Review Comment: Huh? I suggest dropping this statement. ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. 
+Some degree of control over what documents are stored in which shards is also available, if needed. + +ZooKeeper also handles load balancing and failover. +Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard. + +In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails. +This means another replica can become the leader, and from that point forward it is the source-of-truth for all other replicas of that shard. + +As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode. + +== User-Managed Mode

Review Comment:

```suggestion
== User-Managed Clusters
```

########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ##########
@@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. 
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. Review Comment: I would further state we don't need to talk about "single node" as if it's some kind of Solr term or deployment type. We already have a word for this, and we all know what that word is ;-) ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. 
+ +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. Review Comment: ```suggestion A SolrCloud cluster (or simply "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. ``` ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. 
See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. 
+If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. + +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. 
+ +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. + +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. + +In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. +When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made. + +SolrCloud manages collections as first-class entities. +A collection represents the entire group of shards and replicas that together provide access to a corpus of documents. +Collections share the same configurations (schema, `solrconfig.xml`, etc.). +This centralization of cluster management means that operations can be performed on the entire collection at one time. 
+ +When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection. + +Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have. Review Comment: I'm not sure what to think of the statement "Sharding is handled automatically". This sentence fragment could be dropped. ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. Review Comment: Not just in source code but very much publicly as well. I would remove the "especially" part. 
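For context on the two SolrCloud sentences quoted above (the shard count being given at collection creation, and a single collection-level "reload"), here is a minimal sketch of the corresponding Collections API (v1) requests. It only builds the request URLs and does not contact a server. The host and port, the collection name `techproducts`, the shard and replica counts, and the helper `collections_api_url` are illustrative assumptions, not something taken from the PR.

```python
from urllib.parse import urlencode

# Assumed base URL of any node in a SolrCloud cluster (8983 is Solr's default port).
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API (v1) URL. Hypothetical helper, not part of Solr."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# The number of shards is stated once, when the collection is created.
create_url = collections_api_url(
    "CREATE", name="techproducts", numShards=2, replicationFactor=2
)

# A single RELOAD request targets the whole collection rather than each core.
reload_url = collections_api_url("RELOAD", name="techproducts")

print(create_url)
print(reload_url)
```

Seen this way, "sharding is handled automatically" amounts to passing `numShards` at creation time, which may be a more concrete way to phrase the sentence if it is kept.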
########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. 
+ +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. + +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. 
+The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. + +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. 
+ +In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. +When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made. + +SolrCloud manages collections as first-class entities. +A collection represents the entire group of shards and replicas that together provide access to a corpus of documents. +Collections share the same configurations (schema, `solrconfig.xml`, etc.). +This centralization of cluster management means that operations can be performed on the entire collection at one time. + +When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection. + +Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have. +Document updates are then generally balanced between each shard automatically. +Some degree of control over what documents are stored in which shards is also available, if needed. + +ZooKeeper also handles load balancing and failover. +Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard. + +In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails. +This means another replica can become the leader, and from that point forward it is the source-of-truth for all other replicas of that shard. + +As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode. + +== User-Managed Mode + +Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts. 
+ +If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user. +There are no automated or programmatic ways for Solr to create shards during indexing. Review Comment: This sentence could be dropped. The "during indexing" addition kind of confuses me... is a point trying to be made about Solr specifically during indexing? ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. +One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. 
+ +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. + +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. 
+ +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. 
+ +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. + +In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. +When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made. + +SolrCloud manages collections as first-class entities. +A collection represents the entire group of shards and replicas that together provide access to a corpus of documents. +Collections share the same configurations (schema, `solrconfig.xml`, etc.). +This centralization of cluster management means that operations can be performed on the entire collection at one time. + +When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection. 
Review Comment: ```suggestion When changes are made to configurations, a single command to "reload" the collection will automatically re-open each individual core (replica) that is a member of the collection with the latest configuration. ``` I hate Solr's choice of the word "reload". Confusing! ########## solr/solr-ref-guide/modules/deployment-guide/pages/cloud-screens.adoc: ########## @@ -21,7 +21,7 @@ This screen provides status information about each collection & node in your clu .Only Visible When using SolrCloud [NOTE] ==== -The "Cloud" menu option is only available when Solr is running xref:cluster-types.adoc#solrcloud-mode[SolrCloud]. +The "Cloud" menu option is only available when Solr is running xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud]. User-managed clusters or single-node installations will not display this option. Review Comment: again; use "standalone" ########## solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc: ########## @@ -0,0 +1,158 @@ += Solr Cluster Types +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +A Solr cluster is a group of servers that each run one or more Solr _nodes_. + +There are two general modes of operating a cluster of Solr nodes. 
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>). + +TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code. + +Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features. + +First let's cover a few general concepts and then outline the differences between the two modes. + +== Cluster Concepts + +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + +=== Shards + +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. + +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. + +=== Replicas + +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. + +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. 
+ +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. +It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. + +=== Leaders and Followers + +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). + +The replicas which are not leaders are called _followers_. + +=== Cores + +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on any one node. + +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. 
+ +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + +== SolrCloud Mode + +SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. +ZooKeeper tracks each node of the cluster and the state of each core on each node. + +In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. +When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made. + +SolrCloud manages collections as first-class entities. +A collection represents the entire group of shards and replicas that together provide access to a corpus of documents. +Collections share the same configurations (schema, `solrconfig.xml`, etc.). +This centralization of cluster management means that operations can be performed on the entire collection at one time. + +When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection. + +Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have. +Document updates are then generally balanced between each shard automatically. 
+Some degree of control over what documents are stored in which shards is also available, if needed.
+
+ZooKeeper also handles load balancing and failover.
+Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard.
+
+In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails.
+This means another replica can become the leader, and from that point forward it is the source-of-truth for all other replicas of that shard.
+
+As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode.
+
+== User-Managed Mode
+
+Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts.

Review Comment:
```suggestion
Solr's user-managed cluster is nothing more than a set of Solr nodes in standalone mode, and thus know nothing of collections, shards, replicas, or ZooKeeper.
Instead the "user" / operator (you) must do-it-yourself, both at client layers and probably some local scripts to automate whatever needs doing.
The Core Admin APIs of Solr with some special core-level APIs like replication become the fundamental building blocks for you to design / build a system to your own specifications.
This was what people did prior to SolrCloud, and some still do.
```

##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a subset of the documents.
+
+The number of shards determines the theoretical limit to the number of documents that can be stored.
+It also dictates the amount of parallelization possible for an individual search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the _leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search.
+
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+
+In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk.
+
+== SolrCloud Mode
+
+SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature.
+ZooKeeper tracks each node of the cluster and the state of each core on each node.
+
+In this mode, configuration files are stored in ZooKeeper and not on the file system of each node.
+When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made.
+
+SolrCloud manages collections as first-class entities.
+A collection represents the entire group of shards and replicas that together provide access to a corpus of documents.
+Collections share the same configurations (schema, `solrconfig.xml`, etc.).
+This centralization of cluster management means that operations can be performed on the entire collection at one time.
+
+When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection.
+
+Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have.
+Document updates are then generally balanced between each shard automatically.

Review Comment:
```suggestion
Document updates are then generally balanced between each shard based on a hash.
```

##########
solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc:
##########
@@ -0,0 +1,158 @@
+= Solr Cluster Types
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+A Solr cluster is a group of servers that each run one or more Solr _nodes_.
+
+There are two general modes of operating a cluster of Solr nodes.
+One mode provides central coordination of the Solr nodes (<<SolrCloud Mode>>), while the other allows you to operate a cluster without this central coordination (<<User-Managed Mode>>).
+
+TIP: "User Managed" and "Single Node" are sometimes referred to as "Standalone", especially in source code.
+
+Both modes share general concepts, but ultimately differ in how those concepts are reflected in functionality and features.
+
+First let's cover a few general concepts and then outline the differences between the two modes.
+
+== Cluster Concepts
+
+=== Servers and Nodes
+
+A _server_ is the hardware or virtual machine that hosts Solr software.
+A _node_ is an instance of a running Solr process that services search and indexing requests.
+Large servers may run multiple Solr nodes, though typically one node per server is most common.
+
+=== Shards
+
+In both cluster modes, a logical collection of documents can be divided across nodes as _shards_.
+Each shard represents a logical slice of the overall collection and contains a subset of the documents.
+
+The number of shards determines the theoretical limit to the number of documents that can be stored.
+It also dictates the amount of parallelization possible for an individual search request.
+
+=== Replicas
+
+A shard is a logical concept—a slice of your collection.
+A _replica_ is the physical manifestation of that logical shard.
+It is the actual running instance that holds and serves the documents belonging to that shard.
+
+A shard must have at least one replica to exist physically.
+If you have one shard with one physical copy, you have one replica.
+If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one.
+
+IMPORTANT: There is no "original shard" separate from its replicas.
+The replicas ARE how the shard exists.
+This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies.
+
+All replicas of the same shard contain the same subset of documents and share the same configuration.
+
+The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure.
+It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load.
+
+=== Leaders and Followers
+
+Among the replicas for a given shard, one replica is designated as the _leader_.
+The leader serves as the source-of-truth for its shard.
+When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode).
+
+The replicas which are not leaders are called _followers_.
+
+=== Cores
+
+In Solr's implementation, each replica is represented as a _core_.
+The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it.
+Multiple cores can be hosted on any one node.
+
+NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster.
+In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture.
+
+=== Collections and Indexes
+
+A _collection_ is the complete logical set of searchable documents that share a schema and configuration.
+In SolrCloud mode (described below), a collection encompasses all the shards and their replicas.
+
+An _index_ refers to the physical data structures written to disk by Apache Lucene.
+Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search.
+
+This creates a clear hierarchy from logical concepts to physical storage:
+
+[source,text]
+----
+Collection (logical grouping of all searchable documents)
+  └─> Shard 1 (logical partition)
+  │     └─> Replica 1 / Core 1 (physical instance)
+  │     │     └─> Lucene Index (disk structures)
+  │     └─> Replica 2 / Core 2 (physical instance)
+  │           └─> Lucene Index (disk structures)
+  └─> Shard 2 (logical partition)
+        └─> Replica 1 / Core 3 (physical instance)
+        │     └─> Lucene Index (disk structures)
+        └─> Replica 2 / Core 4 (physical instance)
+              └─> Lucene Index (disk structures)
+----
+
+In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk.
+
+== SolrCloud Mode

Review Comment:
```suggestion
== SolrCloud Clusters
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
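
For readers unfamiliar with the "based on a hash" wording suggested in the second review comment, the following is a minimal Python sketch of hash-range routing: the hash space is split into one contiguous range per shard, and a document goes to the shard whose range contains the hash of its ID. This is an illustration only, not Solr's implementation. The names `shard_ranges` and `route` are hypothetical, and CRC32 is used purely as a convenient stand-in hash function.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of a 32-bit hash space (assumption for this sketch)


def shard_ranges(num_shards):
    """Split the 32-bit hash space into num_shards contiguous (start, end) ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def route(doc_id, ranges):
    """Return the index of the shard whose range contains the hash of doc_id."""
    # CRC32 is a stand-in hash chosen for brevity; it is NOT what Solr uses.
    h = zlib.crc32(doc_id.encode("utf-8"))
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash fell outside every shard range")
```

Calling `route("doc-1", shard_ranges(2))` always returns the same shard index for the same ID, which is why the reviewer's wording ("based on a hash") is more precise than "automatically": placement is deterministic, and the spread across shards is only as even as the hash distribution of the IDs.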
