Re: SolrCloud on PublicCloud

2020-08-03 Thread Shawn Heisey

On 8/3/2020 12:04 PM, Mathew Mathew wrote:

I have been looking for architectural guidance on correctly configuring SolrCloud 
on a public cloud (e.g. Azure/AWS).
In particular, Solr's ZooKeeper-based autoscaling seems to overlap with the 
autoscaling capabilities of the cloud platforms.

I have the following questions.

   1.  Should the ZooKeeper ensemble be put in an autoscaling group? This seems 
to be a no, since the Solr nodes need to register against a static list of 
ZooKeeper IPs.


Correct.  There are features in ZK 3.5 for dynamic server membership, 
but in general it is better to have a static list.  The client must be 
upgraded as well for that feature to work.  The ZK client was upgraded 
to a 3.5 version in Solr 8.2.0.  I don't think we have done any testing 
of the dynamic membership feature.
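For reference, the static list is usually wired in through the ZK_HOST setting in solr.in.sh (or the -z flag at startup) as a comma-separated connection string; the hostnames and the /solr chroot below are placeholders, not values from this thread:

```shell
# solr.in.sh — static, comma-separated ZooKeeper connection string.
# Hostnames are examples only; a chroot (/solr) keeps Solr's znodes separate.
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"

# Equivalent one-off start in SolrCloud mode:
# bin/solr start -c -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
```

Because every Solr node carries this same string, swapping ZooKeeper servers in and out dynamically would require touching every node's configuration, which is part of why a static ensemble is the usual recommendation.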


ZK is generally best set up with either 3 or 5 servers, depending on the 
level of redundancy desired, and left alone unless there's a problem. 
With 3 servers, the ensemble can survive the failure of 1 server.  With 
5, it can survive the failure of 2.  As far as I know, getting back to 
full redundancy is best handled as a manual process, even if running 
version 3.5.
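The arithmetic behind those numbers: a ZooKeeper ensemble stays available only while a strict majority (a quorum) of its servers is up, so an ensemble of n servers survives floor((n-1)/2) failures. A quick sketch:

```python
def quorum(n: int) -> int:
    """Minimum number of servers that must be up (strict majority of n)."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Failures an n-server ensemble can absorb while keeping quorum."""
    return n - quorum(n)

for n in (3, 4, 5):
    print(f"{n} servers: quorum={quorum(n)}, survives {tolerated_failures(n)} failure(s)")
# 3 servers: quorum=2, survives 1 failure(s)
# 4 servers: quorum=3, survives 1 failure(s)
# 5 servers: quorum=3, survives 2 failure(s)
```

Note that an even-sized ensemble buys nothing: 4 servers tolerate only 1 failure, the same as 3, which is why odd sizes (3 or 5) are the standard recommendation.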



   2.  Should the Solr nodes be put in an autoscaling group? Or should we just 
launch/register Solr nodes using a Lambda function/Azure Function?


That really depends on what you're doing.  There is no "one size fits 
most" configuration.


I personally would avoid setting things up in a way that results in Solr 
nodes automatically being added or removed.  Adding a node will 
generally result in a LOT of data being copied, and that can impact 
performance in a major way, so adding nodes should be scheduled to 
minimize impact.  If it's automatic in response to high load, adding a 
node can make performance a lot worse before it gets better.  When a 
node disappears, manual action is required for SolrCloud to forget the node.
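To make that concrete, scheduled capacity changes are usually driven explicitly through the Collections API rather than by letting the platform add and remove instances on its own. The collection, shard, and node names below are placeholders:

```shell
# Add a replica on a newly provisioned node during a quiet period —
# this triggers a full copy of the shard's index to the new node:
curl "http://solr1.example.com:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=solr9.example.com:8983_solr"

# After a node is retired, tell SolrCloud to forget its replicas:
curl "http://solr1.example.com:8983/solr/admin/collections?action=DELETENODE&node=solr9.example.com:8983_solr"
```

Driving these calls from a scheduled job, instead of from a load-triggered autoscaling group, keeps the expensive index copies out of your peak hours.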



   3.  Should the Solr nodes be associated with local storage, or should they be 
attached to shared storage volumes?


Lucene (which provides most of Solr's functionality) generally does not 
like to work with shared storage.  In addition to potential latency 
issues for storage connected via a network, Lucene works extremely hard 
to ensure that only one process can open an index.  Using shared storage 
will encourage attempts to share the index directory between multiple 
processes, which almost always fails to work.


Things work best with locally attached storage utilizing an extremely 
fast connection method (like SATA or SCSI), and a locally handled 
filesystem.  Lucene uses some pretty involved file locking mechanisms, 
which often do not work well on remote or shared filesystems.
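Related to that, the lock implementation Solr asks Lucene to use is configured per core in solrconfig.xml; the value shown is the standard default, which relies on OS-level file locking and assumes a local filesystem:

```xml
<indexConfig>
  <!-- "native" uses OS-level file locks and assumes a local filesystem.
       Other values exist (e.g. "simple"), but sharing one index directory
       between multiple processes is unsupported regardless of lock type. -->
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>
```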


---

We (the developers that build this software) generally have a very 
near-sighted view of things, not really caring about details like the 
hardware deployment.  That probably needs to change a little bit, 
particularly when it comes to documentation.


Thanks,
Shawn


SolrCloud on PublicCloud

2020-08-03 Thread Mathew Mathew
I have been looking for architectural guidance on correctly configuring SolrCloud 
on a public cloud (e.g. Azure/AWS).
In particular, Solr's ZooKeeper-based autoscaling seems to overlap with the 
autoscaling capabilities of the cloud platforms.

I have the following questions.

  1.  Should the ZooKeeper ensemble be put in an autoscaling group? This seems to 
be a no, since the Solr nodes need to register against a static list of 
ZooKeeper IPs.
  2.  Should the Solr nodes be put in an autoscaling group? Or should we just 
launch/register Solr nodes using a Lambda function/Azure Function?
  3.  Should the Solr nodes be associated with local storage, or should they be 
attached to shared storage volumes?

This seems like it would be a solved problem with established patterns; however, 
I could not find any documentation on it.
I would appreciate insights from those who have been here before.

Thanks,

Mathew