Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

via GitHub Sat, 30 Nov 2024 08:27:12 -0800


ardatezcan1 commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2509025014


   Whether you're just getting started with Solr or looking to fine-tune an 
existing setup, these practical tips and real-world scenarios may help you get 
the most out of this powerful search platform.
   
   **Best Practices for Using Solr**
   
   **1. Run Solr as a Cluster for Better Performance**
   Solr works best when deployed as a cluster. Start with at least three nodes 
for fault tolerance and scalability, and scale horizontally as your needs grow.
   
   - **Sharding and Replication:** Break your data into shards for parallel 
processing and use replicas for redundancy. A good starting point is two 
replicas per shard, but adjust this based on your workload.
   
   - **Optimize Indexing:** Plan your schema carefully to keep indexing and querying efficient. Use dynamic fields and copy fields where they help keep the schema flexible, but keep in mind that every copy field adds to index size.
   
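   - **Caching for Speed:** Solr provides several caches, including the filter, query result, and document caches (`filterCache`, `queryResultCache`, and `documentCache` in `solrconfig.xml`). Sizing them for your frequently repeated filters and queries can cut query times significantly.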
   
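   - **Tune the JVM:** Since Solr runs on the JVM, JVM tuning is crucial. Size the heap to balance memory usage against garbage collection pauses, monitor the GC logs, and experiment with collector settings. G1GC is the usual choice today; CMS was deprecated in JDK 9 and removed in JDK 14, so it is only an option on older Java versions.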
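
As a rough illustration of the sharding and replication advice above, the Python sketch below builds a Collections API `CREATE` request. The `action`, `name`, `numShards`, `replicationFactor`, and `collection.configName` parameters are real Collections API parameters; the base URL, the collection name `products`, and the configset name `products_conf` are hypothetical placeholders. In Solr, `replicationFactor` counts every copy of a shard, including the leader. The sketch only builds the URL and does not send anything.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, config_name: str) -> str:
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # Total copies per shard; the leader is one of them.
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def total_replicas(num_shards: int, replication_factor: int) -> int:
    """Number of cores the cluster must host for this collection."""
    return num_shards * replication_factor


if __name__ == "__main__":
    # Hypothetical values: three shards, two copies of each.
    url = create_collection_url("http://localhost:8983/solr", "products", 3, 2, "products_conf")
    print(url)
    print(total_replicas(3, 2), "cores to place across the cluster")
```

Against a running cluster, the URL could be sent with `urllib.request.urlopen(url)`. Three shards with two copies each gives six cores, which a three-node cluster can hold at two per node.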
   
   **2. Always Run Solr in SolrCloud Mode**
   For a robust, scalable setup, SolrCloud mode is the way to go. It requires ZooKeeper, which manages cluster coordination, leader election, and configuration.
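
ZooKeeper itself only keeps working while a majority of its ensemble is reachable, which is why ensembles are usually sized at an odd number of servers. The short Python sketch below works through that arithmetic and builds a `ZK_HOST`-style connection string of the kind Solr nodes are pointed at; the host names and the `/solr` chroot are hypothetical examples.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of a ZooKeeper ensemble."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def zk_host(servers: list[str], chroot: str = "") -> str:
    """Join host:port pairs, optionally followed by a chroot path."""
    return ",".join(servers) + chroot


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # Hypothetical hosts and chroot.
    print(zk_host(["zk1:2181", "zk2:2181", "zk3:2181"], "/solr"))
```

A four-server ensemble tolerates only one failure, the same as three, so a fourth server adds load without adding resilience.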
   
   - **ZooKeeper’s Role:** ZooKeeper keeps the cluster state and configuration in one place, which lets Solr coordinate shard placement, failover, and configuration changes across nodes dynamically.
   
   - **Backups and Security:**
     - Always back up your Solr and ZooKeeper data regularly. Use Solr's built-in backup tools or external snapshot mechanisms for safety.
     - Secure your cluster with SSL/TLS, and set up role-based access control, ideally with tools like Apache Ranger. If Ranger isn’t an option, manual permissions management works too.
        
   - **Monitoring is Essential:** Keeping an eye on your cluster is crucial for smooth operations. A great place to start is the Solr Admin UI, a user-friendly interface for checking query performance, index health, and cache usage, which makes it easy to spot problems quickly. For more advanced needs, you can integrate tools like Prometheus and Grafana for custom dashboards and alerting. I should mention, though, that I have no direct experience using Prometheus or Grafana with Solr.
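
For the backup and monitoring points above, the Python sketch below builds two request URLs: a Collections API `BACKUP` call and a Metrics API query for JVM heap metrics. The parameter names (`action`, `name`, `collection`, `location`, `group`, `prefix`) are real Solr parameters; the collection name, backup name, `/mnt/backups` location, and the `memory.heap` prefix filter are illustrative choices. In SolrCloud, a local-filesystem backup location must be a shared path visible to every node. As with any sketch here, it builds URLs without sending them.

```python
from urllib.parse import urlencode


def _admin_url(base_url: str, handler: str, params: dict) -> str:
    """Join the Solr base URL, an admin handler path, and encoded query parameters."""
    return f"{base_url.rstrip('/')}/admin/{handler}?{urlencode(params)}"


def backup_url(base_url: str, collection: str, backup_name: str, location: str) -> str:
    """Collections API BACKUP request for one collection."""
    return _admin_url(base_url, "collections", {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,  # must be reachable from every Solr node
    })


def jvm_heap_metrics_url(base_url: str) -> str:
    """Metrics API request limited to JVM heap metrics."""
    return _admin_url(base_url, "metrics", {"group": "jvm", "prefix": "memory.heap"})


if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    # Hypothetical collection, backup name, and shared backup path.
    print(backup_url(base, "products", "products-2024-11-30", "/mnt/backups"))
    print(jvm_heap_metrics_url(base))
```

For large collections, adding an `async` request ID lets the backup run in the background, and its progress can then be checked with `action=REQUESTSTATUS`.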
   
   **Usage Scenarios: Real-World Applications of Solr**
   **1. Managing Solr for a Large Dataset**
   I used open-source Solr as the search engine for a mobile app. Here’s what that setup looked like:
   
   - **Cluster Configuration:**
   The cluster handled over 100 TB of data spread across 11 physical machines, 
each running 16 Solr instances.
   - **Sharding and Replication:**
   Data was stored in shards, with each shard having two replicas to ensure 
fault tolerance and load balancing.
   - **Data Storage:**
   Data was stored directly on the local file system, which was a great fit for 
this use case.
   - **Management Approach:**
   Instead of accessing Solr directly, I managed the system via ZooKeeper APIs. 
This approach, even with an embedded ZooKeeper, worked efficiently under heavy 
load.
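
To put the first scenario's numbers in perspective, here is a back-of-the-envelope Python calculation. It assumes, hypothetically, that data is spread evenly across instances and uses decimal terabytes; the scenario does not say whether the 100 TB figure already includes the replica copies, so the `copies` parameter covers both readings.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9


def per_instance_bytes(total_bytes: float, machines: int,
                       instances_per_machine: int, copies: int = 1) -> tuple[int, float]:
    """Return (instance count, average bytes per instance), assuming an even spread."""
    instances = machines * instances_per_machine
    return instances, total_bytes * copies / instances


if __name__ == "__main__":
    for copies in (1, 2):
        n, share = per_instance_bytes(100 * TB, 11, 16, copies)
        print(f"copies={copies}: {n} instances, ~{share / GB:.0f} GB each")
```

If the 100 TB already counts both copies, each of the 176 instances holds roughly 568 GB on average; if it is the size of a single copy, the average roughly doubles to about 1.1 TB per instance.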
   
   **2. Using Solr with Cloudera and HDFS**
   Another scenario involved deploying Solr in a Cloudera ecosystem with HDFS 
for storage. Here’s what worked and what didn’t:
   - **Cluster Management:**
   ZooKeeper handled cluster coordination, while Ranger (and previously Sentry) 
managed permissions.
   - **Challenges:**
   Occasionally, node failures left HDFS file locks behind, and these were difficult to resolve without downtime. They required manual fixes and a lot of patience!
   
   If you’ve got questions or need help with something specific, just let me 
know. I’m happy to share more!

