Timothy Potter created SOLR-6743:
------------------------------------
Summary: Support deploying SolrCloud on YARN
Key: SOLR-6743
URL: https://issues.apache.org/jira/browse/SOLR-6743
Project: Solr
Issue Type: New Feature
Components: Hadoop Integration, SolrCloud
Reporter: Timothy Potter
Solr is increasingly deployed alongside Hadoop, and YARN allows us to deploy
and manage distributed applications across a cluster of machines. This
feature will provide support for deploying SolrCloud on YARN. Currently, the
code lives in an open-source project hosted on Lucidworks' GitHub; see:
https://github.com/LucidWorks/yarn-proto
We'd like to submit this to the Apache Solr project as a contrib so that it is
easier to run Solr on YARN out of the box. There are a few hurdles to get
over first, though:
1) Overall approach: There are various options for supporting YARN, such as
Apache Slider, but I opted to use the YARN client API directly, which
simply invokes the bin/solr start script under the covers. The YARN-specific
code is quite simple; most of the code just handles command-line option
parsing. I'm curious what others think about having a simple native
solution that ships with Solr (similar to the HdfsDirectoryFactory) vs.
something more heavyweight that requires third-party tools.
2) Unit testing: Solr on YARN relies on putting a full Solr bundle into HDFS
(the SolrYarnTestIT test case shows how that might work). This is a problem
for the Solr build because no Solr bundle is available during unit testing.
I'm thinking about using a mock bundle that simulates starting Solr, but that
limits what we can verify on the cluster once it's up.
3) Shutdown: To support an orderly shutdown of Solr when the application is
stopped by the ResourceManager, we need a shutdown handler in Jetty/Solr that
allows a remote application to request shutdown. The built-in Jetty shutdown
handler requires the stop request to come from localhost. To work around
this, I've introduced a custom ShutdownHandler that can be configured using
system properties at startup to allow a remote host to request shutdown. When
YARN starts Solr nodes, I register the address of the SolrMaster node along
with a secret key that allows the SolrMaster to shut down Solr gracefully.
This seems secure, since only the SolrMaster can request shutdown using the
correct key. Other ideas on how to handle graceful shutdown?
4) Additional features: The current implementation is useful for starting and
stopping SolrCloud nodes in YARN. My thinking is that you'll provision the
cluster using YARN and then interact with Solr directly using Solr's API, so
the YARN layer is quite thin. Are other features needed?
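
To make the shutdown check in item 3 a bit more concrete, here is a minimal
sketch of the kind of test such a handler might run before honoring a remote
stop request. This is not the ShutdownHandler from yarn-proto: the class name,
the two system property names, and the idea of comparing the caller's address
as a plain string are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical sketch only; names and property keys are invented. */
final class RemoteShutdownCheck {
    // Assumed property names, set with -D when the Solr node is launched.
    static final String HOST_PROP = "example.shutdown.master.host";
    static final String KEY_PROP = "example.shutdown.secret";

    private final String allowedHost;
    private final byte[] secret;

    RemoteShutdownCheck(String allowedHost, String secret) {
        if (allowedHost == null || secret == null || secret.isEmpty()) {
            throw new IllegalArgumentException("host and non-empty secret are required");
        }
        this.allowedHost = allowedHost;
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the registered master address and key from system properties. */
    static RemoteShutdownCheck fromSystemProperties() {
        return new RemoteShutdownCheck(
                System.getProperty(HOST_PROP), System.getProperty(KEY_PROP));
    }

    /** True only if the request comes from the registered host with the right key. */
    boolean isAuthorized(String remoteAddr, String presentedKey) {
        if (remoteAddr == null || presentedKey == null) {
            return false;
        }
        boolean hostOk = allowedHost.equals(remoteAddr);
        // MessageDigest.isEqual compares byte arrays without the early exit
        // of a plain equals(), which makes timing-based guessing harder.
        boolean keyOk = MessageDigest.isEqual(
                secret, presentedKey.getBytes(StandardCharsets.UTF_8));
        return hostOk && keyOk;
    }
}
```

A handler along these lines would call isAuthorized with the caller's address
and the key from the stop request, and only then begin shutdown. Both checks
must pass, so a correct key sent from an unregistered host is still rejected.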