tigerquoll commented on a change in pull request #1571: METRON-2313 - Added 
additional documentation on SOLR performance, cha…
URL: https://github.com/apache/metron/pull/1571#discussion_r351095753
 
 

 ##########
 File path: metron-platform/metron-solr/metron-solr-common/README.md
 ##########
 @@ -72,6 +69,89 @@ via the global config.  The following settings are possible 
as part of the globa
     * `httpBasicAuthPassword` : Basic auth password
     * `solr.ssl.checkPeerName` : Check peer name
 
+### Performance: Server-side versus Client-side commits
+SOLR is not an ACID-compliant database; in particular, there is no isolation 
between transactions.
+This has a major impact on performance if client-side commits are configured, 
as a commit causes the entire collection to be checkpointed
+and written to disk, pausing any other client writing data to the same 
collection.  
+
+In Metron, dozens of Storm spouts may be writing data to the same SOLR 
collection simultaneously. 
+If each of these spouts triggers a client-side commit on the same SOLR 
collection, the effect on performance can be catastrophic.
+
+SOLR can manage this issue by removing the responsibility for committing data 
from the clients and letting the server 
+trigger regular commits on a collection to flush data to disk.  Because this 
is server-side as opposed to client-side functionality, it 
+is controlled via the following parameters in each collection's 
`solrconfig.xml` configuration file:
+ 
+* autoCommit : Also called a 'hard' autocommit, this regularly synchronizes 
ingested data to disk to guarantee data persistence. 
+* autoSoftCommit : Allows data to become visible to searchers without 
requiring an expensive commit to disk.
+
+`openSearcher=true` is an expensive (hard) autoCommit option that causes 
written data to be merged into on-disk indexes and to become visible to 
searchers.
+It is the equivalent of a soft and a hard commit performed at the same time. It 
is rarely used for Near Real Time (NRT) search scenarios.
+
+
+The standard mantra for configuring SOLR for Near Real-Time Search is:
+
+* autoCommit (with `openSearcher=false`) for persisting data,
+* autoSoftCommit for making data visible.
+
+These functions can (and nearly always do) have different time periods 
configured for them.
+For example: 
+* `autoCommit` (with `openSearcher=false` and `maxTime=30000` milliseconds) to 
persist data,
+* `autoSoftCommit` (with `maxTime=120000` milliseconds) to make newly ingested 
data visible.
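+
+As a minimal sketch, assuming the illustrative intervals above, the relevant
+part of a collection's `solrconfig.xml` might look like the fragment below.
+The values are examples only and should be tuned to the deployment; other
+child elements of the update handler (such as the update log) are omitted.
+
+```xml
+<updateHandler class="solr.DirectUpdateHandler2">
+  <!-- Hard commit: persist data every 30 s without opening a new searcher -->
+  <autoCommit>
+    <maxTime>30000</maxTime>
+    <openSearcher>false</openSearcher>
+  </autoCommit>
+  <!-- Soft commit: make newly ingested data visible every 120 s -->
+  <autoSoftCommit>
+    <maxTime>120000</maxTime>
+  </autoSoftCommit>
+</updateHandler>
+```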
+
+### Managing the risk of data loss
+What happens if SOLR crashes before a hard commit has persisted the data?  
While SOLR does write data to its transaction log as soon 
+as it is received, it does not fsync this log to disk until 
+a hard commit is requested. A hardware crash could therefore lose the data 
received since the last hard commit, that is, up to one hard autoCommit 
+interval's worth of data.
+
+If the potential for data loss is unacceptable to the business, a common 
way to manage the risk is to have
+one or more replicas of each SOLR collection. When 
ingesting data into a collection that has replicas, SOLR will 
+not return a result to the client until the data has been passed to each 
replica in the collection. Replicas 
+return an acknowledgement as soon as the data is stored in local memory 
buffers. The data is later processed and stored on the replica 
+according to the configured autoCommit/autoSoftCommit intervals, in exactly 
the same way as on the primary node.
+
+If you are using collection replicas, you are protected against individual 
machine failures 
+by the fact that your data is present in the main memory (immediately) and 
on the disks (after an interval of time) of other replicas. 
+This type of architecture is similar to how other distributed systems such as 
Kafka manage the performance/reliability trade-off.
+
+### Disabling client-side commits in Metron SOLR
+To disable client-side commits in Metron's Storm spouts, make sure 
`solr.commitPerBatch = false` is set in Metron's
+global JSON configuration section.  For details on how to change Metron's 
global configuration, please refer to the documentation
+for Metron's `zk_load_config.sh` script.
+
+
+### Configuring server-side commits
+1. Make sure that client-side SOLR commits are disabled in Metron
+
+1. You will need to change each collection's `solrconfig.xml` as described in 
the next step. Use either:
+
+    1. The destructive option: update the collection template in Metron's 
schema directory (`$METRON_HOME/config/schema`) and delete and re-create the 
schema via Metron's 
+delete-collection/create-collection [scripts](#Collections), or
+
 
 Review comment:
   https://issues.apache.org/jira/browse/METRON-2329 created to record follow 
up actions
