ottobackwards commented on a change in pull request #1571: METRON-2313 - Added additional documentation on SOLR performance, cha…
URL: https://github.com/apache/metron/pull/1571#discussion_r349934427
##########
File path: metron-platform/metron-solr/metron-solr-common/README.md
##########
@@ -72,6 +69,89 @@ via the global config. The following settings are possible as part of the globa
* `httpBasicAuthPassword` : Basic auth password
* `solr.ssl.checkPeerName` : Check peer name
+### Performance: Server-side versus Client-side commits
+It is important to note that SOLR is not an ACID-compliant database; in particular, there is no isolation between transactions.
+This has a major impact on performance if client-side commits are configured, as a commit causes the entire collection to be check-pointed
+and written to disk, pausing any other client writing data to the same collection.
+
+In Metron, it is possible that dozens of Storm bolts are writing data to the same SOLR collection simultaneously.
+If each of these bolts triggers a client-side commit on the same SOLR collection, the effect on performance can be catastrophic.
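To put rough numbers on this, the sketch below compares how many commits per minute a single collection receives when every writer commits after each batch versus when the server commits on a fixed hard `autoCommit` timer. The writer count, batch rate, and interval are illustrative assumptions, not Metron defaults, and soft commits are ignored.

```python
def client_commits_per_minute(writers: int, batches_per_writer_per_minute: int) -> int:
    """Commits hitting one collection when every writer commits after each batch."""
    return writers * batches_per_writer_per_minute


def server_commits_per_minute(autocommit_max_time_ms: int) -> float:
    """Hard commits per minute when the server commits every maxTime milliseconds."""
    return 60_000 / autocommit_max_time_ms


# Hypothetical workload: 24 writers, each flushing one batch per second.
client = client_commits_per_minute(writers=24, batches_per_writer_per_minute=60)
server = server_commits_per_minute(autocommit_max_time_ms=30_000)
print(f"client-side: {client} commits/min, server-side: {server:g} commits/min")
```

Under these assumptions the collection would see 1440 client-side commits per minute, against 2 server-side hard commits.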
+
+SOLR can manage this issue by removing the responsibility for committing data from the clients and letting the server
+trigger regular commits on a collection to flush data to disk. Because this is server-side rather than client-side functionality, it
+is controlled via the following parameters in each collection's `solrconfig.xml` configuration file:
+
+* `autoCommit` : Also called a 'hard' autocommit, this regularly synchronizes ingested data to disk to guarantee data persistence.
+* `autoSoftCommit` : Allows data to become visible to searchers without requiring an expensive commit to disk.
+
+`openSearcher=true` is an expensive (hard) `autoCommit` option that causes written data to be flushed to the on-disk index and become visible to searchers.
+It is the equivalent of a soft and hard commit performed at the same time. It is rarely used for Near Real-Time (NRT) search scenarios.
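As an illustration of where these settings live, the following sketch builds the commit-related part of an `<updateHandler>` element with Python's standard `xml.etree.ElementTree`. The `maxTime`/`openSearcher` element names follow the standard `solrconfig.xml` layout; the values are illustrative, and a real `<updateHandler>` also contains other children (such as `<updateLog>`) that are omitted here.

```python
import xml.etree.ElementTree as ET


def build_commit_settings(hard_max_time_ms: int, soft_max_time_ms: int,
                          open_searcher: bool = False) -> ET.Element:
    """Build an <updateHandler> fragment holding only server-side commit settings."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    # Hard commit: persists ingested data to disk; openSearcher controls visibility.
    hard = ET.SubElement(handler, "autoCommit")
    ET.SubElement(hard, "maxTime").text = str(hard_max_time_ms)
    ET.SubElement(hard, "openSearcher").text = str(open_searcher).lower()

    # Soft commit: makes recently ingested documents visible to searchers.
    soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(soft, "maxTime").text = str(soft_max_time_ms)
    return handler


fragment = build_commit_settings(hard_max_time_ms=30000, soft_max_time_ms=120000)
print(ET.tostring(fragment, encoding="unicode"))
```

Keeping `open_searcher` at `False` by default mirrors the NRT recommendation: visibility is left to the cheaper soft commit.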
+
+
+The standard mantra for configuring SOLR for Near Real-Time Search is:
+
+* `autoCommit` (with `openSearcher=false`) for persisting data,
+* `autoSoftCommit` for making data visible.
+
+These functions can (and nearly always do) have different time periods configured for them.
+For example:
+* `autoCommit` (with `openSearcher=false` and `maxTime=30000` milliseconds) to persist data,
+* `autoSoftCommit` (with `maxTime=120000` milliseconds) to make newly ingested data visible.
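A quick way to confirm that a collection's configuration follows this pattern is to parse `solrconfig.xml` and inspect the two commit elements. The sketch below is a hypothetical checker, not part of Metron or SOLR; it assumes the standard `<config>/<updateHandler>` layout, treats a missing or `-1` `maxTime` as disabled, and treats a missing `openSearcher` as `true` (SOLR's default). It only checks presence, so property expressions such as `${solr.autoCommit.maxTime:15000}` are accepted as-is.

```python
import xml.etree.ElementTree as ET

SOLRCONFIG_SNIPPET = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>30000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>120000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
"""


def _max_time(elem):
    """Return the maxTime text of a commit element, or None if absent or disabled (-1)."""
    if elem is None:
        return None
    text = (elem.findtext("maxTime") or "").strip()
    return None if text in ("", "-1") else text


def check_nrt_settings(solrconfig_xml: str) -> list:
    """Return a list of problems with the commit settings (empty if none found)."""
    root = ET.fromstring(solrconfig_xml)
    hard = root.find("updateHandler/autoCommit")
    soft = root.find("updateHandler/autoSoftCommit")
    problems = []
    if _max_time(hard) is None:
        problems.append("hard autoCommit disabled: data is only persisted on explicit commits")
    elif hard.findtext("openSearcher", "").strip() != "false":
        problems.append("autoCommit should set openSearcher=false for NRT search")
    if _max_time(soft) is None:
        problems.append("autoSoftCommit disabled: new data will not become visible automatically")
    return problems


print(check_nrt_settings(SOLRCONFIG_SNIPPET))  # → []
```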
+
+### Managing the risk of data loss
+At this stage, experienced administrators may ask what happens if SOLR crashes before a hard commit occurs to
+persist the data. While SOLR does write data to its transaction log as soon as it is received, it does not fsync this log to disk until
+a hard commit is requested. Thus a hardware crash could, in principle, lose all data received since the last hard commit, that is, up to
+one hard `autoCommit` interval's worth of data.
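As a back-of-the-envelope estimate, the worst-case exposure is roughly the ingest rate multiplied by the hard `autoCommit` interval. The sketch below assumes a steady ingest rate and ignores the time a commit itself takes; the rate used is a hypothetical figure.

```python
def max_docs_at_risk(docs_per_second: float, autocommit_max_time_ms: int) -> float:
    """Rough worst case: documents received since the last hard commit."""
    return docs_per_second * autocommit_max_time_ms / 1000


# Hypothetical ingest rate of 5,000 documents/second with a 30-second hard autoCommit.
print(max_docs_at_risk(5_000, 30_000))  # → 150000.0
```

Shortening the hard `autoCommit` interval reduces this window proportionally, at the cost of more frequent disk synchronization.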
+
+If the potential for data loss is unacceptable to the business, a common architecture used to manage the risk of
+data loss is to have one or more replicas of each SOLR collection. When ingesting data into a collection that has replicas, SOLR will
+not return a result to the client until the data has been passed to each replica in the collection. Replicas return an acknowledgement
+as soon as the data is stored in local memory buffers. The configured `autoCommit`/`autoSoftCommit` intervals
+later process and store the data on each replica in exactly the same way it is processed and stored on the leader (primary) node.
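Replicas are normally requested when a collection is created. The sketch below only builds a SOLR Collections API `CREATE` request URL using the `numShards`, `replicationFactor`, and `collection.configName` parameters; it does not send the request. The host, collection name, and configset name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, name: str, num_shards: int,
                          replication_factor: int, config_name: str) -> str:
    """Build a Collections API CREATE request for a replicated collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,  # copies of each shard, leader included
        "collection.configName": config_name,
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host, collection, and configset names.
url = create_collection_url("http://solr-host:8983/solr", "bro", 1, 2, "bro")
print(url)
```

With `replicationFactor=2`, each shard has a leader plus one additional replica, so a single machine failure leaves one copy of every acknowledged document in place.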
+
+So if you are using collection replicas, you are protected against individual machine failures
+by the fact that your data is present in the main memory (immediately) and on the disks (after an interval of time) of other replicas.
+This type of architecture is similar to how other distributed systems, such as Kafka, manage the performance/reliability trade-off.
+
+### Disabling client-side commits in Metron SOLR
+To disable client-side commits in Metron's Storm bolts, make sure `solr.commitPerBatch` is set to `false` in Metron's
+global JSON configuration. For details on how to change Metron's global configuration, please refer to the documentation
+for Metron's `zk_load_config.sh` script.
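The edit itself is a one-key change. The sketch below assumes the global configuration has already been pulled to a local `global.json` file (the pull and push steps are covered by the `zk_load_config.sh` documentation) and simply flips `solr.commitPerBatch` to a JSON `false`. The helper name and the temporary file are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def disable_commit_per_batch(global_config_path: Path) -> dict:
    """Set solr.commitPerBatch to false in a local copy of Metron's global config."""
    config = json.loads(global_config_path.read_text())
    config["solr.commitPerBatch"] = False  # leave committing to SOLR's autoCommit settings
    global_config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Illustrative run against a throwaway file standing in for the pulled global.json.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "global.json"
    path.write_text('{"solr.commitPerBatch": true}')
    result = disable_commit_per_batch(path)
    written = json.loads(path.read_text())
print(result)
```

After the edit, the modified configuration still has to be pushed back to ZooKeeper before the Storm topologies pick it up.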
+
+
+### Configuring server-side commits
Review comment:
Maybe we can explicitly state which SOLR version these instructions have been tested with and are known to work on (7.x?),
since we can't keep up with the latest releases.