Justin Leet created METRON-1912:
-----------------------------------

             Summary: Allow for indexing batches to be handled based on size
                 Key: METRON-1912
                 URL: https://issues.apache.org/jira/browse/METRON-1912
             Project: Metron
          Issue Type: Improvement
            Reporter: Justin Leet


In the indexing topology, batching of output is handled on a per-sensor basis. 
E.g. bro and snort will each be batched independently and shipped to ES when 
either the batch reaches the per-sensor configured count or the per-sensor 
configured timeout elapses.
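The current behavior can be sketched roughly as follows. This is a hypothetical, simplified illustration (class and method names are illustrative, not Metron's actual classes): a per-sensor buffer that flushes when either a message-count threshold or a timeout is hit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the existing per-sensor batching: flush when either
// the configured message count is reached or the configured timeout elapses.
public class CountBatcher {
    private final int batchSize;
    private final long timeoutMs;
    private long lastFlushMs;
    private final List<String> buffer = new ArrayList<>();

    public CountBatcher(int batchSize, long timeoutMs, long nowMs) {
        this.batchSize = batchSize;
        this.timeoutMs = timeoutMs;
        this.lastFlushMs = nowMs;
    }

    // Returns the flushed batch if either threshold was hit, else null.
    public List<String> add(String message, long nowMs) {
        buffer.add(message);
        if (buffer.size() >= batchSize || nowMs - lastFlushMs >= timeoutMs) {
            List<String> out = new ArrayList<>(buffer);
            buffer.clear();
            lastFlushMs = nowMs;
            return out;
        }
        return null;
    }
}
```

Note that each sensor would own its own instance, so the byte size of any given flush is unbounded by configuration.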

These batches are bounded by message count, not by data size. This means each 
individual sensor must be tuned independently, despite its effect on overall 
performance. Per the Elasticsearch documentation, tuning bulk requests depends 
heavily on the data size of the batch rather than the number of items in it. 
Bulks that are too small result in too many requests and potential performance 
bottlenecks, while bulks that are too large ("beyond a couple tens of 
megabytes") can also degrade ES.

Moving to data size batching can be broken up into two variants, managing this 
per sensor or moving everything to a single batch that sends for all sensors as 
needed.
 * If we manage per sensor, this might allow us to provide more reasonable 
per-sensor defaults that avoid simple copy-pasting that causes misbehavior. 
Batch sizes are more likely to be roughly correct overall, although the 
introduction of new sensors may still cause problems as more batches are sent. 
However, misbehavior is still very possible in the same manner as it exists 
today. Additional tuning as new sensors are onboarded is a potential cause for 
concern here (as it is in the current setup).
 * If we manage a pool for all sensors, this could substantially smooth out 
problems. Because all batches would be largely the same data size and 
configured at a single point, the opportunities for a sensor to misbehave are 
minimized (a single sensor could still send outlier messages, e.g. a 100 MB 
message, but such messages are already problematic today because they cause 
enormously sized batches). Configuration would likely move to the global 
config, and the existing batching would be refactored to avoid breaking 
backward compatibility. This approach could also be mirrored by other batching 
(e.g. to Kafka) to ensure a consistent experience. Tuning indexing should also 
be easier, as it's more dependent on how much we're pushing to Elasticsearch 
and the particular cluster, rather than on tuning each sensor.
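The second option could be sketched as below. This is a hypothetical illustration only (names and the byte threshold are assumptions, not a proposed implementation): one buffer shared by all sensors, flushed once the accumulated payload reaches a configured byte size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of size-based batching shared across sensors: messages
// from any sensor land in one buffer, and a flush is driven purely by the
// total payload size rather than a per-sensor message count.
public class SizeBatcher {
    private final long maxBytes;
    private long currentBytes = 0;
    private final List<String> buffer = new ArrayList<>();

    public SizeBatcher(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns the flushed batch once the byte threshold is reached, else null.
    public List<String> add(String message) {
        buffer.add(message);
        currentBytes += message.getBytes(StandardCharsets.UTF_8).length;
        if (currentBytes >= maxBytes) {
            List<String> out = new ArrayList<>(buffer);
            buffer.clear();
            currentBytes = 0;
            return out;
        }
        return null;
    }
}
```

With a single threshold configured (presumably in the global config), every flush lands near the same data size regardless of which sensors contributed, which is the smoothing effect described above. A timeout would still be needed alongside the byte threshold for low-volume periods; it is omitted here for brevity.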

I'm in favor of the second option, but any implementation likely requires a 
discussion on the dev list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
