merrimanr commented on a change in pull request #1330: METRON-1968: Messages 
are lost when a parser produces multiple messages and batch size is greater 
than 1
URL: https://github.com/apache/metron/pull/1330#discussion_r259100742
 
 

 ##########
 File path: 
metron-platform/metron-common/src/main/java/org/apache/metron/common/writer/BulkMessageWriter.java
 ##########
 @@ -32,18 +29,16 @@
   void init(Map stormConf, TopologyContext topologyContext, 
WriterConfiguration config) throws Exception;
 
   /**
-  * Writes the messages to a particular output (e.g. Elasticsearch). 
Exceptions trigger failure of the entire batch.
-  * @param sensorType The type of sensor being generating the messages
+  * Writes the messages to a particular output (e.g. Elasticsearch).  A 
response is returned with successful and failed message ids.
+  * @param sensorType The type of sensor generating the messages
   * @param configurations Configurations that should be passed to the writer 
(e.g. index and
-  * @param tuples The Tuples that produced the message to be written
-  * @param messages  The message to be written
+  * @param messages  A list of messages to be written.  Message ids are used 
in the response to report successes/failures.
   * @return A response containing successes and failures within the batch.
   * @throws Exception If an unrecoverable error is made, an Exception is 
thrown which should be treated as a full-batch failure (e.g. target system is 
down).
   */
   BulkWriterResponse write(String sensorType
             , WriterConfiguration configurations
-            , Iterable<Tuple> tuples
-            , List<MESSAGE_T> messages
+            , List<BulkWriterMessage<MESSAGE_T>> messages
 
 Review comment:
   I experimented with this and it's not as straightforward as it seems.  
Throughout all these classes it is assumed that messages are batched per 
sensor.  If we move sensorType into each message than each class that consumes 
a `List<BulkMessage>` now has to handle it differently.
   
   For example the `ElasticsearchWriter` class looks up an index for the given 
sensor type and writes all messages to that index.  With this change we would 
now need to lookup an index for each sensor type.  Another example is 
`BatchTimeoutPolicy`.  The list of messages isn't even used because it relies 
solely on the timeout configured for each sensor type.  Do we get the sensor 
type from the list of messages now?  Do we just use the sensor type of the 
first message or do we need to ensure the sensor type matches for all messages 
in the list?
   
   I feel like this is a significant change and wouldn't provide much benefit.  
If we do want to simplify these interfaces I would prefer to move the sensor 
type and list of messages into a wrapper object (`BulkWriterRequest` or 
something).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to