[ 
https://issues.apache.org/jira/browse/METRON-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655552#comment-16655552
 ] 

ASF GitHub Bot commented on METRON-1829:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1239#discussion_r226380891
  
    --- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/BulkWriterComponent.java
 ---
    @@ -118,12 +118,15 @@ public void commit(BulkWriterResponse response) {
     
       public void error(String sensorType, Throwable e, Iterable<Tuple> 
tuples, MessageGetStrategy messageGetStrategy) {
         LOG.error(format("Failing %d tuple(s); sensorType=%s", 
Iterables.size(tuples), sensorType), e);
    -    MetronError error = new MetronError()
    -            .withSensorType(Collections.singleton(sensorType))
    -            .withErrorType(Constants.ErrorType.INDEXING_ERROR)
    -            .withThrowable(e);
    -    tuples.forEach(t -> error.addRawMessage(messageGetStrategy.get(t)));
    -    handleError(tuples, error);
    +    tuples.forEach(t -> {
    --- End diff --
    
    It might also be useful to discuss on this PR, if the other similar method 
`error(Throwable, Iterable<Tuple>)`  needs updated.  That method gets called if 
messages are unparsable.  And in that case we send only a single error, instead 
of one for each error'd message.  Is that what we want?
    
    It might be what we want, but at the very least it would be useful to 
comment in there as to why we treat it differently, if indeed it should be 
different.  Could be something like this.
    ```
      public void error(Throwable e, Iterable<Tuple> tuples) {
        LOG.error(format("Failing %d tuple(s) due to invalid message; 
sensorType unknown", Iterables.size(tuples)), e);
    
        // emit only a single error message, even though many failed because ...
        MetronError error = new MetronError()
                .withErrorType(Constants.ErrorType.INDEXING_ERROR)
                .withThrowable(e);
        collector.emit(Constants.ERROR_STREAM, new 
Values(error.getJSONObject()));
        tuples.forEach(t -> collector.ack(t));
    
        // there is only one error to report for all of the failed tuples
        collector.reportError(e);
      }
    ```


> Large Error Message Causes Slow Search Performance
> --------------------------------------------------
>
>                 Key: METRON-1829
>                 URL: https://issues.apache.org/jira/browse/METRON-1829
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Ryan Merriman
>            Priority: Major
>
> Errors that occur during batch writes in the index topologies (batch and RA) 
> are written to Elasticsearch as a single, large error message, with a field 
> for each failed message. For example, if the batch size is 5000, a single 
> error message will be created with 5000 fields `raw_message_0`, 
> `raw_message_1`, .., `raw_message_4999`. With such large messages, searching 
> the error index in Elasticsearch is excessively slow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to