[ 
https://issues.apache.org/jira/browse/METRON-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655581#comment-16655581
 ] 

ASF GitHub Bot commented on METRON-1829:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1239#discussion_r226386542
  
    --- Diff: 
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/BulkWriterComponent.java
 ---
    @@ -118,12 +118,15 @@ public void commit(BulkWriterResponse response) {
     
       public void error(String sensorType, Throwable e, Iterable<Tuple> 
tuples, MessageGetStrategy messageGetStrategy) {
         LOG.error(format("Failing %d tuple(s); sensorType=%s", 
Iterables.size(tuples), sensorType), e);
    -    MetronError error = new MetronError()
    -            .withSensorType(Collections.singleton(sensorType))
    -            .withErrorType(Constants.ErrorType.INDEXING_ERROR)
    -            .withThrowable(e);
    -    tuples.forEach(t -> error.addRawMessage(messageGetStrategy.get(t)));
    -    handleError(tuples, error);
    +    tuples.forEach(t -> {
    --- End diff --
    
    > You're right! My mistake. I changed it to pass in just the current tuple 
to handleError
    
    Your latest commit was my first thought of a fix too; it is much simpler.  
But I don't think it solves the second problem that I mentioned.  
    
    > There is also the nuisance that we report the error to the collector for 
each and every failed message, instead of just once for the batch. There is 
only one Throwable error to report, so we should just report it once.
    
    I think reporting the error once is both a usability and performance 
improvement.
    
    I had to unravel the code base a bit more in my suggested code snippet so 
that it only reports the error once. 


> Large Error Message Causes Slow Search Performance
> --------------------------------------------------
>
>                 Key: METRON-1829
>                 URL: https://issues.apache.org/jira/browse/METRON-1829
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Ryan Merriman
>            Priority: Major
>
> Errors that occur during batch writes in the index topologies (batch and RA) 
> are written to Elasticsearch as a single, large error message, with a field 
> for each failed message. For example, if the batch size is 5000, a single 
> error message will be created with 5000 fields `raw_message_0`, 
> `raw_message_1`, .., `raw_message_4999`. With such large messages, searching 
> the error index in Elasticsearch is excessively slow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to