[https://issues.apache.org/jira/browse/METRON-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655552#comment-16655552]
ASF GitHub Bot commented on METRON-1829:
----------------------------------------
Github user nickwallen commented on a diff in the pull request:
https://github.com/apache/metron/pull/1239#discussion_r226380891
--- Diff: metron-platform/metron-writer/src/main/java/org/apache/metron/writer/BulkWriterComponent.java ---
@@ -118,12 +118,15 @@ public void commit(BulkWriterResponse response) {
   public void error(String sensorType, Throwable e, Iterable<Tuple> tuples, MessageGetStrategy messageGetStrategy) {
     LOG.error(format("Failing %d tuple(s); sensorType=%s", Iterables.size(tuples), sensorType), e);
-    MetronError error = new MetronError()
-            .withSensorType(Collections.singleton(sensorType))
-            .withErrorType(Constants.ErrorType.INDEXING_ERROR)
-            .withThrowable(e);
-    tuples.forEach(t -> error.addRawMessage(messageGetStrategy.get(t)));
-    handleError(tuples, error);
+    tuples.forEach(t -> {
--- End diff --
It might also be useful to discuss on this PR whether the other, similar method
`error(Throwable, Iterable<Tuple>)` needs to be updated. That method gets called
when messages are unparsable, and in that case we send only a single error
instead of one for each failed message. Is that what we want?
It might be what we want, but at the very least it would be useful to add a
comment there explaining why we treat it differently, if indeed it should be
different. It could be something like this:
```
public void error(Throwable e, Iterable<Tuple> tuples) {
  LOG.error(format("Failing %d tuple(s) due to invalid message; sensorType unknown", Iterables.size(tuples)), e);
  // emit only a single error message, even though many failed because ...
  MetronError error = new MetronError()
          .withErrorType(Constants.ErrorType.INDEXING_ERROR)
          .withThrowable(e);
  collector.emit(Constants.ERROR_STREAM, new Values(error.getJSONObject()));
  tuples.forEach(t -> collector.ack(t));
  // there is only one error to report for all of the failed tuples
  collector.reportError(e);
}
```
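For comparison, the per-tuple alternative that the question above raises could look roughly like the following. This is a self-contained sketch, not Metron code: `PerTupleErrorSketch` and `ErrorSink` are hypothetical names, `ErrorSink` stands in for the Storm `OutputCollector`, tuples are modelled as their raw message strings, and an error is a plain `Map` rather than a `MetronError` (the field names are illustrative). It also assumes the raw message of each failed tuple is still available, which may not hold when the message could not be parsed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerTupleErrorSketch {

  // Hypothetical stand-in for Storm's OutputCollector; it only records what would be sent.
  static class ErrorSink {
    final List<Map<String, Object>> emittedErrors = new ArrayList<>();
    final List<String> acked = new ArrayList<>();
    int reportedErrors = 0;

    void emitError(Map<String, Object> error) { emittedErrors.add(error); }
    void ack(String tuple) { acked.add(tuple); }
    void reportError(Throwable e) { reportedErrors++; }
  }

  // Emits one error per failed tuple (instead of a single aggregated error),
  // acks every tuple, and reports the underlying exception only once.
  static void errorPerTuple(Throwable e, List<String> rawTuples, ErrorSink sink) {
    for (String raw : rawTuples) {
      Map<String, Object> error = new LinkedHashMap<>();
      error.put("error_type", "indexing_error"); // illustrative field name
      error.put("exception", e.getMessage());    // illustrative field name
      error.put("raw_message", raw);             // exactly one raw message per error
      sink.emitError(error);
      sink.ack(raw);
    }
    // a single exception explains every failed tuple, so report it once
    sink.reportError(e);
  }

  public static void main(String[] args) {
    ErrorSink sink = new ErrorSink();
    errorPerTuple(new IllegalArgumentException("unparsable"), List.of("a", "b", "c"), sink);
    // prints "3 errors, 3 acks, 1 report"
    System.out.println(sink.emittedErrors.size() + " errors, " + sink.acked.size()
        + " acks, " + sink.reportedErrors + " report");
  }
}
```

Reporting the exception once while emitting per-tuple errors keeps the error index granular without flooding the Storm UI with identical exceptions.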
> Large Error Message Causes Slow Search Performance
> --------------------------------------------------
>
> Key: METRON-1829
> URL: https://issues.apache.org/jira/browse/METRON-1829
> Project: Metron
> Issue Type: Bug
> Reporter: Ryan Merriman
> Priority: Major
>
> Errors that occur during batch writes in the index topologies (batch and RA)
> are written to Elasticsearch as a single, large error message, with a field
> for each failed message. For example, if the batch size is 5000, a single
> error message will be created with 5000 fields `raw_message_0`,
> `raw_message_1`, ..., `raw_message_4999`. With such large messages, searching
> the error index in Elasticsearch is excessively slow.
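The difference in document shape described in the issue can be illustrated with a small, self-contained sketch. `ErrorDocShapes` is a hypothetical class that models an error document as a plain `Map` rather than Metron's `MetronError`; the `raw_message_<i>` naming follows the description above, while the `error_type` field and the single `raw_message` field of the per-message variant are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ErrorDocShapes {

  // One document for the whole failed batch, with a raw_message_<i> field per message.
  static Map<String, Object> aggregatedError(List<String> rawMessages) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("error_type", "indexing_error"); // illustrative field
    for (int i = 0; i < rawMessages.size(); i++) {
      doc.put("raw_message_" + i, rawMessages.get(i));
    }
    return doc;
  }

  // One small document per failed message, each with a single raw_message field.
  static List<Map<String, Object>> perMessageErrors(List<String> rawMessages) {
    List<Map<String, Object>> docs = new ArrayList<>();
    for (String raw : rawMessages) {
      Map<String, Object> doc = new LinkedHashMap<>();
      doc.put("error_type", "indexing_error"); // illustrative field
      doc.put("raw_message", raw);
      docs.add(doc);
    }
    return docs;
  }

  public static void main(String[] args) {
    int batchSize = 5000; // batch size from the example in the issue
    List<String> batch = new ArrayList<>();
    for (int i = 0; i < batchSize; i++) {
      batch.add("{\"id\":" + i + "}");
    }
    Map<String, Object> aggregated = aggregatedError(batch);
    List<Map<String, Object>> perMessage = perMessageErrors(batch);
    // prints "aggregated: 1 document, 5001 fields"
    System.out.println("aggregated: 1 document, " + aggregated.size() + " fields");
    // prints "per-message: 5000 documents, 2 fields each"
    System.out.println("per-message: " + perMessage.size() + " documents, "
        + perMessage.get(0).size() + " fields each");
  }
}
```

With the per-message shape, the number of distinct field names stays constant as the batch grows, whereas the aggregated shape adds one field name per failed message.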
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)