[ https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545654#comment-16545654 ]
ASF GitHub Bot commented on METRON-1657:
----------------------------------------
Github user ottobackwards commented on a diff in the pull request:
https://github.com/apache/metron/pull/1099#discussion_r202797418
--- Diff: metron-platform/metron-parsers/README.md ---
@@ -82,6 +82,12 @@ topology in kafka. Errors are collected with the context of the error
(e.g. stacktrace) and original message causing the error and sent to an
`error` queue. Invalid messages as determined by global validation
functions are also treated as errors and sent to an `error` queue.
+
+Multiple sensors can be aggregated into a single Storm topology. When this is done, there will be
+multiple Kafka spouts, but only a single parser bolt which will handle delegating to the correct
--- End diff --
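The README change describes several Kafka spouts feeding one parser bolt that delegates to the correct parser. A minimal Python sketch of that delegation, assuming each message arrives tagged with the Kafka topic it was read from and that each aggregated sensor has its own topic. `AggregatedParserBolt`, `execute`, `parse_bro`, and `parse_yaf` are hypothetical names for illustration, not Metron's Java API. Failures are kept with the original message, mirroring the `error` queue behavior in the README context.

```python
from typing import Callable, Dict, List

# Hypothetical parser signature: raw message bytes -> parsed records.
Parser = Callable[[bytes], List[dict]]


class AggregatedParserBolt:
    """Illustrative stand-in for one bolt fed by several Kafka spouts."""

    def __init__(self, parsers_by_topic: Dict[str, Parser]) -> None:
        # One entry per aggregated sensor, keyed by its Kafka topic (assumed).
        self.parsers_by_topic = dict(parsers_by_topic)
        # Stand-in for the `error` queue: failures keep their original message.
        self.errors: List[dict] = []

    def execute(self, topic: str, raw: bytes) -> List[dict]:
        parser = self.parsers_by_topic.get(topic)
        if parser is None:
            self.errors.append({"topic": topic, "raw": raw, "error": "no parser for topic"})
            return []
        try:
            return parser(raw)
        except Exception as exc:  # record the failure together with its context
            self.errors.append({"topic": topic, "raw": raw, "error": repr(exc)})
            return []


# Toy parsers for two sensors sharing the topology (illustrative only).
def parse_bro(raw: bytes) -> List[dict]:
    return [{"sensor": "bro", "line": raw.decode("utf-8")}]


def parse_yaf(raw: bytes) -> List[dict]:
    return [{"sensor": "yaf", "fields": raw.decode("utf-8").split("|")}]


bolt = AggregatedParserBolt({"bro": parse_bro, "yaf": parse_yaf})
print(bolt.execute("yaf", b"10.0.0.1|10.0.0.2"))
print(bolt.execute("snort", b"unrouted"))  # unknown topic goes to bolt.errors
```

Keying on the topic works only while every sensor has its own topic; it does not by itself cover the one-topic, many-parsers case raised in the review comment.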
There is another, more likely use case where a transport wrapper carries
another message, and one topic is split across many parsers. How can we
handle that?
Specifically: Syslog (many message types) -> Kafka -> bolt -> split per
message.
I expect to add syslog parsing later, so set that aside.
The issue is that we *will* have more than one use case with respect to topics.
I am not going to say this PR needs to address it, but I would want us to
understand our path forward and minimize churn.
It would be best if we did not have to redo this work when accounting for
that.
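In the wrapped-transport case, the dispatch key has to come from inside the message rather than from the Kafka topic. A minimal sketch, assuming a deliberately simplified `<PRI>TAG: payload` wrapper (real RFC 3164 / RFC 5424 syslog headers carry more fields than this). `dispatch_wrapped`, `inner_parsers`, and the toy `sshd`/`cron` parsers are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical inner-parser signature: unwrapped payload -> parsed records.
Parser = Callable[[str], List[dict]]

# Simplified wrapper "<PRI>TAG: payload"; an assumption, not full syslog.
WRAPPER = re.compile(r"^<(?P<pri>\d{1,3})>(?P<tag>[A-Za-z0-9_.-]+): (?P<payload>.*)$")


def dispatch_wrapped(line: str, inner: Dict[str, Parser], errors: List[dict]) -> List[dict]:
    """Unwrap one message from a shared topic and route it by its inner tag."""
    match = WRAPPER.match(line)
    if match is None:
        errors.append({"raw": line, "error": "unrecognized wrapper"})
        return []
    tag = match.group("tag")
    parser = inner.get(tag)
    if parser is None:
        errors.append({"raw": line, "error": f"no parser for tag {tag!r}"})
        return []
    records = parser(match.group("payload"))
    for record in records:
        # Keep the wrapper's priority alongside the inner fields.
        record["wrapper.pri"] = int(match.group("pri"))
    return records


# Toy inner parsers keyed by tag (illustrative only).
inner_parsers: Dict[str, Parser] = {
    "sshd": lambda payload: [{"type": "sshd", "message": payload}],
    "cron": lambda payload: [{"type": "cron", "command": payload}],
}

errors: List[dict] = []
print(dispatch_wrapped("<38>sshd: Accepted publickey for admin", inner_parsers, errors))
```

Because routing happens after unwrapping, this step could in principle sit inside the same single bolt as a topic-keyed lookup, with the topic selecting the wrapper and the tag selecting the inner parser. Whether Metron should structure it that way is the open question the comment raises.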
> Parser aggregation in storm
> ---------------------------
>
> Key: METRON-1657
> URL: https://issues.apache.org/jira/browse/METRON-1657
> Project: Metron
> Issue Type: Bug
> Reporter: Justin Leet
> Assignee: Justin Leet
> Priority: Major
>
> Currently our parsing solution requires one Storm topology per sensor. There
> have been complaints that this may waste resources and that, rather than
> one Storm topology per sensor, it would be advantageous to have multiple
> sensors in the same topology. The benefit is that this would require
> fewer Storm slots.
> The problem is that whenever we've aggregated functionality like this
> before, we've had trouble scaling Storm appropriately (e.g.
> batch vs. random access indexing in the same topology). The main way to
> address this is to recommend that parsers with similar velocity and
> complexity be grouped together.
> For a first cut in particular, leave the configuration mostly as-is, while
> allowing comma-separated lists of sensors in start_parser_topology.sh
> (e.g. bro,yaf creates an aggregated parser consisting of those two).
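The first cut accepts comma-separated sensor lists such as `bro,yaf`, and the issue recommends grouping parsers with similar velocity and complexity. A minimal sketch of both steps, assuming each sensor has been given a coarse (velocity, complexity) label by hand. `parse_sensor_list`, `group_by_profile`, and the `"high"`/`"low"` labels are hypothetical, and the option name start_parser_topology.sh uses for the list is not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_sensor_list(arg: str) -> List[str]:
    """Split a comma-separated sensor argument such as "bro,yaf"."""
    sensors = [s.strip() for s in arg.split(",")]
    if any(not s for s in sensors):
        raise ValueError(f"empty sensor name in {arg!r}")
    if len(set(sensors)) != len(sensors):
        raise ValueError(f"duplicate sensor name in {arg!r}")
    return sensors


def group_by_profile(profiles: Dict[str, Tuple[str, str]]) -> List[str]:
    """Group sensors sharing a (velocity, complexity) label into one
    comma-separated list per group, i.e. one list per aggregated topology."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sensor, profile in profiles.items():
        groups[profile].append(sensor)
    # Sort groups and members so the output is deterministic.
    return [",".join(sorted(members)) for _, members in sorted(groups.items())]


# Hypothetical labels; real grouping would be based on observed rates.
profiles = {"bro": ("high", "low"), "yaf": ("high", "low"), "snort": ("low", "high")}
for sensor_list in group_by_profile(profiles):
    print(sensor_list, parse_sensor_list(sensor_list))
```

Grouping on coarse labels is only a heuristic; the issue's concern is that sensors with very different throughput or parsing cost in one topology are hard to scale together.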