Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/970
@nickwallen For a bit of context, I tried primarily to keep things
essentially how they were in the ES versions (including the integration tests
and unit tests). I definitely appreciate you looking at this, the feedback is
definitely helpful, especially since like I said, this is a first cut, that
gets things implemented in a very similar manner to the current implementation.
Iterating on it is definitely worthwhile, but I think you're right, the
feedback is a little too abstract to work with as-is, so let's try to work out
a more concrete approach. Here's my initial thoughts, which are also mostly
too abstract, but we can try to get more detailed from here.
Re (1):
We could probably do something fairly similar to what we have the other
DAO's. Specifically, break things up into SolrMetaAlertUpdateDao +
ElasticsearchMetaAlertDao, SolrMetaAlertSearchDao + SolrMetaAlertSearchDao,
etc. Then have the SolrMetaAlertDao just wrap the appropriate calls as needed.
I think this will help make it more digestible (which is definitely a worthy
cause), but I don't think it helps make the implementations more independent.
I'd still expect them to be fairly large and share a lot of code in some
abstraction (whether that's abstract classes or not). The reason they'd share
code is that, unlike the SolrSearchDao vs ElasticsearchSearchDao is that the
metaalerts are (mostly) avoiding digging into Solr and ES internals which is
definitely not the case for the other DAOs.
Essentially there's a tradeoff between changes in the base classes breaking
one or both (but if that happens, I'd argue that our testing is incomplete and
we should be adding more tests like you mention) or having them vary
independently and having issues that are fixed on one side not fixed in the
other (which has it's own set of testing challenges). I'm curious what you
think is a good approach to breaking things up beyond that that avoids a lot of
code duplication.
For the implementation specific quirks, I completely agree, the abstraction
probably isn't as well defined as it could be. it might be possible to
refactor some things to make it more well defined, but at some point we're
other going to be duplicating code or patching abstraction leaks at the DAO
level. Because I don't think Solr and ES (and any other implementation we
choose) are fundamentally able to be papered over in a perfect abstraction. I
definitely agree it would be nice to make them less tied at the hip, the
problem is that a lot of this stuff is already broken up at the abstraction
points we have (particularly at Document level, rather than implementation
specific level). Right now, the implementation details are obscured from the
calling REST service, and its the responsibility of the MetaAlertDao to handle
the implementation details.
Re (2):
For the specific calculating metascores case, and probably some of the
other stuff, it only seems useful to break that sort of thing into it's own
class if we intend on making it pluggable. After all, that is a function that
is pretty tied into how metaalerts are laid out and I'm not really sure what
the benefit of splitting individual operations out like that is (unless we
potentially needed to make them available across composed DAOs). The other
thing is the actual calculation of the scores themselves are maintained by a
different class. It's only the actual management of the metaalert
representation handled by the Dao. I'm
To do more unit testing, there's definitely some easy targets in there.
I'd kept the unit tests that existed, but looking back at them, there's
definitely more things and helper functions that could be better tested even
without splitting things into more classes.
Is it worthwhile to do the refactoring into the *MetaAlertUpdateDao and
*UpdateSearchDao et al., along with adding some of the unit tests, and seeing
where we are at that point?
---