Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/1140
> If I introduced a bug to the pcap topology code, which test would fail?
@merrimanr They would all fail because the writing to HDFS sequence files
is incorrect. Prior to this refactoring, only 1 test would fail if a bug was
introduced, and likely only the first test. This test is, and has been, a
2-phase broad e2e/acceptance test since the beginning. I didn't seek to change
the fundamental operation of the test in this refactoring, but rather the
organization of those units to make it easier to pinpoint specific failures. I
definitely see your point about the setup, though this is still a definite
improvement imo that moves us in the direction of being able to make additional
refactorings should we so choose.
To answer your question more broadly, I've dealt with similar questions
when testing DAO layers with databases. The approach I've used in the past that
yielded optimal results without duplicating almost verbatim the code under test
and test code itself was to use the code under test to read/write to the system
and verify the expected values. To your point, however, this test is really 2
phases.
1. `Raw pcap -> pcap topology processes it -> pcap as sequence files in
HDFS`
2. `pcap as sequence files in HDFS -> pcap query filters it -> written to
local FS or HDFS`
Each of those can be tested independently, similar to what I described
about DAO's. Even so, I think there's probably value in doing the full e2e as
it's currently being done since we also have some spot-check unit tests around
the individual components as well. We could spin up a test for the pcaptopology
distinct from the querying, but there's definitely more work there. We would
need
to:
1. write test code that correctly reads the sequence files from HDFS
2. choose additional test conditions for the topology validation (e.g.
validating specific pcap values)
3. create a sequence file (probably just pulled from the output of the
existing integration test after the topology creates the files in HDFS) for
input to the query test.
All that being said, could I convince you that the improvements made by
this PR are a good first step that moves us closer to additional improvements
should we choose to make them?
---