Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1140
  
    > If I introduced a bug to the pcap topology code, which test would fail?
    
    @merrimanr They would all fail because the writing to HDFS sequence files 
is incorrect. Prior to this refactoring, only 1 test would fail if a bug was 
introduced, and likely only the first test. This test is, and has been, a 
2-phase broad e2e/acceptance test since the beginning. I didn't seek to change 
the fundamental operation of the test in this refactoring, but rather the 
organization of those units to make it easier to pinpoint specific failures. I 
definitely see your point about the setup, though this is still a definite 
improvement imo that moves us in the direction of being able to make additional 
refactorings should we so choose.
    
    To answer your question more broadly, I've dealt with similar questions 
when testing DAO layers with databases. The approach I've used in the past that 
yielded optimal results without duplicating almost verbatim the code under test 
and test code itself was to use the code under test to read/write to the system 
and verify the expected values. To your point, however, this test is really 2 
phases. 
    1. `Raw pcap -> pcap topology processes it -> pcap as sequence files in 
HDFS`
    2. `pcap as sequence files in HDFS -> pcap query filters it -> written to 
local  FS or HDFS`
    
    Each of those can be tested independently, similar to what I described 
about DAO's. Even so, I think there's probably value in doing the full e2e as 
it's currently being done since we also have some spot-check unit tests around 
the individual components as well. We could spin up a test for the pcaptopology 
distinct from the querying, but there's definitely more work there. We would 
need
    to:
    1. write test code that correctly reads the sequence files from HDFS
    2. choose additional test conditions for the topology validation (e.g. 
validating specific pcap values)
    3. create a sequence file (probably just pulled from the output of the 
existing integration test after the topology creates the files in HDFS) for 
input to the query test.
    
    All that being said, could I convince you that the improvements made by 
this PR are a good first step that moves us closer to additional improvements 
should we choose to make them?


---

Reply via email to