[GitHub] [beam] lostluck commented on a change in pull request #11413: [BEAM-9746] check for 0 length copies from state

GitBox Tue, 14 Apr 2020 17:13:10 -0700

lostluck commented on a change in pull request #11413: [BEAM-9746] check for 0 
length copies from state
URL: https://github.com/apache/beam/pull/11413#discussion_r408507507


 ##########
 File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go
 ##########
 @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) {
        }
 }
 
+// TestStateKeyReader validates ordinary Read cases
+func TestStateKeyReader(t *testing.T) {
+       const readLen = 4
+       tests := []struct {
+               name     string
+               buflens  []int // sizes of the buffers received on the state 
channel.
+               numReads int
 
 Review comment:
   Consider: What kind of change to the stateKeyReader implementation would 
change the expected number of reads, based on how stateKeyReader gets its data?
   
   There are two kinds of unit test. What you've described is black box 
testing, where the testing code has no knowledge of internal details of the 
implementation. This is valuable when there are accessible API calls to set the 
unknown internal state of a struct, and in doing so, there are checkable 
expectations of how other API calls will behave as a result. As a rule, this is 
preferable, as it's resilient to implementation changes, but with a broad 
enough API surface, it can be very very expensive since it's simply checking 
the same code paths over and over and over again on different tests. Black box 
testing by necessity ends up having alot of redundancy. We do have black box 
testing in general for Beam Go, since we use the direct runner to validate 
large portions of the exec package in a "pipeline" context. You can find them 
via any of the test files that have their package name to include _test.
   
   There's also white box testing, where the internal details are known to the 
test implementer. These tests are an example of the latter. 
   
   There is no user side API to configure the internal state of 
stateKeyReader.Read(), and in general for the io.Reader interface, which is how 
users (AKA other parts of the beam framework), will be using stateKeyReader. 
   Since we can't configure things with a user side API, we must manipulate 
some other internal state, to set the initial conditions, and then from those 
conditions, check that we understand the code under test.
   
   Further, white box testing is the only reasonable option for such a general 
API like Read(). It gets passed in a buffer, and it's expected to write up to 
len(buffer) bytes to it, and tell you what it did. It doesn't say anything 
about the bytes themselves. The implementations vary dramatically depending on 
the purpose, and only the implementation can check that. The API in this case 
doesn't dictate the tests, otherwise there could be a "io.ReaderTester" that we 
could instead of running these.
   
   So, while numReads in general is inscrutable to the user, it's not 
inscrutable to us, the implementers of this test case, and the code. We have 
our own expectations for the code
   
   In fact, numReads a deterministic number based on the lengths of the backing 
buffers vs the lengths of the reads. Since it's something we can check, and 
know based on the initial conditions, and our understanding of the code, it's 
fine. numReads is a second order expectation based on the initial conditions, 
but it's also one that's very easy to check.
   
   To be very blunt, numReads is checking that the code behaves the way we 
think it behaves.
   
   In this case, had we been testing that the "nil" data buffer case required 1 
read to return EOF, we wouldn't have had the bug in question.
   
   Your underlying concern about a "change state" test is valid though. Tests 
that require a specific implementation in order to pass, can be a nuisance. But 
that doesn't apply so much to second order expectations like numReads.  If the 
number of reads changes when the implementation changes, I certainly would like 
to know about it, because that could be an *unexpected* side effect of the 
change being made. Conversely, if a refactorer believes they can change the 
implementation to reduce the number of calls to Read the test can expect, they 
can simply adjust the number, and in Test Driven Development style, use that to 
validate that the changes they're making accomplish the intended goal.
   
   I'm on the side of I'd rather have tests fail if a dramatic change in 
behavior occurs, and the change author need to adjust them since then it 
verifies that they ran the tests, and that the tests are actually working. 
Having tests that pass after you make a change is a double edged sword. Your 
code is either still satisfying the old expectations, or it's breaking in ways 
you aren't testing yet.
   
   To answer the question I posed at the top of this comment. I can think of 
one change I'd make to the implementation (I mention it in the top post for 
this PR), but to make it, we'd need to validate every use of the io.Reader 
interface in the Go SDK, but it would only affect one or two of the given test 
cases for the numReads value, not all of them.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] lostluck commented on a change in pull request #11413: [BEAM-9746] check for 0 length copies from state

Reply via email to