lostluck commented on a change in pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408507507
########## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ########## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int Review comment: Consider: What kind of change to the stateKeyReader implementation would change the expected number of reads, based on how stateKeyReader gets its data? There are two kinds of unit test. What you've described is black box testing, where the testing code has no knowledge of internal details of the implementation. This is valuable when there are accessible API calls to set the unknown internal state of a struct, and in doing so, there are checkable expectations of how other API calls will behave as a result. As a rule, this is preferable, as it's resilient to implementation changes, but with a broad enough API surface, it can be very very expensive since it's simply checking the same code paths over and over and over again on different tests. Black box testing by necessity ends up having alot of redundancy. We do have black box testing in general for Beam Go, since we use the direct runner to validate large portions of the exec package in a "pipeline" context. You can find them via any of the test files that have their package name to include _test. There's also white box testing, where the internal details are known to the test implementer. These tests are an example of the latter. There is no user side API to configure the internal state of stateKeyReader.Read(), and in general for the io.Reader interface, which is how users (AKA other parts of the beam framework), will be using stateKeyReader. Since we can't configure things with a user side API, we must manipulate some other internal state, to set the initial conditions, and then from those conditions, check that we understand the code under test. Further, white box testing is the only reasonable option for such a general API like Read(). It gets passed in a buffer, and it's expected to write up to len(buffer) bytes to it, and tell you what it did. It doesn't say anything about the bytes themselves. The implementations vary dramatically depending on the purpose, and only the implementation can check that. The API in this case doesn't dictate the tests, otherwise there could be a "io.ReaderTester" that we could instead of running these. So, while numReads in general is inscrutable to the user, it's not inscrutable to us, the implementers of this test case, and the code. We have our own expectations for the code In fact, numReads a deterministic number based on the lengths of the backing buffers vs the lengths of the reads. Since it's something we can check, and know based on the initial conditions, and our understanding of the code, it's fine. numReads is a second order expectation based on the initial conditions, but it's also one that's very easy to check. To be very blunt, numReads is checking that the code behaves the way we think it behaves. In this case, had we been testing that the "nil" data buffer case required 1 read to return EOF, we wouldn't have had the bug in question. Your underlying concern about a "change state" test is valid though. Tests that require a specific implementation in order to pass, can be a nuisance. But that doesn't apply so much to second order expectations like numReads. If the number of reads changes when the implementation changes, I certainly would like to know about it, because that could be an *unexpected* side effect of the change being made. Conversely, if a refactorer believes they can change the implementation to reduce the number of calls to Read the test can expect, they can simply adjust the number, and in Test Driven Development style, use that to validate that the changes they're making accomplish the intended goal. I'm on the side of I'd rather have tests fail if a dramatic change in behavior occurs, and the change author need to adjust them since then it verifies that they ran the tests, and that the tests are actually working. Having tests that pass after you make a change is a double edged sword. Your code is either still satisfying the old expectations, or it's breaking in ways you aren't testing yet. To answer the question I posed at the top of this comment. I can think of one change I'd make to the implementation (I mention it in the top post for this PR), but to make it, we'd need to validate every use of the io.Reader interface in the Go SDK, but it would only affect one or two of the given test cases for the numReads value, not all of them. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services