lostluck commented on PR #23285:
URL: https://github.com/apache/beam/pull/23285#issuecomment-1308082338

   My recommendation here is to not worry about a full-on integration test for 
now. If you've run this on a portable runner (like Flink or Cloud Dataflow) 
and it works, that's sufficient E2E verification for the time being.
   
   I'd much prefer a robust in-memory unit test of the ProcessLogic though, and 
we can get that by refactoring the logic into separate standalone functions 
that are easier to unit test. The pipeline construction logic is a bit hairy 
to test outside of running pipelines or writing elaborate setup (creating a 
Spanner instance, tearing it down, etc.). Trying to connect to a test 
database is also tricky because it typically requires either everything to be 
in memory, or something that arbitrary distributed runners can connect to 
(which won't be true on Dataflow, for example).
   
   The "simple" way to test this is to migrate the DoFn code that calls the 
client into testable functions, and pass those functions the client. This 
allows unit testing the important logic of calling the Spanner APIs, if not the 
Beam-specific logic of setting up the client.
   
   This is what we do to test the pubsubx "helper" logic for example: 
https://github.com/apache/beam/blob/3c5ea0dfd500c2f6b97eaaf4e39612c406afc9f5/sdks/go/pkg/beam/util/pubsubx/pubsub_test.go
 which is used to set up some of the streaming examples against pubsub.
   
   So we can pull the logic out into functions like `query(ctx context.Context, 
client *spanner.Client, query string, rt reflect.Type, emit func(beam.X))`.
   
   Then you can write a test that sets up the spannertest client with the 
data and passes in a closure like `func(v beam.X) { values = append(values, v) 
}`. That lets us check all the emitted values, which validates most of 
that logic.
   
   You can also validate that you're getting the expected types out. The same 
approach applies to the writing variant.
   
   The spannertest package has code you can borrow for setting up the client 
data properly: 
https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L127
   and 
https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L212
   
   I don't have great advice for this in general, as most effort for IOs is 
heavily focused on Java, and outside of simple style guidelines, there's not 
much in the way of "This is the comprehensive way to write a testable IO" that 
I could translate from Java to Go for you. And it's not something I feel I 
should "wing".


