vvcephei commented on a change in pull request #8541:
URL: https://github.com/apache/kafka/pull/8541#discussion_r414208673



##########
File path: 
streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/HighAvailabilityTaskAssignorTest.java
##########
@@ -41,132 +54,107 @@
 import static org.easymock.EasyMock.replay;
 import static org.hamcrest.CoreMatchers.equalTo;
 import static org.hamcrest.MatcherAssert.assertThat;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
-
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.Map;
-import java.util.Set;
-import java.util.UUID;
-import org.apache.kafka.streams.processor.TaskId;
-import 
org.apache.kafka.streams.processor.internals.assignment.AssignorConfiguration.AssignmentConfigs;
-import org.easymock.EasyMock;
-import org.junit.Test;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.is;
+import static org.hamcrest.Matchers.not;
 
 public class HighAvailabilityTaskAssignorTest {

Review comment:
       I made a bunch of changes to this test, because it was pretty brittle 
with respect to changes in the HighAvailabilityTaskAssignor. For context, this 
is the second time I've touched the assignment code since we introduced the 
HATA, and it's the second time I've had to deal with irrelevant test failures 
in this class.
   
   First, I replaced the ClientState mocks with "real" ClientStates, 
constructed to represent the desired scenario for each test. Mocks are really 
more appropriate for isolating a component from _external_ components (like 
mocking a remote service). Mocking data types leads to verifying that a 
specific set of queries happens against the data type, which is likely to break 
any time the logic under test changes in any way. Another problem with 
data-type mocks is that they can violate the invariants of the data type 
itself. For example, you can mock a list that both `isEmpty` and contains 
items. In our case, we threw NPEs in the assignor that could never happen in 
production when the mocked assigned/standby tasks didn't agree with the 
assigned tasks or the stateful assigned tasks weren't mocked to agree with the 
lags. Now, we just construct a ClientState for each client, representing the 
desired scenario and make assertions on the resulting assignment.
   
   Second, the tests as written rely heavily on shared mutable fields inserted 
into shared mutable collections to build the assignor. This can be a good way 
to minimize the text inside the test method, which lets readers focus on the 
proper logic of the test itself. However, it makes it harder to understand the 
full context of a test, and it also raises the possibility of tests polluting 
each others' environments. Since in this particular case, localizing all the 
setup code is about as compact as factoring it out, I went ahead and minimized 
the shared fields, and eliminated the mutability, the tests are self-contained.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to