GitHub user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3089
  
    Thank you for the effort to improve test performance.
    
    I think we have to look very closely here, because this change could easily
    reduce test coverage. I believe there were subtle differences between the
    tests concerning the reuse of result holder objects.
    The hash table classes are at the core of many DataSet operations and had
    subtle bugs in the past, which we caught and fixed by massively expanding
    test coverage. We must absolutely preserve that.
    
    That being said, removing exact duplicates makes total sense. We simply
    need to double-check that these are in fact exact duplicates and not fuzzy
    duplicates.

