TheNeuralBit commented on pull request #15909:
URL: https://github.com/apache/beam/pull/15909#issuecomment-962291253


   The `pandas/_libs/testing.pyx` errors look like real failures in 
[`test_dataframe_agg_method`](https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L1490):
   ```
   >   ???
   E   AssertionError: Series are different
   E   
   E   Series values are different (50.0 %)
   E   [index]: [A, B]
   E   [left]:  [-1.1999999999999993, -1.699511634587763]
   E   [right]: [-1.200000000000001, -0.40130739795918835]
   ```
   They just happen to come from `pd.testing.assert_frame_equal`, which we use 
to verify that DataFrame results are equivalent:  
https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L175-L176
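
To make the "50.0 %" figure concrete, here is a minimal sketch that re-checks the two values from the message above with a tolerance-based comparison. The relative tolerance `REL_TOL` is an assumption chosen for illustration, not a value read from the test, and `math.isclose` stands in for whatever comparison pandas performs internally.

```python
import math

# Values copied from the assertion message for test_dataframe_agg_method.
left = {"A": -1.1999999999999993, "B": -1.699511634587763}
right = {"A": -1.200000000000001, "B": -0.40130739795918835}

# Assumption: an illustrative relative tolerance, not taken from the test.
REL_TOL = 1e-5

# Collect the index labels whose values disagree beyond the tolerance.
mismatched = [
    label for label in left
    if not math.isclose(left[label], right[label], rel_tol=REL_TOL)
]
percent_different = 100.0 * len(mismatched) / len(left)

print(mismatched, percent_different)  # ['B'] 50.0
```

Column A differs only by floating-point noise, so the whole mismatch comes from column B.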
   
   I also see a couple of failures for `test_series_cov_corr` indicating it may 
be a little flaky, like [this 
one](https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/20405/testReport/junit/apache_beam.dataframe.frames_test/DeferredFrameTest/test_series_cov_corr_8/):
   
   ```
   apache_beam/dataframe/frames_test.py:191: in _run_test
       self.assertTrue(
   E   AssertionError: False is not true : Expected:
   E   
   E   -1.2
   E   
   E   Actual:
   E   
   E   -1.1999545602598247
   ```
   
   That's off by only about 4.5e-5, but I guess that's enough for `np.isclose` 
to consider the values different. If we can rule out an actual cause for this 
difference, we may want to plumb through an option for increasing the 
tolerance, like we discussed for skew. But it seems like something else may be 
going on here.
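
As a rough check of why this comparison fails, the sketch below applies the formula documented for `np.isclose`, `|a - b| <= atol + rtol * |b|`, with its default `rtol=1e-5` and `atol=1e-8`. The helper `within_tolerance` is hypothetical, and I'm assuming the test relies on the default tolerances. The relaxed `rtol=1e-4` is only an example value, not a proposal.

```python
def within_tolerance(actual, expected, rtol=1e-5, atol=1e-8):
    """Hypothetical helper mirroring the check documented for np.isclose."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Values copied from the test_series_cov_corr failure above.
expected = -1.2
actual = -1.1999545602598247

diff = abs(actual - expected)  # about 4.5e-5

# Default tolerances allow roughly 1.2e-5 here, so the check fails.
default_ok = within_tolerance(actual, expected)

# Assumption: an example of a relaxed tolerance, allowing roughly 1.2e-4.
relaxed_ok = within_tolerance(actual, expected, rtol=1e-4)

print(default_ok, relaxed_ok)  # False True
```

So a tenfold increase in `rtol` would hide this particular failure, which is why it seems worth ruling out a real cause first.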
   
   The error in `test_dataframe_agg_method` does look like a hard failure. If 
you look 
[here](https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/20405/testReport/junit/apache_beam.dataframe.frames_test/AggregationTest/),
 you can see it failed in every run: 
   
![image](https://user-images.githubusercontent.com/675055/140591167-ef72f201-b323-4c4b-86e8-9750867a4fc8.png)
   
   and it's consistently producing -0.4 rather than -1.7 for column B. I'd 
suggest looking closer at the column B case from that test: 
https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L1491
   
   
   

