TheNeuralBit commented on pull request #15909: URL: https://github.com/apache/beam/pull/15909#issuecomment-962291253
The `pandas/_libs/testing.pyx` errors look like real errors in [`test_dataframe_agg_method`](https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L1490):

```
>   ???
E   AssertionError: Series are different
E
E   Series values are different (50.0 %)
E   [index]: [A, B]
E   [left]:  [-1.1999999999999993, -1.699511634587763]
E   [right]: [-1.200000000000001, -0.40130739795918835]
```

They just happen to come from `pd.testing.assert_frame_equal`, which we use to verify that DataFrame results are equivalent: https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L175-L176

I also see a couple of failures for `test_series_cov_corr`, indicating it may be a little flaky, like [this one](https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/20405/testReport/junit/apache_beam.dataframe.frames_test/DeferredFrameTest/test_series_cov_corr_8/):

```
apache_beam/dataframe/frames_test.py:191: in _run_test
    self.assertTrue(
E   AssertionError: False is not true : Expected:
E
E   -1.2
E
E   Actual:
E
E   -1.1999545602598247
```

That's off by only about 4.5e-5, but I guess it's enough for `np.isclose` to consider it different. If we can rule out an actual cause for this difference, we may want to plumb through an option for increasing the tolerance, like we discussed for skew. But it seems like something else may be going on here.

The error in `test_dataframe_agg_method` does look like a hard failure. If you look [here](https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/20405/testReport/junit/apache_beam.dataframe.frames_test/AggregationTest/) you can see it failed in every run, and it's consistently producing -0.4 rather than -1.7 for column B.
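
On the `test_series_cov_corr` tolerance question, here is a rough sketch of why the comparison fails, assuming the check goes through `np.isclose` with its default tolerances (`rtol=1e-05`, `atol=1e-08`). The looser `rtol=1e-4` is only an illustrative value, not a proposal for what the test should use:

```python
import numpy as np

# Values copied from the failing test_series_cov_corr run.
expected = -1.2
actual = -1.1999545602598247

# np.isclose(a, b) passes when |a - b| <= atol + rtol * |b|.
diff = abs(actual - expected)
default_bound = 1e-8 + 1e-5 * abs(expected)

# Defaults (rtol=1e-05, atol=1e-08) reject the value.
default_close = bool(np.isclose(actual, expected))

# A looser relative tolerance (illustrative value only) accepts it.
loose_close = bool(np.isclose(actual, expected, rtol=1e-4))

print(f"diff={diff:.3e}, default bound={default_bound:.3e}")
print(f"default: {default_close}, rtol=1e-4: {loose_close}")
```

So the difference is a few times larger than the default bound, which is why a modest tolerance bump would hide it. That only makes sense if we're confident the difference is numerical noise.
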
I'd suggest looking closer at the column B case from that test: https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L1491

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
