Andrew Pilloud created BEAM-10462:
-------------------------------------
Summary: org.apache.beam.sdk.transforms corrupt data when a value is Double.NaN
Key: BEAM-10462
URL: https://issues.apache.org/jira/browse/BEAM-10462
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Affects Versions: 2.22.0, 0.2.0-incubating
Reporter: Andrew Pilloud
Assignee: Andrew Pilloud
When there is a NaN value in the PCollection passed into Min or Max, we get a
random value back due to the way the CombineFn works. Per the SQL standard, we
should always get NaN back. I'm going to add a special case to get the right
answer.
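As a rough sketch of the kind of special case I have in mind (plain Java, not
the actual patch; {{maxWithNaN}} is just an illustrative name), the comparison
would propagate NaN explicitly instead of relying on `>=`:
{code}
// Hypothetical helper, not the real CombineFn: if either operand is NaN the
// result is NaN, otherwise fall back to the ordinary comparison.
static double maxWithNaN(double left, double right) {
  if (Double.isNaN(left) || Double.isNaN(right)) {
    return Double.NaN;
  }
  return left >= right ? left : right;
}
{code}
Math.max/Math.min already behave this way (they return NaN if either argument
is NaN), so reusing them would be another option.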
It looks like we switched from using Double.compare to the `>=` operator in
https://github.com/apache/beam/commit/21a5b44c3b541ba6c89df5649afe00412df73d10,
which introduced this data corruption bug: `>=` evaluates to false whenever
either operand is NaN, so NaN values are not handled consistently.
A test case demonstrating this issue:
{code}
import org.junit.Assert;
import org.junit.Test;

@Test
public void testDouble() {
  // The >= operator is false whenever either operand is NaN, so it can never
  // consistently select a NaN element.
  Assert.assertFalse(Double.NaN >= 0.9);
  Assert.assertFalse(0.9 >= Double.NaN);
  Assert.assertFalse(Double.NaN >= Double.POSITIVE_INFINITY);
  Assert.assertFalse(Double.POSITIVE_INFINITY >= Double.NaN);
  // Double.compare treats NaN as greater than every other value, including
  // positive infinity, so it orders NaN consistently.
  Assert.assertTrue(Double.compare(Double.NaN, 0.9) >= 0);
  Assert.assertFalse(Double.compare(0.9, Double.NaN) >= 0);
  Assert.assertTrue(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) >= 0);
  Assert.assertFalse(Double.compare(Double.POSITIVE_INFINITY, Double.NaN) >= 0);
}
{code}
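For reference, a pipeline-level reproduction might look like the following
(a sketch using the standard Beam test utilities; the class and test names are
made up, and the assertion expects NaN per the SQL standard, so it should fail
until the special case lands):
{code}
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Max;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

public class MaxNaNTest {
  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void testMaxPropagatesNaN() {
    // A PCollection containing a NaN element; per the SQL standard the
    // Max of this collection should be NaN, not 1.0 or 2.0.
    PCollection<Double> max =
        p.apply(Create.of(1.0, Double.NaN, 2.0)).apply(Max.doublesGlobally());
    // Double.equals treats NaN as equal to NaN, so this assertion is valid.
    PAssert.thatSingleton(max).isEqualTo(Double.NaN);
    p.run();
  }
}
{code}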