Andrew Pilloud created BEAM-10462:
-------------------------------------

             Summary: org.apache.beam.sdk.transforms corrupt data when a value is Double.NaN
                 Key: BEAM-10462
                 URL: https://issues.apache.org/jira/browse/BEAM-10462
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
    Affects Versions: 2.22.0, 0.2.0-incubating
            Reporter: Andrew Pilloud
            Assignee: Andrew Pilloud


When there is a NaN value in the PCollection passed into Min or Max, we get an arbitrary, order-dependent value back because of the way the CombineFn compares elements. Per the SQL standard, we should always get NaN back. I'm going to add a special case to get the right answer.
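
For context, a pipeline-level reproduction looks roughly like the sketch below. It's illustrative only (the test class and method names are made up, and it assumes the usual TestPipeline/PAssert test setup); the assertion encodes the SQL semantics above, so it fails against the current `>=`-based CombineFn.

{code}
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Max;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

public class MaxNaNTest {
  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void maxOfCollectionContainingNaN() {
    PCollection<Double> max =
        pipeline
            .apply(Create.of(1.0, Double.NaN, 2.0))
            .apply(Max.doublesGlobally());

    // Per the SQL standard the aggregate should be NaN; with the >= based
    // CombineFn the result instead depends on the order in which the
    // elements happen to be combined.
    PAssert.thatSingleton(max).isEqualTo(Double.NaN);

    pipeline.run();
  }
}
{code}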

It looks like we switched from using Double.compare to the `>=` operator in https://github.com/apache/beam/commit/21a5b44c3b541ba6c89df5649afe00412df73d10, which introduced this data corruption bug.

A test case demonstrating this issue:
{code}
  @Test
  public void testDouble() {
    // The >= operator returns false whenever either operand is NaN,
    // regardless of which side the NaN is on...
    Assert.assertFalse(Double.NaN >= 0.9);
    Assert.assertFalse(0.9 >= Double.NaN);
    Assert.assertFalse(Double.NaN >= Double.POSITIVE_INFINITY);
    Assert.assertFalse(Double.POSITIVE_INFINITY >= Double.NaN);
    // ...while Double.compare treats NaN as greater than every other value,
    // including positive infinity.
    Assert.assertTrue(Double.compare(Double.NaN, 0.9) >= 0);
    Assert.assertFalse(Double.compare(0.9, Double.NaN) >= 0);
    Assert.assertTrue(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) >= 0);
    Assert.assertFalse(Double.compare(Double.POSITIVE_INFINITY, Double.NaN) >= 0);
  }
{code}
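
The special case could look roughly like the following (an illustrative helper only, not the actual patch; shown for Max, with Min mirroring it):

{code}
// Illustrative sketch only: propagate NaN explicitly before falling back to
// the existing >= comparison.
static double maxWithNaNPropagation(double left, double right) {
  if (Double.isNaN(left) || Double.isNaN(right)) {
    // Per the SQL standard, a NaN input means a NaN aggregate.
    return Double.NaN;
  }
  return left >= right ? left : right;
}
{code}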


