Ruiqi Dong created AVRO-4269:
--------------------------------

             Summary: TimestampNanosConversion.toLong(...) encodes pre-epoch 
instants with the wrong nanosecond offset
                 Key: AVRO-4269
                 URL: https://issues.apache.org/jira/browse/AVRO-4269
             Project: Apache Avro
          Issue Type: Bug
          Components: java
            Reporter: Ruiqi Dong


*Summary*
`TimestampNanosConversion.toLong(...)` has a special path for negative epoch 
seconds with positive nanoseconds. That path subtracts `1_000_000` instead of 
`1_000_000_000`. As a result, an instant such as `1969-12-31T23:59:59.500Z` is 
encoded as `499000000` instead of `-500000000`.
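
A minimal standalone sketch of the arithmetic above, with no Avro dependency. The class and method names are hypothetical: `reportedBranch` only mirrors the negative branch as described in this report, and `expected` computes the plain "seconds times 1e9 plus nanos" value for comparison.

```java
import java.time.Instant;

// Hypothetical demo class; it is not part of Avro.
class PreEpochOffsetDemo {
  // Mirrors the negative branch as reported: subtracts 1_000_000 from the nanos.
  static long reportedBranch(Instant instant) {
    long base = Math.multiplyExact(instant.getEpochSecond() + 1, 1_000_000_000L);
    long adjustment = instant.getNano() - 1_000_000;
    return Math.addExact(base, adjustment);
  }

  // Reference value: epochSecond * 1e9 + nanoOfSecond.
  static long expected(Instant instant) {
    long base = Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L);
    return Math.addExact(base, instant.getNano());
  }

  public static void main(String[] args) {
    // Epoch second -1, nano-of-second 500_000_000.
    Instant instant = Instant.parse("1969-12-31T23:59:59.500Z");
    System.out.println(reportedBranch(instant)); // prints 499000000
    System.out.println(expected(instant));       // prints -500000000
  }
}
```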
 
*Affected code*
File: `lang/java/avro/src/main/java/org/apache/avro/data/TimeConversions.java`
{code:java}
public static class TimestampNanosConversion extends Conversion<Instant> {
  ...
  @Override
  public Long toLong(Instant instant, Schema schema, LogicalType type) {
    long seconds = instant.getEpochSecond();
    int nanos = instant.getNano();

    if (seconds < 0 && nanos > 0) {
      long micros = Math.multiplyExact(seconds + 1, 1_000_000_000L);
      long adjustment = nanos - 1_000_000;

      return Math.addExact(micros, adjustment);
    } else {
      long micros = Math.multiplyExact(seconds, 1_000_000_000L);

      return Math.addExact(micros, nanos);
    }
  }
}
{code}
*Reproducer* 
Add this test to 
`lang/java/avro/src/test/java/org/apache/avro/data/TestTimeConversions.java`
{code:java}
@Test
void timestampNanosConversionBeforeEpoch() {
  TimestampNanosConversion conversion = new TimestampNanosConversion();
  Instant beforeEpoch = Instant.ofEpochSecond(-1, 500_000_000);

  assertEquals(-500_000_000L,
      (long) conversion.toLong(beforeEpoch, TIMESTAMP_NANOS_SCHEMA,
          LogicalTypes.timestampNanos()));
  assertEquals(beforeEpoch,
      conversion.fromLong(-500_000_000L, TIMESTAMP_NANOS_SCHEMA,
          LogicalTypes.timestampNanos()));
}
{code}
Also initialize:
{code:java}
TIMESTAMP_NANOS_SCHEMA =
    LogicalTypes.timestampNanos().addToSchema(Schema.create(Schema.Type.LONG));
{code}
Run:
{code:bash}
MAVEN_SKIP_RC=true JAVA_HOME=/opt/homebrew/Cellar/openjdk@21/21.0.6/libexec/openjdk.jdk/Contents/Home \
PATH=/opt/homebrew/Cellar/openjdk@21/21.0.6/libexec/openjdk.jdk/Contents/Home/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin \
/opt/homebrew/bin/mvn -q -t toolchains-local.xml -pl lang/java/avro \
  -Dtest=org.apache.avro.data.TestTimeConversions#timestampNanosConversionBeforeEpoch test
{code}
Observed behavior:
The test fails with:
{code:java}
expected: <-500000000> but was: <499000000> {code}
Expected behavior:
`Instant.ofEpochSecond(-1, 500_000_000)` should encode to `-500_000_000` 
nanoseconds from the Unix epoch.


The Avro logical type `timestamp-nanos` represents an instant as a long count of
nanoseconds from the epoch. The current implementation corrupts every pre-epoch
instant that has a nonzero nanosecond-of-second component: the example above is
written as `499000000`, which denotes `1970-01-01T00:00:00.499Z`, so timestamps
can be reordered and round-trip encoding breaks. The fix direction is to
subtract `1_000_000_000` in the negative branch, matching the nanosecond unit
used by the rest of the method.
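
The fix direction can be sketched outside Avro as follows. This is a minimal stand-in, not the actual patch: `TimestampNanosSketch`, `toEpochNanos`, and `fromEpochNanos` are hypothetical names, and the inverse is an assumption used only to check the round trip. It does not claim to match how `fromLong` is implemented.

```java
import java.time.Instant;

// Hypothetical stand-in for the conversion; it is not the Avro class.
class TimestampNanosSketch {
  private static final long NANOS_PER_SECOND = 1_000_000_000L;

  static long toEpochNanos(Instant instant) {
    long seconds = instant.getEpochSecond();
    int nanos = instant.getNano();

    if (seconds < 0 && nanos > 0) {
      // Start from the next whole second and step back by the missing part of
      // that second. Both terms are in nanoseconds, so the adjustment must use
      // 1_000_000_000, not 1_000_000.
      long base = Math.multiplyExact(seconds + 1, NANOS_PER_SECOND);
      long adjustment = nanos - NANOS_PER_SECOND;
      return Math.addExact(base, adjustment);
    }
    return Math.addExact(Math.multiplyExact(seconds, NANOS_PER_SECOND), nanos);
  }

  // Assumed inverse for the round-trip check: floor division keeps the
  // nano-of-second non-negative for pre-epoch values.
  static Instant fromEpochNanos(long epochNanos) {
    return Instant.ofEpochSecond(
        Math.floorDiv(epochNanos, NANOS_PER_SECOND),
        Math.floorMod(epochNanos, NANOS_PER_SECOND));
  }

  public static void main(String[] args) {
    Instant beforeEpoch = Instant.ofEpochSecond(-1, 500_000_000);
    Instant afterEpoch = Instant.ofEpochSecond(0, 100_000_000);

    long encoded = toEpochNanos(beforeEpoch);
    System.out.println(encoded);                                     // prints -500000000
    System.out.println(fromEpochNanos(encoded).equals(beforeEpoch)); // prints true
    System.out.println(encoded < toEpochNanos(afterEpoch));          // prints true
  }
}
```

Working from `seconds + 1` in the negative branch is likely what lets values near `Long.MIN_VALUE` encode without tripping `Math.multiplyExact`; with the corrected constant, the instant for `Long.MIN_VALUE` nanoseconds round-trips in this sketch.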



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
