Simon Klakegg created AVRO-3611:
-----------------------------------

             Summary: org.apache.avro.util.RandomData generates invalid test data
                 Key: AVRO-3611
                 URL: https://issues.apache.org/jira/browse/AVRO-3611
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.11.1
            Reporter: Simon Klakegg
             Fix For: 1.11.2
         Attachments: image-2022-08-18-19-05-37-323.png

When RandomData.java generates data, it does not check for logical types, which
are described in the Avro specification:
[Specification | Apache Avro|https://avro.apache.org/docs/1.11.1/specification/_print/]




For instance, the generate method returns this for INT fields:
{code:java}
    case INT:      return random.nextInt(); {code}
 

However, an int field could be of logical type date:
!image-2022-08-18-19-05-37-323.png|width=1052,height=266!

 

In many cases this produces an int that is out of range for the date logical
type, which breaks record creation in, for instance, Kafka.
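To make the failure mode concrete, here is a minimal sketch using only the JDK. The class `UnboundedIntDemo` and its methods are hypothetical names for illustration and are not part of Avro. It treats an unconstrained int the way the date and time-millis logical types would: as days since 1970-01-01, and as milliseconds after midnight.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.Random;

// Hypothetical demo class for illustration; not part of Avro.
class UnboundedIntDemo {

  /** True if the value fits in one day when read as milliseconds after midnight. */
  static boolean isValidTimeMillis(int millisOfDay) {
    try {
      // ofNanoOfDay rejects anything outside 00:00:00.000 .. 23:59:59.999999999
      LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L);
      return true;
    } catch (DateTimeException e) {
      return false;
    }
  }

  /** Interprets an int as days since 1970-01-01, as the date logical type does. */
  static LocalDate asDate(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch);
  }

  public static void main(String[] args) {
    Random random = new Random(42L); // fixed seed only so that runs are repeatable
    for (int i = 0; i < 5; i++) {
      int value = random.nextInt();
      System.out.println(value + " -> date " + asDate(value)
          + ", valid time-millis: " + isValidTimeMillis(value));
    }
  }
}
```

An arbitrary int read as a day count usually lands hundreds of thousands or millions of years away from 1970, and most ints are not a legal time of day, so a consumer that converts these values to narrower date or time types may reject them.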

My suggestion is to generate data that is valid for logical types. Here is an
example I made for int and long:
{code:java}
// Note: if random is a java.util.Random, nextInt(origin, bound) and
// nextLong(origin, bound) require Java 17+. The upper bound is exclusive.
case INT:
  switch (logicalTypeName) {
    case "date":
      // Random number of days between Unix Epoch start day (0) and end day (24855)
      int maxDaysInEpoch = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
      return ThreadLocalRandom.current().nextInt(0, maxDaysInEpoch);
    case "time-millis":
      // Random number of milliseconds between midnight 00:00:00.000 (0) and 23:59:59.999 (86399999)
      int maxMillisecondsInDay = (int) Duration.ofDays(1).toMillis() - 1;
      return random.nextInt(0, maxMillisecondsInDay);
    default: return random.nextInt();
  }
case LONG:
  switch (logicalTypeName) {
    case "time-micros":
      // Random number of microseconds between midnight 00:00:00.000000 (0) and 23:59:59.999999 (86399999999)
      long maxMicrosecondsInDay = (Duration.ofDays(1).toNanos() - 1) / 1000;
      return random.nextLong(0, maxMicrosecondsInDay);
    case "timestamp-millis":
      // Random milliseconds between Unix Epoch (0) start and end (2147483647000)
      long maxMillisecondsInEpoch = TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMillisecondsInEpoch);
    case "timestamp-micros":
      // Random microseconds between Unix Epoch (0) start and end (2147483647000000)
      long maxMicrosecondsInEpoch = TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInEpoch);
    case "local-timestamp-millis":
      // Random number of milliseconds between Unix Epoch start (0) and 100 years from now
      long hundredYearsEpochMillis = ChronoUnit.MILLIS.between(Instant.EPOCH, ZonedDateTime.now().plusYears(100));
      return random.nextLong(0, hundredYearsEpochMillis);
    case "local-timestamp-micros":
      // Random number of microseconds between Unix Epoch start (0) and 100 years from now.
      // Computed inline: a local variable assigned in another case is not definitely assigned here.
      long hundredYearsEpochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, ZonedDateTime.now().plusYears(100));
      return random.nextLong(0, hundredYearsEpochMicros);
    default: return random.nextLong();
  } {code}
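For readers who want to try the idea outside of Avro, the following is a standalone sketch of the same approach that sticks to Java 8 APIs and a seeded `java.util.Random`. The class `LogicalTypeRandom` and its method names are hypothetical and are not part of RandomData. As a simplifying assumption, the local-timestamp types reuse the fixed epoch bounds of the timestamp types instead of "100 years from now", so that results depend only on the seed.

```java
import java.time.Duration;
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Avro's RandomData.
class LogicalTypeRandom {
  private final Random random;

  LogicalTypeRandom(long seed) {
    this.random = new Random(seed);
  }

  // Long in [0, bound) using only Java 8 APIs. floorMod keeps the result
  // non-negative; the slight modulo bias is assumed acceptable for test data.
  private long bounded(long bound) {
    return Math.floorMod(random.nextLong(), bound);
  }

  int nextInt(String logicalTypeName) {
    if (logicalTypeName == null) {
      return random.nextInt(); // no logical type: any int will do
    }
    switch (logicalTypeName) {
      case "date":
        // Days since the epoch, 0..24855 inclusive
        return random.nextInt((int) Duration.ofSeconds(Integer.MAX_VALUE).toDays() + 1);
      case "time-millis":
        // Milliseconds after midnight, 0..86399999 inclusive
        return random.nextInt((int) TimeUnit.DAYS.toMillis(1));
      default:
        return random.nextInt();
    }
  }

  long nextLong(String logicalTypeName) {
    if (logicalTypeName == null) {
      return random.nextLong();
    }
    switch (logicalTypeName) {
      case "time-micros":
        // Microseconds after midnight, 0..86399999999 inclusive
        return bounded(TimeUnit.DAYS.toMicros(1));
      case "timestamp-millis":
      case "local-timestamp-millis":
        return bounded(TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE));
      case "timestamp-micros":
      case "local-timestamp-micros":
        return bounded(TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE));
      default:
        return random.nextLong();
    }
  }
}
```

Calling `new LogicalTypeRandom(1L).nextInt("time-millis")` always yields a value that fits within a single day, while an unknown or absent logical type name falls back to the unconstrained behaviour.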






 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
