SourabhBadhya commented on code in PR #5779: URL: https://github.com/apache/hive/pull/5779#discussion_r2071712317
########## serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java: ########## @@ -231,10 +232,18 @@ private Object serializePrimitive(TypeInfo typeInfo, PrimitiveObjectInspector fi case TIMESTAMP: Timestamp timestamp = ((TimestampObjectInspector) fieldOI).getPrimitiveJavaObject(structFieldData); + LogicalType logicalType = schema.getLogicalType(); + if (logicalType != null && logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) { Review Comment: Use `equalsIgnoreCase` ########## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ########## @@ -162,6 +162,11 @@ public long toEpochMilli() { return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli(); } + public long toEpochMicro() { + return localDateTime.toEpochSecond(ZoneOffset.UTC) * 1_000_000 Review Comment: Can we use similar semantics as `toEpochMilli()`? `return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli() * 1000 + localDateTime.getNano() / 1000;` ########## serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java: ########## @@ -261,7 +261,7 @@ public void createAvroDateSchema() { public void createAvroTimestampSchema() { final String specificSchema = "{" + "\"type\":\"long\"," + - "\"logicalType\":\"timestamp-millis\"}"; + "\"logicalType\":\"timestamp-micros\"}"; Review Comment: Its better to extend test classes one for each of these datatypes, rather than replace this by TIMESTAMP_TYPE_NAME_MILLIS. ########## ql/src/test/queries/clientpositive/avro_timestamp_micros.q: ########## @@ -0,0 +1,3 @@ +CREATE EXTERNAL TABLE hive_test(`dt` timestamp) STORED AS AVRO; Review Comment: This needs to have more test cases. Consider atleast referring to existing test case from millis and extend it for micros. We are handling a lot of cases, such as proleptic calendar, zone based conversions etc. ########## storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java: ########## @@ -180,6 +201,27 @@ public static long convertTimeToHybrid(long proleptic) { return hybrid; } + /** + * Convert epoch microseconds from the proleptic Gregorian calendar to the + * hybrid Julian/Gregorian. + * @param prolepticMicros Microseconds of epoch in the proleptic Gregorian + * @return Microseconds of epoch in the hybrid Julian/Gregorian + */ + public static long convertTimeToHybridMicros(long prolepticMicros) { + long hybridMicros = prolepticMicros; + long prolepticMillis = prolepticMicros / 1_000L; // Convert micros to millis + + if (prolepticMillis < SWITCHOVER_MILLIS) { + String dateStr = PROLEPTIC_TIME_FORMAT.get().format(new Date(prolepticMillis)); + try { + hybridMicros = HYBRID_TIME_FORMAT.get().parse(dateStr).getTime() * 1_000L; // Convert millis back to micros + } catch (ParseException e) { + throw new IllegalArgumentException("Can't parse " + dateStr, e); + } + } + return hybridMicros; + } Review Comment: Same as earlier, consider keeping it as micros due to loss in precision while changing to millis. ########## serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java: ########## @@ -180,7 +180,8 @@ public static TypeInfo generateTypeInfo(Schema schema, } if (type == LONG && - AvroSerDe.TIMESTAMP_TYPE_NAME.equals(schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE))) { + (AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS.equals(schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE)) || + AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS.equals(schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE)))) { Review Comment: Consider creating a function to handle all these timestamp cases. Currently its only 2 cases, but looking at the extension itself, in the future it can be more. ########## serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java: ########## @@ -231,10 +232,18 @@ private Object serializePrimitive(TypeInfo typeInfo, PrimitiveObjectInspector fi case TIMESTAMP: Timestamp timestamp = ((TimestampObjectInspector) fieldOI).getPrimitiveJavaObject(structFieldData); + LogicalType logicalType = schema.getLogicalType(); + if (logicalType != null && logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) { + long micros = defaultProleptic ? timestamp.toEpochMicro() : + CalendarUtils.convertTimeToHybridMicros(timestamp.toEpochMicro()); + timestamp = TimestampTZUtil.convertTimestampToZone( + Timestamp.ofEpochMicro(micros), TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion); + return timestamp.toEpochMicro(); + } long millis = defaultProleptic ? timestamp.toEpochMilli() : - CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli()); + CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli()); timestamp = TimestampTZUtil.convertTimestampToZone( - Timestamp.ofEpochMilli(millis), TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion); + Timestamp.ofEpochMilli(millis), TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion); Review Comment: nit: Remove unnecessary whitespace changes to avoid rewriting git history ########## serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java: ########## @@ -227,7 +227,7 @@ public class TestAvroObjectInspectorGenerator { " \"fields\" : [\n" + " {\"name\":\"timestampField\", " + " \"type\":\"" + AvroSerDe.AVRO_LONG_TYPE_NAME + "\", " + - " \"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME + "\"}" + + " \"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS + "\"}" + Review Comment: Its better to extend test classes one for each of these datatypes, rather than replace this by `TIMESTAMP_TYPE_NAME_MILLIS`. ########## storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java: ########## @@ -161,6 +161,27 @@ public static long convertTimeToProleptic(long hybrid) { return proleptic; } + /** + * Convert epoch microseconds from the hybrid Julian/Gregorian calendar to the + * proleptic Gregorian. + * @param hybridMicros Microseconds of epoch in the hybrid Julian/Gregorian + * @return Microseconds of epoch in the proleptic Gregorian + */ + public static long convertTimeToProlepticMicros(long hybridMicros) { + long prolepticMicros = hybridMicros; + long hybridMillis = hybridMicros / 1_000L; // Convert micros to millis Review Comment: This has the effect of removing the precision of the micros by a factor of 1000. Consider using micros itself. Can we create SWITCHOVER_MICROS using the same logic as SWITCHOVER_MILLIS for this? ########## serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java: ########## @@ -159,7 +159,7 @@ private Schema createAvroPrimitive(TypeInfo typeInfo) { case TIMESTAMP: schema = AvroSerdeUtils.getSchemaFor("{" + "\"type\":\"" + AvroSerDe.AVRO_LONG_TYPE_NAME + "\"," + - "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME + "\"}"); + "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS + "\"}"); Review Comment: Why this case? Does this have the effect of making the type name as `timestamp-micros` as default? ########## serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java: ########## @@ -388,11 +389,21 @@ private Object deserializePrimitive(Object datum, Schema fileSchema, Schema reco skipProlepticConversion = HiveConf.ConfVars.HIVE_AVRO_PROLEPTIC_GREGORIAN_DEFAULT.defaultBoolVal; } } - Timestamp timestamp = TimestampTZUtil.convertTimestampToZone( - Timestamp.ofEpochMilli((Long) datum), ZoneOffset.UTC, convertToTimeZone, legacyConversion); - if (!skipProlepticConversion) { + LogicalType logicalType = fileSchema.getLogicalType(); + Timestamp timestamp; + if (logicalType != null && logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) { Review Comment: Use `equalsIgnoreCase`. ########## storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java: ########## @@ -161,6 +161,27 @@ public static long convertTimeToProleptic(long hybrid) { return proleptic; } + /** + * Convert epoch microseconds from the hybrid Julian/Gregorian calendar to the + * proleptic Gregorian. + * @param hybridMicros Microseconds of epoch in the hybrid Julian/Gregorian + * @return Microseconds of epoch in the proleptic Gregorian + */ + public static long convertTimeToProlepticMicros(long hybridMicros) { + long prolepticMicros = hybridMicros; + long hybridMillis = hybridMicros / 1_000L; // Convert micros to millis + + if (hybridMillis < SWITCHOVER_MILLIS) { + String dateStr = HYBRID_TIME_FORMAT.get().format(new Date(hybridMillis)); + try { + prolepticMicros = PROLEPTIC_TIME_FORMAT.get().parse(dateStr).getTime() * 1_000L; // Convert millis back to micros Review Comment: Precision is lost, consider keeping it as micros itself. ########## serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java: ########## @@ -388,11 +389,21 @@ private Object deserializePrimitive(Object datum, Schema fileSchema, Schema reco skipProlepticConversion = HiveConf.ConfVars.HIVE_AVRO_PROLEPTIC_GREGORIAN_DEFAULT.defaultBoolVal; } } - Timestamp timestamp = TimestampTZUtil.convertTimestampToZone( - Timestamp.ofEpochMilli((Long) datum), ZoneOffset.UTC, convertToTimeZone, legacyConversion); - if (!skipProlepticConversion) { + LogicalType logicalType = fileSchema.getLogicalType(); + Timestamp timestamp; + if (logicalType != null && logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) { + timestamp = Timestamp.ofEpochMicro((Long) datum); + } else { + timestamp = Timestamp.ofEpochMilli((Long) datum); + } + timestamp = TimestampTZUtil.convertTimestampToZone( + timestamp, ZoneOffset.UTC, convertToTimeZone, legacyConversion); + if (!skipProlepticConversion && logicalType != null && logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) { Review Comment: This can be simplified into a nested if block - if (!skipProlepticConversion) { // Add other if checks .... } Also use `equalsIgnoreCase` ########## serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java: ########## @@ -125,7 +125,7 @@ public void canSerializeDoubles() throws SerDeException, IOException { public void canSerializeTimestamps() throws SerDeException, IOException { singleFieldTest("timestamp1", Timestamp.valueOf("2011-01-01 00:00:00").toEpochMilli(), "\"" + AvroSerDe.AVRO_LONG_TYPE_NAME + "\"," + - "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME + "\""); + "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS + "\""); Review Comment: Its better to extend test classes one for each of these datatypes, rather than replace this by TIMESTAMP_TYPE_NAME_MILLIS. ########## common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java: ########## @@ -162,6 +162,11 @@ public long toEpochMilli() { return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli(); } + public long toEpochMicro() { + return localDateTime.toEpochSecond(ZoneOffset.UTC) * 1_000_000 + + localDateTime.getNano() / 1000; + } + public long toEpochMilli(ZoneId id) { Review Comment: Isn't zone conversion also applicable for micros? ########## serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java: ########## @@ -231,10 +232,18 @@ private Object serializePrimitive(TypeInfo typeInfo, PrimitiveObjectInspector fi case TIMESTAMP: Timestamp timestamp = ((TimestampObjectInspector) fieldOI).getPrimitiveJavaObject(structFieldData); + LogicalType logicalType = schema.getLogicalType(); + if (logicalType != null && logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) { + long micros = defaultProleptic ? timestamp.toEpochMicro() : + CalendarUtils.convertTimeToHybridMicros(timestamp.toEpochMicro()); + timestamp = TimestampTZUtil.convertTimestampToZone( + Timestamp.ofEpochMicro(micros), TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion); + return timestamp.toEpochMicro(); + } long millis = defaultProleptic ? timestamp.toEpochMilli() : - CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli()); + CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli()); Review Comment: nit: Remove unnecessary whitespace changes to avoid rewriting git history -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org