SourabhBadhya commented on code in PR #5779:
URL: https://github.com/apache/hive/pull/5779#discussion_r2071712317
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java:
##########
@@ -231,10 +232,18 @@ private Object serializePrimitive(TypeInfo typeInfo,
PrimitiveObjectInspector fi
case TIMESTAMP:
Timestamp timestamp =
((TimestampObjectInspector)
fieldOI).getPrimitiveJavaObject(structFieldData);
+ LogicalType logicalType = schema.getLogicalType();
+ if (logicalType != null &&
logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) {
Review Comment:
Use `equalsIgnoreCase`
##########
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##########
@@ -162,6 +162,11 @@ public long toEpochMilli() {
return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
}
+ public long toEpochMicro() {
+ return localDateTime.toEpochSecond(ZoneOffset.UTC) * 1_000_000
Review Comment:
Can we use similar semantics as `toEpochMilli()`?
`return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli() * 1000 +
localDateTime.getNano() / 1000;`
##########
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java:
##########
@@ -261,7 +261,7 @@ public void createAvroDateSchema() {
public void createAvroTimestampSchema() {
final String specificSchema = "{" +
"\"type\":\"long\"," +
- "\"logicalType\":\"timestamp-millis\"}";
+ "\"logicalType\":\"timestamp-micros\"}";
Review Comment:
Its better to extend test classes one for each of these datatypes, rather
than replace this by TIMESTAMP_TYPE_NAME_MILLIS.
##########
ql/src/test/queries/clientpositive/avro_timestamp_micros.q:
##########
@@ -0,0 +1,3 @@
+CREATE EXTERNAL TABLE hive_test(`dt` timestamp) STORED AS AVRO;
Review Comment:
This needs to have more test cases. Consider atleast referring to existing
test case from millis and extend it for micros.
We are handling a lot of cases, such as proleptic calendar, zone based
conversions etc.
##########
storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java:
##########
@@ -180,6 +201,27 @@ public static long convertTimeToHybrid(long proleptic) {
return hybrid;
}
+ /**
+ * Convert epoch microseconds from the proleptic Gregorian calendar to the
+ * hybrid Julian/Gregorian.
+ * @param prolepticMicros Microseconds of epoch in the proleptic Gregorian
+ * @return Microseconds of epoch in the hybrid Julian/Gregorian
+ */
+ public static long convertTimeToHybridMicros(long prolepticMicros) {
+ long hybridMicros = prolepticMicros;
+ long prolepticMillis = prolepticMicros / 1_000L; // Convert micros to
millis
+
+ if (prolepticMillis < SWITCHOVER_MILLIS) {
+ String dateStr = PROLEPTIC_TIME_FORMAT.get().format(new
Date(prolepticMillis));
+ try {
+ hybridMicros = HYBRID_TIME_FORMAT.get().parse(dateStr).getTime() *
1_000L; // Convert millis back to micros
+ } catch (ParseException e) {
+ throw new IllegalArgumentException("Can't parse " + dateStr, e);
+ }
+ }
+ return hybridMicros;
+ }
Review Comment:
Same as earlier, consider keeping it as micros due to loss in precision
while changing to millis.
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java:
##########
@@ -180,7 +180,8 @@ public static TypeInfo generateTypeInfo(Schema schema,
}
if (type == LONG &&
-
AvroSerDe.TIMESTAMP_TYPE_NAME.equals(schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE)))
{
+
(AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS.equals(schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE))
||
+
AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS.equals(schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE))))
{
Review Comment:
Consider creating a function to handle all these timestamp cases. Currently
its only 2 cases, but looking at the extension itself, in the future it can be
more.
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java:
##########
@@ -231,10 +232,18 @@ private Object serializePrimitive(TypeInfo typeInfo,
PrimitiveObjectInspector fi
case TIMESTAMP:
Timestamp timestamp =
((TimestampObjectInspector)
fieldOI).getPrimitiveJavaObject(structFieldData);
+ LogicalType logicalType = schema.getLogicalType();
+ if (logicalType != null &&
logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) {
+ long micros = defaultProleptic ? timestamp.toEpochMicro() :
+
CalendarUtils.convertTimeToHybridMicros(timestamp.toEpochMicro());
+ timestamp = TimestampTZUtil.convertTimestampToZone(
+ Timestamp.ofEpochMicro(micros),
TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion);
+ return timestamp.toEpochMicro();
+ }
long millis = defaultProleptic ? timestamp.toEpochMilli() :
- CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli());
+ CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli());
timestamp = TimestampTZUtil.convertTimestampToZone(
- Timestamp.ofEpochMilli(millis), TimeZone.getDefault().toZoneId(),
ZoneOffset.UTC, legacyConversion);
+ Timestamp.ofEpochMilli(millis),
TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion);
Review Comment:
nit: Remove unnecessary whitespace changes to avoid rewriting git history
##########
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java:
##########
@@ -227,7 +227,7 @@ public class TestAvroObjectInspectorGenerator {
" \"fields\" : [\n" +
" {\"name\":\"timestampField\", " +
" \"type\":\"" + AvroSerDe.AVRO_LONG_TYPE_NAME + "\", " +
- " \"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME + "\"}" +
+ " \"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS +
"\"}" +
Review Comment:
Its better to extend test classes one for each of these datatypes, rather
than replace this by `TIMESTAMP_TYPE_NAME_MILLIS`.
##########
storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java:
##########
@@ -161,6 +161,27 @@ public static long convertTimeToProleptic(long hybrid) {
return proleptic;
}
+ /**
+ * Convert epoch microseconds from the hybrid Julian/Gregorian calendar to
the
+ * proleptic Gregorian.
+ * @param hybridMicros Microseconds of epoch in the hybrid Julian/Gregorian
+ * @return Microseconds of epoch in the proleptic Gregorian
+ */
+ public static long convertTimeToProlepticMicros(long hybridMicros) {
+ long prolepticMicros = hybridMicros;
+ long hybridMillis = hybridMicros / 1_000L; // Convert micros to millis
Review Comment:
This has the effect of removing the precision of the micros by a factor of
1000. Consider using micros itself.
Can we create SWITCHOVER_MICROS using the same logic as SWITCHOVER_MILLIS
for this?
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java:
##########
@@ -159,7 +159,7 @@ private Schema createAvroPrimitive(TypeInfo typeInfo) {
case TIMESTAMP:
schema = AvroSerdeUtils.getSchemaFor("{" +
"\"type\":\"" + AvroSerDe.AVRO_LONG_TYPE_NAME + "\"," +
- "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME + "\"}");
+ "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS + "\"}");
Review Comment:
Why this case? Does this have the effect of making the type name as
`timestamp-micros` as default?
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:
##########
@@ -388,11 +389,21 @@ private Object deserializePrimitive(Object datum, Schema
fileSchema, Schema reco
skipProlepticConversion =
HiveConf.ConfVars.HIVE_AVRO_PROLEPTIC_GREGORIAN_DEFAULT.defaultBoolVal;
}
}
- Timestamp timestamp = TimestampTZUtil.convertTimestampToZone(
- Timestamp.ofEpochMilli((Long) datum), ZoneOffset.UTC,
convertToTimeZone, legacyConversion);
- if (!skipProlepticConversion) {
+ LogicalType logicalType = fileSchema.getLogicalType();
+ Timestamp timestamp;
+ if (logicalType != null &&
logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) {
Review Comment:
Use `equalsIgnoreCase`.
##########
storage-api/src/java/org/apache/hadoop/hive/common/type/CalendarUtils.java:
##########
@@ -161,6 +161,27 @@ public static long convertTimeToProleptic(long hybrid) {
return proleptic;
}
+ /**
+ * Convert epoch microseconds from the hybrid Julian/Gregorian calendar to
the
+ * proleptic Gregorian.
+ * @param hybridMicros Microseconds of epoch in the hybrid Julian/Gregorian
+ * @return Microseconds of epoch in the proleptic Gregorian
+ */
+ public static long convertTimeToProlepticMicros(long hybridMicros) {
+ long prolepticMicros = hybridMicros;
+ long hybridMillis = hybridMicros / 1_000L; // Convert micros to millis
+
+ if (hybridMillis < SWITCHOVER_MILLIS) {
+ String dateStr = HYBRID_TIME_FORMAT.get().format(new Date(hybridMillis));
+ try {
+ prolepticMicros = PROLEPTIC_TIME_FORMAT.get().parse(dateStr).getTime()
* 1_000L; // Convert millis back to micros
Review Comment:
Precision is lost, consider keeping it as micros itself.
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java:
##########
@@ -388,11 +389,21 @@ private Object deserializePrimitive(Object datum, Schema
fileSchema, Schema reco
skipProlepticConversion =
HiveConf.ConfVars.HIVE_AVRO_PROLEPTIC_GREGORIAN_DEFAULT.defaultBoolVal;
}
}
- Timestamp timestamp = TimestampTZUtil.convertTimestampToZone(
- Timestamp.ofEpochMilli((Long) datum), ZoneOffset.UTC,
convertToTimeZone, legacyConversion);
- if (!skipProlepticConversion) {
+ LogicalType logicalType = fileSchema.getLogicalType();
+ Timestamp timestamp;
+ if (logicalType != null &&
logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) {
+ timestamp = Timestamp.ofEpochMicro((Long) datum);
+ } else {
+ timestamp = Timestamp.ofEpochMilli((Long) datum);
+ }
+ timestamp = TimestampTZUtil.convertTimestampToZone(
+ timestamp, ZoneOffset.UTC, convertToTimeZone, legacyConversion);
+ if (!skipProlepticConversion && logicalType != null &&
logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) {
Review Comment:
This can be simplified into a nested if block -
if (!skipProlepticConversion) {
// Add other if checks
....
}
Also use `equalsIgnoreCase`
##########
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java:
##########
@@ -125,7 +125,7 @@ public void canSerializeDoubles() throws SerDeException,
IOException {
public void canSerializeTimestamps() throws SerDeException, IOException {
singleFieldTest("timestamp1", Timestamp.valueOf("2011-01-01
00:00:00").toEpochMilli(),
"\"" + AvroSerDe.AVRO_LONG_TYPE_NAME + "\"," +
- "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME + "\"");
+ "\"logicalType\":\"" + AvroSerDe.TIMESTAMP_TYPE_NAME_MILLIS + "\"");
Review Comment:
Its better to extend test classes one for each of these datatypes, rather
than replace this by TIMESTAMP_TYPE_NAME_MILLIS.
##########
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##########
@@ -162,6 +162,11 @@ public long toEpochMilli() {
return localDateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
}
+ public long toEpochMicro() {
+ return localDateTime.toEpochSecond(ZoneOffset.UTC) * 1_000_000
+ + localDateTime.getNano() / 1000;
+ }
+
public long toEpochMilli(ZoneId id) {
Review Comment:
Isn't zone conversion also applicable for micros?
##########
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java:
##########
@@ -231,10 +232,18 @@ private Object serializePrimitive(TypeInfo typeInfo,
PrimitiveObjectInspector fi
case TIMESTAMP:
Timestamp timestamp =
((TimestampObjectInspector)
fieldOI).getPrimitiveJavaObject(structFieldData);
+ LogicalType logicalType = schema.getLogicalType();
+ if (logicalType != null &&
logicalType.getName().equals(AvroSerDe.TIMESTAMP_TYPE_NAME_MICROS)) {
+ long micros = defaultProleptic ? timestamp.toEpochMicro() :
+
CalendarUtils.convertTimeToHybridMicros(timestamp.toEpochMicro());
+ timestamp = TimestampTZUtil.convertTimestampToZone(
+ Timestamp.ofEpochMicro(micros),
TimeZone.getDefault().toZoneId(), ZoneOffset.UTC, legacyConversion);
+ return timestamp.toEpochMicro();
+ }
long millis = defaultProleptic ? timestamp.toEpochMilli() :
- CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli());
+ CalendarUtils.convertTimeToHybrid(timestamp.toEpochMilli());
Review Comment:
nit: Remove unnecessary whitespace changes to avoid rewriting git history
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]