okumin commented on code in PR #5584:
URL: https://github.com/apache/hive/pull/5584#discussion_r1929792190
##########
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java:
##########
@@ -36,7 +36,7 @@ public class LazyHiveDecimal extends LazyPrimitive<LazyHiveDecimalObjectInspecto
private final int precision;
private final int scale;
- private static final byte[] nullBytes = new byte[]{0x0, 0x0, 0x0, 0x0};
+ private static final byte[] nullBytes = new byte[]{'N', 'U', 'L', 'L'};
Review Comment:
I have another question. Why did `org.apache.hadoop.hive.serde2.JsonSerDe`
nullify numbers that are too big while `org.apache.hive.hcatalog.data.JsonSerDe` didn't?
As far as I can tell from debugging, that's because
[HiveJsonReader#optionallyWrapWritable](https://github.com/apache/hive/blob/5a9e289c8646649136b253841847e578326e41be/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L254)
converts the value into a HiveDecimal and applies `enforcePrecisionScale` in the
serde2 case. The HCatalog version doesn't enable `PRIMITIVE_TO_WRITABLE`, so the
original value (`1000000000000000000000000000000000000.00` in the test case) is used.
I guess we would see consistent results if we applied
`HiveDecimalUtils.enforcePrecisionScale` once when decoding a JSON number.
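To illustrate the difference, here is a minimal standalone sketch (not the actual serde2 code path; the `decimal(10,2)` column and the class name are made up for this example) of how `HiveDecimalUtils.enforcePrecisionScale` nullifies a value that exceeds the declared precision:
```
import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils;

public class EnforcePrecisionScaleSketch {
  public static void main(String[] args) {
    // The value from the test case: far more integer digits than the column allows.
    HiveDecimal value = HiveDecimal.create("1000000000000000000000000000000000000.00");
    // Assuming the column is declared as decimal(10,2); the value cannot fit,
    // so enforcePrecisionScale is expected to return null, which surfaces as NULL.
    DecimalTypeInfo typeInfo = new DecimalTypeInfo(10, 2);
    System.out.println(HiveDecimalUtils.enforcePrecisionScale(value, typeInfo));
  }
}
```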
##########
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java:
##########
@@ -36,7 +36,7 @@ public class LazyHiveDecimal extends LazyPrimitive<LazyHiveDecimalObjectInspecto
private final int precision;
private final int scale;
- private static final byte[] nullBytes = new byte[]{0x0, 0x0, 0x0, 0x0};
+ private static final byte[] nullBytes = new byte[]{'N', 'U', 'L', 'L'};
Review Comment:
This comment describes another approach: adjusting the precision and scale
in `HiveJsonReader`. `HiveJsonReader` decodes a JSON number into a
HiveDecimal
[here](https://github.com/apache/hive/blob/rel/release-4.0.1/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L423-L424).
Originally, `leafNode.asText()` was a number text that had already been rounded
to float64. Now it could be an arbitrary-precision number instead. I presume we can round or
nullify it based on the column definition in the schema.
This is what I imagine.
```
case DECIMAL:
  HiveDecimal decimal = HiveDecimal.create(leafNode.asText());
  return HiveDecimalUtils.enforcePrecisionScale(decimal, (DecimalTypeInfo) typeInfo);
```
The overall steps would be like this (step 1 is sketched below).
1. ObjectMapper decodes any number as a BigDecimal
2. HiveJsonReader creates a HiveDecimal with the precision and scale from the schema applied
3. The subsequent steps in Hive see decimals that comply with the schema

I guess we apply a similar conversion for
[Parquet](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L107-L108).
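As a rough illustration of step 1 (a hedged sketch, not Hive code; it only assumes Jackson's `USE_BIG_DECIMAL_FOR_FLOATS` feature, and the field name `c1` is made up), the JSON number keeps its full precision, so `leafNode.asText()` returns the original literal:
```
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigDecimalDecodeSketch {
  public static void main(String[] args) throws Exception {
    // Keep full precision instead of rounding floating-point numbers to double.
    ObjectMapper mapper = new ObjectMapper()
        .enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
    JsonNode leafNode = mapper
        .readTree("{\"c1\": 1000000000000000000000000000000000000.00}")
        .get("c1");
    // asText() yields the full-precision literal, so the DECIMAL branch can
    // apply the column's precision and scale via enforcePrecisionScale.
    System.out.println(leafNode.asText());
  }
}
```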