okumin commented on code in PR #5584:
URL: https://github.com/apache/hive/pull/5584#discussion_r1929792190
##########
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java:
##########
@@ -36,7 +36,7 @@ public class LazyHiveDecimal extends LazyPrimitive<LazyHiveDecimalObjectInspecto
   private final int precision;
   private final int scale;

-  private static final byte[] nullBytes = new byte[]{0x0, 0x0, 0x0, 0x0};
+  private static final byte[] nullBytes = new byte[]{'N', 'U', 'L', 'L'};

Review Comment:
I have another question. Why did `org.apache.hadoop.hive.serde2.JsonSerDe` nullify too-big numbers while `org.apache.hive.hcatalog.data.JsonSerDe` didn't?

As far as I debugged it, that's because [HiveJsonReader#optionallyWrapWritable](https://github.com/apache/hive/blob/5a9e289c8646649136b253841847e578326e41be/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L254) converts the value into a HiveDecimal and applies `enforcePrecisionScale` in the serde2 case. The HCatalog version doesn't enable `PRIMITIVE_TO_WRITABLE`, so the original value (`1000000000000000000000000000000000000.00` in the test case) is used as is.

I guess we would see consistent results if we applied `HiveDecimalUtils.enforcePrecisionScale` once when decoding a JSON number.

Review Comment:
This comment describes another approach, which adjusts the precision and scale in `HiveJsonReader`. `HiveJsonReader` decodes a JSON number into a HiveDecimal [here](https://github.com/apache/hive/blob/rel/release-4.0.1/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L423-L424). Originally, `leafNode.asText()` was number text that had already been rounded to float64; now it can be an arbitrary number instead. I presume we can round or nullify it based on the column definition in the schema. This is what I imagine:

```
case DECIMAL:
  HiveDecimal decimal = HiveDecimal.create(leafNode.asText());
  return HiveDecimalUtils.enforcePrecisionScale(decimal, (DecimalTypeInfo) typeInfo);
```

The entire flow will be like this:

1. ObjectMapper decodes any number as a BigDecimal.
2. HiveJsonReader creates a HiveDecimal with the precision and scale from the schema applied.
3. The following steps in Hive see decimals that comply with the schema.

I guess we apply similar conversions for [Parquet](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L107-L108).
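To make the suggestion above concrete, here is a minimal, self-contained sketch (not part of the PR) of what `HiveDecimalUtils.enforcePrecisionScale` would do to the value from the test case. The `decimal(38, 2)` column type and the class name are assumptions chosen for illustration only; the real precision and scale would come from the table schema that `HiveJsonReader` receives.

```
import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils;

public class EnforcePrecisionScaleSketch {
  public static void main(String[] args) {
    // Assumed column type for illustration: decimal(38, 2), which allows
    // at most 38 - 2 = 36 integer digits. The real type comes from the schema.
    DecimalTypeInfo typeInfo = new DecimalTypeInfo(38, 2);

    // A value that fits is kept; its fraction is rounded to the declared scale.
    HiveDecimal small = HiveDecimal.create("123.456");
    System.out.println(HiveDecimalUtils.enforcePrecisionScale(small, typeInfo)); // 123.46

    // The value from the test case has 37 integer digits, one more than
    // decimal(38, 2) allows, so enforcePrecisionScale returns null and the
    // column would be read as NULL.
    HiveDecimal tooBig = HiveDecimal.create("1000000000000000000000000000000000000.00");
    System.out.println(HiveDecimalUtils.enforcePrecisionScale(tooBig, typeInfo)); // null
  }
}
```

Under these assumptions the second call prints `null`, which is the nullification the first comment observes in the serde2 path and which the HCatalog path currently skips.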