okumin commented on code in PR #5584:
URL: https://github.com/apache/hive/pull/5584#discussion_r1929792190
##########
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java:
##########
@@ -36,7 +36,7 @@ public class LazyHiveDecimal extends LazyPrimitive<LazyHiveDecimalObjectInspecto
private final int precision;
private final int scale;
- private static final byte[] nullBytes = new byte[]{0x0, 0x0, 0x0, 0x0};
+ private static final byte[] nullBytes = new byte[]{'N', 'U', 'L', 'L'};
Review Comment:
I have another question. Why did `org.apache.hadoop.hive.serde2.JsonSerDe`
nullify numbers that are too big while `org.apache.hive.hcatalog.data.JsonSerDe` didn't?
As far as I can tell from debugging, that's because
[HiveJsonReader#optionallyWrapWritable](https://github.com/apache/hive/blob/5a9e289c8646649136b253841847e578326e41be/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L254)
converts the value into a HiveDecimal and applies `enforcePrecisionScale` in the
serde2 case. The HCatalog version doesn't enable `PRIMITIVE_TO_WRITABLE`, so the
original value (`1000000000000000000000000000000000000.00` in the test case) is used.
I guess we would see consistent results if we applied
`HiveDecimalUtils.enforcePrecisionScale` once when decoding a JSON number.
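To illustrate the difference, here is a minimal standalone sketch (not the actual serde2 code path; the `decimal(10,2)` column and the class name are made up for this example) of how `HiveDecimalUtils.enforcePrecisionScale` nullifies a value that exceeds the declared precision:
```
import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils;

public class EnforcePrecisionScaleSketch {
  public static void main(String[] args) {
    // The value from the test case: far more integer digits than the column allows.
    HiveDecimal value = HiveDecimal.create("1000000000000000000000000000000000000.00");
    // Assuming the column is declared as decimal(10,2); the value cannot fit,
    // so enforcePrecisionScale is expected to return null, which surfaces as NULL.
    DecimalTypeInfo typeInfo = new DecimalTypeInfo(10, 2);
    System.out.println(HiveDecimalUtils.enforcePrecisionScale(value, typeInfo));
  }
}
```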
##########
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java:
##########
@@ -36,7 +36,7 @@ public class LazyHiveDecimal extends LazyPrimitive<LazyHiveDecimalObjectInspecto
private final int precision;
private final int scale;
- private static final byte[] nullBytes = new byte[]{0x0, 0x0, 0x0, 0x0};
+ private static final byte[] nullBytes = new byte[]{'N', 'U', 'L', 'L'};
Review Comment:
This comment describes another approach: adjusting the precision and scale
in `HiveJsonReader`. `HiveJsonReader` decodes a JSON number into a
HiveDecimal
[here](https://github.com/apache/hive/blob/rel/release-4.0.1/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L423-L424).
Originally, `leafNode.asText()` was a number text that had already been rounded
to float64. Now it could be an arbitrary-precision number instead. I presume we can round or
nullify it based on the column definition in the schema.
This is what I imagine.
```
case DECIMAL:
  HiveDecimal decimal = HiveDecimal.create(leafNode.asText());
  return HiveDecimalUtils.enforcePrecisionScale(decimal, (DecimalTypeInfo) typeInfo);
```
The overall steps would be like this (step 1 is sketched below).
1. ObjectMapper decodes any number as a BigDecimal
2. HiveJsonReader creates a HiveDecimal with the precision and scale from the schema applied
3. The subsequent steps in Hive see decimals that comply with the schema

I guess we apply a similar conversion for
[Parquet](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L107-L108).
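As a rough illustration of step 1 (a hedged sketch, not Hive code; it only assumes Jackson's `USE_BIG_DECIMAL_FOR_FLOATS` feature, and the field name `c1` is made up), the JSON number keeps its full precision, so `leafNode.asText()` returns the original literal:
```
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigDecimalDecodeSketch {
  public static void main(String[] args) throws Exception {
    // Keep full precision instead of rounding floating-point numbers to double.
    ObjectMapper mapper = new ObjectMapper()
        .enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
    JsonNode leafNode = mapper
        .readTree("{\"c1\": 1000000000000000000000000000000000000.00}")
        .get("c1");
    // asText() yields the full-precision literal, so the DECIMAL branch can
    // apply the column's precision and scale via enforcePrecisionScale.
    System.out.println(leafNode.asText());
  }
}
```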