hudi-bot opened a new issue, #15719:
URL: https://github.com/apache/hudi/issues/15719

   As reported in: https://github.com/apache/hudi/issues/7732
   
    
   
   Currently we've capped the precision of supported decimals at 30, on the 
assumption that this is high enough to cover 99% of use cases, but it seems 
there is still demand for even larger decimals.
   
   The challenge, however, is to balance the need to support larger decimals 
against the storage space we have to provision for each of them.
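To make that trade-off concrete, here is a small sketch (plain Python, not Hudi code) of the minimum fixed size an Avro-style two's-complement decimal encoding needs for a given precision:

```python
def min_fixed_bytes(precision: int) -> int:
    """Minimum number of bytes needed to hold any unscaled value of the
    given decimal precision as a signed two's-complement integer."""
    max_unscaled = 10 ** precision - 1
    bits = max_unscaled.bit_length() + 1  # +1 for the sign bit
    return (bits + 7) // 8

# Precision 30 fits in 13 bytes; Spark's default DecimalType(38, 18)
# needs 16 bytes for its unscaled value.
print(min_fixed_bytes(30))  # 13
print(min_fixed_bytes(38))  # 16
```

So moving from precision 30 to 38 costs three extra bytes per value before any columnar compression.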
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5608
   - Type: Bug
   - Epic: https://issues.apache.org/jira/browse/HUDI-1822
   - Affects version(s):
     - 0.12.2
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   **08/Feb/23 00:25, rchertara:** Synced with Alexey; based on our discussion 
we will need to add an additional DecimalWrapperV2 to the schema, and make it 
flexible enough to encode the precision and scale.
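A wrapper record that carries its own precision and scale might look roughly like the following Avro schema sketch. The field names here are illustrative assumptions, not the eventual Hudi schema:

```json
{
  "type": "record",
  "name": "DecimalWrapperV2",
  "fields": [
    {"name": "value", "type": "bytes",
     "doc": "unscaled value as a two's-complement big-endian integer"},
    {"name": "precision", "type": "int"},
    {"name": "scale", "type": "int"}
  ]
}
```

Encoding precision and scale per value (rather than fixing them in the schema) is what would let a single schema accept writes like the DecimalType(4,0) / DecimalType(2,0) case reported below.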
   
   ---
   
   **21/Feb/23 13:51, kazdy:** Hi [~rchertara] and [~alexey.kudinkin],
   One of my teammates stumbled upon a similar issue in Hudi 0.12.1, so I 
wanted to share it with you.
   The first write to Hudi used DecimalType(4,0); the second write used 
DecimalType(2,0).
   So the problem is not always about high precision.
   We got the same exception as in the GH issue mentioned above: 
`AvroTypeException("Cannot encode decimal with precision 4 as max precision 2")`.
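A minimal sketch (plain Python, not Avro's actual implementation) of the kind of precision check that raises here: the writer schema caps the number of significant digits, so a value of precision 4 fails against a schema expecting at most precision 2:

```python
from decimal import Decimal

def check_precision(value: Decimal, schema_precision: int) -> None:
    """Reject values whose significant digits exceed the schema's precision,
    mimicking the guard Avro's decimal conversion applies before encoding."""
    digits = len(value.as_tuple().digits)
    if digits > schema_precision:
        raise ValueError(
            f"Cannot encode decimal with precision {digits} "
            f"as max precision {schema_precision}")

check_precision(Decimal("99"), 2)        # fits, no error
try:
    check_precision(Decimal("1234"), 2)  # precision 4 vs max precision 2
except ValueError as e:
    print(e)
```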
   
   Another thing you might want to consider is how Spark behaves by default 
when it infers a schema.
   Say you're reading from JSON and want to write to Hudi, inferring the JSON 
schema (a pretty common use case; data producers usually don't provide schemas 
for JSON files). Spark will then set the decimal to DecimalType(38, 18), so if 
you only supported precision up to 30 it would break pipelines that rely on 
schema inference.
   
   [~rchertara] can this be solved by adding DecimalWrapperV2 as well?
   
   ---
   
   **29/Jun/23 08:24, 赵富午:** Is there any new progress?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
