hudi-bot opened a new issue, #15719: URL: https://github.com/apache/hudi/issues/15719
As reported in https://github.com/apache/hudi/issues/7732: we currently cap the precision of supported decimals at 30, on the assumption that this is high enough to cover 99% of use cases, but there is still demand for even larger decimals. The challenge, however, is to balance support for longer decimals against the storage space that has to be provisioned for each one of them.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-5608
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-1822
- Affects version(s): 0.12.2
- Fix version(s): 1.1.0

---

## Comments

**08/Feb/23 00:25 — rchertara:** Synced with Alexey. Based on our discussion, we will need to add an additional `DecimalWrapperV2` to the schema and make it flexible enough to encode the precision and scale.

---

**21/Feb/23 13:51 — kazdy:** Hi [~rchertara] and [~alexey.kudinkin], one of my teammates stumbled upon a similar issue in Hudi 0.12.1, so I wanted to share it with you. The first write to Hudi used `DecimalType(4,0)`, the second write used `DecimalType(2,0)`, so this is not only a high-precision problem. We got the same exception as in the GitHub issue mentioned above: `AvroTypeException: Cannot encode decimal with precision 4 as max precision 2`.

Another thing you might want to consider is how Spark behaves by default when it infers a schema. Say you are reading JSON and want to write to Hudi with an inferred schema (a pretty common use case, since data producers usually don't provide schemas for JSON files): Spark will then type the decimal as `DecimalType(38, 18)`. If only precision up to 30 is supported, that will break pipelines relying on schema inference.

[~rchertara], can this also be solved by adding `DecimalWrapperV2`?

---

**29/Jun/23 08:24 — 赵富午:** Is there any new progress?
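To make the storage tradeoff in the description concrete: Avro encodes a decimal logical type backed by `fixed` in a fixed number of bytes, sized so the two's-complement range can hold every unscaled value of the declared precision. This is an illustrative sketch (not Hudi code) of that sizing rule; the function name `min_fixed_bytes` is made up for this example:

```python
def min_fixed_bytes(precision: int) -> int:
    """Smallest byte count n whose signed two's-complement range can hold
    every unscaled value of the given decimal precision, i.e. the smallest
    n with 10**precision - 1 <= 2**(8*n - 1) - 1 (per the Avro decimal
    logical-type rule for fixed-backed decimals)."""
    n = 1
    while 10**precision - 1 > 2**(8 * n - 1) - 1:
        n += 1
    return n

if __name__ == "__main__":
    # Precision 30 (current cap) needs 13 bytes per value; Spark's inferred
    # DecimalType(38, 18) would need 16 bytes, comparable to a UUID.
    for p in (4, 18, 30, 38):
        print(f"precision {p:>2} -> {min_fixed_bytes(p):>2} bytes")
```

This is why raising the cap is not free: every value in the column pays for the worst-case precision, which is the tension the issue describes.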
