[ https://issues.apache.org/jira/browse/SPARK-40032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577942#comment-17577942 ]
jiaan.geng commented on SPARK-40032:
------------------------------------

We are editing the design doc.

> Support Decimal128 type
> -----------------------
>
>                 Key: SPARK-40032
>                 URL: https://issues.apache.org/jira/browse/SPARK-40032
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>         Attachments: Performance comparison between decimal128 and spark decimal benchmark.pdf
>
>
> Spark SQL today supports the DECIMAL data type. The Decimal implementation can hold a BigDecimal or a Long, and it provides operators such as +, -, *, and /.
> Taking + as an example, the implementation is shown below.
> {code:java}
> def + (that: Decimal): Decimal = {
>   if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
>     Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
>   } else {
>     Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
>   }
> }
> {code}
> We can see that each branch performs an addition and then calls Decimal.apply. The add operator of BigDecimal constructs a new BigDecimal instance, and Decimal.apply calls new to construct a new Decimal instance that wraps that new BigDecimal instance.
> If a large table has a Decimal field called 'colA, executing SUM('colA) creates a large number of Decimal and BigDecimal instances, and these instances cause garbage collection to occur frequently.
> Decimal128 is a high-performance decimal, roughly 8x more efficient than Java BigDecimal for typical operations. It uses a fixed 128-bit representation and can handle up to decimal(38, x). It is also "mutable", so the contents of an existing object can be changed in place, which reduces the cost of new() and garbage collection.
> We have generated a benchmark report comparing Spark Decimal, Java BigDecimal, and Decimal128; please see the attachment.
> With this new feature, we will introduce DECIMAL128 to accelerate decimal calculation.
> h3. Milestone 1 – Spark Decimal equivalency (the new Decimal128 type meets or exceeds all functionality of the existing SQL Decimal):
> * Add a new DataType implementation for Decimal128.
> * Support Decimal128 in Dataset/UDF.
> * Decimal128 literals
> * Decimal128 arithmetic (e.g. Decimal128 + Decimal128, Decimal128 - Decimal)
> * Decimal or math functions/operators: POWER, LOG, ROUND, etc.
> * Cast to and from Decimal128: cast String/Decimal to Decimal128, cast Decimal128 to String (pretty printing)/Decimal, with SQL syntax to specify the types
> * Support sorting Decimal128.
> h3. Milestone 2 – Persistence:
> * Ability to create tables with Decimal128 columns
> * Ability to write to common file formats such as Parquet and JSON
> * INSERT, SELECT, UPDATE, MERGE
> * Discovery
> h3. Milestone 3 – Client support:
> * JDBC support
> * Hive Thrift server
> h3. Milestone 4 – PySpark and SparkR integration:
> * Python UDFs can take and return Decimal128
> * DataFrame support
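To make the mutability argument above concrete, here is a minimal sketch of what an in-place 128-bit addition could look like, with the unscaled value held in two Long fields. The class name Decimal128, the high/low field layout, and the plusInPlace method are illustrative assumptions for this comment, not the design in the attached doc; overflow beyond 128 bits and operands with different scales are not handled.

{code:java}
// Sketch only: a mutable 128-bit unscaled value stored as two Longs.
final class Decimal128(private var high: Long, private var low: Long, val scale: Int) {

  // Adds `that` into `this` without allocating a new object, which is the
  // property that reduces new() calls and GC pressure compared with
  // java.math.BigDecimal, whose add() always returns a fresh instance.
  def plusInPlace(that: Decimal128): Decimal128 = {
    require(scale == that.scale, "this sketch assumes both operands share the same scale")
    val newLow = low + that.low
    // Carry out of the unsigned low 64 bits: the wrapped sum is smaller
    // (unsigned) than the original operand exactly when a carry occurred.
    val carry = if (java.lang.Long.compareUnsigned(newLow, low) < 0) 1L else 0L
    high = high + that.high + carry
    low = newLow
    this
  }

  override def toString: String = s"Decimal128(high=$high, low=$low, scale=$scale)"
}

object Decimal128Sketch {
  def main(args: Array[String]): Unit = {
    // The accumulator object is reused instead of being replaced, which is how
    // a SUM over a Decimal128 column could avoid the per-row allocations
    // described in the issue above.
    val acc = new Decimal128(0L, 0L, 2)
    acc.plusInPlace(new Decimal128(0L, 123456789L, 2)) // unscaled 123456789, i.e. 1234567.89
    acc.plusInPlace(new Decimal128(0L, 1L, 2))         // unscaled 123456790, i.e. 1234567.90
    println(acc)
  }
}
{code}

A real implementation would also need overflow detection, scale alignment, and the full arithmetic and cast surface listed in Milestone 1; the point illustrated here is only the allocation-free inner loop.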