[ 
https://issues.apache.org/jira/browse/AVRO-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Stagg updated AVRO-3834:
------------------------------
    Description: 
When encoding `decimal.Decimal` values using the Python avro library, the 
exponent of the value is largely ignored.

This means that incorrect two's-complement values are calculated, and 
incorrect Avro data is produced.

Here's a reasonably compact reproducer:
{code:python}
import avro
import avro.io
import avro.schema
from decimal import Decimal
from io import BytesIO

TESTS = [
    '314',
    '31',
    '3',
    '3.1',
    '31.4',
    '3.14',
    '3.141',
    '3.1415',
]

if __name__ == '__main__':
    schema_text = '''{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 8,
  "scale": 4
    }'''
    print(f"AVRO VERSION: {avro.__version__}")
    schema = avro.schema.parse(schema_text)
    writer = avro.io.DatumWriter(schema)
    reader = avro.io.DatumReader(schema)

    for val in TESTS:
        buf = BytesIO()

        val = Decimal(val)
        writer.write(val, avro.io.BinaryEncoder(buf))
        buf.seek(0)
        decoded_val = reader.read(avro.io.BinaryDecoder(buf))
        
        match = val == decoded_val
        result = 'PASS' if match else 'FAIL'
        print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val}   {result}')
 {code}
Which outputs:
{code:java}
AVRO VERSION: 1.11.2
Encoded: 314 -> b'\x04\x01:' -> 0.0314   FAIL
Encoded: 31 -> b'\x02\x1f' -> 0.0031   FAIL
Encoded: 3 -> b'\x02\x03' -> 0.0003   FAIL
Encoded: 3.1 -> b'\x02\x1f' -> 0.0031   FAIL
Encoded: 31.4 -> b'\x04\x01:' -> 0.0314   FAIL
Encoded: 3.14 -> b'\x04\x01:' -> 0.0314   FAIL
Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141   FAIL
Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415   PASS{code}
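For reading the output above: Avro writes a `bytes` value as a zigzag varint length followed by the payload, and the decimal logical type stores the unscaled integer in that payload as big-endian two's complement. The sketch below decodes those byte strings by hand using only the standard library. The helper name `decode_decimal_payload` is made up for illustration, and it assumes the length fits in a single byte.

```python
from decimal import Decimal


def decode_decimal_payload(raw: bytes, scale: int) -> Decimal:
    """Decode one Avro `bytes` datum that carries a decimal logical type.

    Hypothetical helper; assumes the zigzag varint length fits in one byte.
    """
    length = raw[0] >> 1  # zigzag decoding of a small non-negative long
    payload = raw[1:1 + length]
    # The payload is the unscaled integer, big-endian two's complement.
    unscaled = int.from_bytes(payload, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Bytes the reproducer wrote for 314, 31.4 and 3.14 (all identical).
print(decode_decimal_payload(b'\x04\x01:', 4))
# Bytes the reproducer wrote for 3.1415, the only passing case.
print(decode_decimal_payload(b'\x04z\xb7', 4))
```

The payload `b'\x01:'` is the integer 314, so with a scale of 4 the reader can only ever return 0.0314 for it, whichever of the three inputs was written.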
The problem is that the code here:
[https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468]
does not use `exp` to shift the digits; `exp` is only checked to ensure it is 
not greater than scale for validation purposes.

If you look at the output, the Avro bytes produced for '31.4' and '3.14' are 
identical, because `exp` is ignored.
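
A minimal sketch of the shift that appears to be missing, using only the standard library. The function name `unscaled_value` is hypothetical, and this is neither the library's code nor a proposed patch; it only illustrates how the exponent from `as_tuple()` would need to be combined with the schema's scale before the two's-complement bytes are computed.

```python
from decimal import Decimal


def unscaled_value(datum: Decimal, scale: int) -> int:
    """Return the integer n such that datum == n * 10**-scale.

    Hypothetical helper for illustration; assumes a finite datum.
    """
    sign, digits, exp = datum.as_tuple()
    if -exp > scale:
        raise ValueError(f"{datum} has more than {scale} fractional digits")
    coefficient = int("".join(map(str, digits)))
    # Shift the coefficient so it is expressed in units of 10**-scale.
    unscaled = coefficient * 10 ** (scale + exp)
    return -unscaled if sign else unscaled


for text in ("314", "31.4", "3.14", "3.1415"):
    n = unscaled_value(Decimal(text), 4)
    # One extra bit of room so the sign bit never collides with the value.
    payload = n.to_bytes((n.bit_length() + 8) // 8, byteorder="big", signed=True)
    print(f"{text} -> unscaled {n} -> payload {payload!r}")
```

With the exponent applied, '31.4' and '3.14' map to the unscaled integers 314000 and 31400, so they no longer share a payload.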


> [Python] Incorrect decimal encoding/decoding
> --------------------------------------------
>
>                 Key: AVRO-3834
>                 URL: https://issues.apache.org/jira/browse/AVRO-3834
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: logical types, python
>    Affects Versions: 1.11.2
>         Environment: Python 3.10.3, Avro 1.11.2
>  
>            Reporter: Steve Stagg
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
