[
https://issues.apache.org/jira/browse/AVRO-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985980#comment-16985980
]
Ryan Skraba commented on AVRO-2636:
-----------------------------------
Thanks for the use case!
As noted, BYTES is particularly bad because *reading* the ByteBuffer will
change its state.
It gets a bit worse: the default value can be mutable RECORD, ARRAY, MAP
values, which will likely behave badly if the user decides to modify these
values in a multithreaded environment. Those types can contain nested data of
type BYTES.
Maybe the right thing to do *IS* to always return a `deepCopy` from this
method... we might need to do fixes in the Avro internals to make sure we're
not making unnecessary copies. Even so, to check whethr we need a ThreadLocal
cache for reentrant deepCopy calls.
While looking for how this would affect others, I took a quick look at [Parquet
using
{{getDefaultValue}}|https://github.com/apache/parquet-mr/blob/62dcc68acaf64012bf731e103be780956f1f446d/parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java#L168-L172].
It makes multiple calls to test for presence of a default, to get the default
value, and it appears to throw away the copy if it's not needed.
I'll investigate a bit further, but opinions welcome!
> GenericData defaultValueCache caches mutable ByteBuffers
> --------------------------------------------------------
>
> Key: AVRO-2636
> URL: https://issues.apache.org/jira/browse/AVRO-2636
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Reporter: Valentin Nikotin
> Priority: Minor
>
> It appears that for default value for Byte type (and Decimal logical type if
> it uses underlying Bytes type) value rendered with getDefaultValue is cached.
> This leads to bugs when you read the same value (for example if converted
> with DecimalConversion). For single thread environment workaround would be to
> reset ByteBuffer after read, but in concurrent environment we should not
> cache mutable objects.
>
> {code:java}
> @Test(expected=NumberFormatException.class)
> public void testReuse() {
> Conversions.DecimalConversion decimalConversion =
> new Conversions.DecimalConversion();
> LogicalType logicalDecimal =
> LogicalTypes.decimal(38, 9);
> ByteBuffer defaultValue =
> decimalConversion.toBytes(
> BigDecimal.valueOf(42L).setScale(9),
> null,
> logicalDecimal);
> Schema schema = SchemaBuilder
> .record("test")
> .fields()
> .name("decimal")
>
> .type(logicalDecimal.addToSchema(SchemaBuilder.builder().bytesType()))
> .withDefault(defaultValue)
> .endRecord();
> BigDecimal firstRead = decimalConversion
> .fromBytes(
> (ByteBuffer)
> GenericData.get().getDefaultValue(schema.getField("decimal")),
> null,
> logicalDecimal);
> BigDecimal secondRead = decimalConversion
> .fromBytes(
> (ByteBuffer)
> GenericData.get().getDefaultValue(schema.getField("decimal")),
> null,
> logicalDecimal);
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)