[
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541700#comment-17541700
]
Ashley Taylor commented on AVRO-3520:
-------------------------------------
I have done some debugging.
The schema held in `ACCESSOR_CACHE` always reflects the state used when
writing a file: it is generated from the class currently loaded in the
runtime.
When reading, the schema is instead defined at the top of each Avro file.
`ACCESSOR_CACHE` is still invoked, and the first instance of CustomEncoding
is created there.
When we then get the FieldAccessor, that per-file schema is passed to the
`getAccessorsFor` method:
https://github.com/apache/avro/blob/ff4eaf32c1fb4f04770a6ad39f2769cc907006e4/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java#L379
This then calls `setReadSchema`, which gives the encoding the ability to
return a new CustomEncoding instance that is appropriate for that schema.
So if you had multiple threads, each reading a file with a different schema
at the top, each would get the correct instance of CustomEncoding for its
file reader.
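
The per-schema lookup described above can be sketched roughly as follows.
This is not Avro's code: `SketchEncoding`, `SketchAccessorData`,
`forReadSchema` and `getEncodingFor` are hypothetical stand-ins, and the schema
is reduced to a plain string to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; none of these are Avro classes.
class PerSchemaEncodingSketch {

    /** Stand-in for a custom encoding that can specialise itself per read schema. */
    static class SketchEncoding {
        private final String schema; // the schema this instance was initialised for

        SketchEncoding(String schema) {
            this.schema = schema;
        }

        /** Mirrors the idea of setReadSchema: return an instance suited to the schema. */
        SketchEncoding forReadSchema(String readSchema) {
            return readSchema.equals(schema) ? this : new SketchEncoding(readSchema);
        }

        String schema() {
            return schema;
        }
    }

    /** Stand-in for the per-class cache entry: one encoding per schema seen so far. */
    static class SketchAccessorData {
        // Created from the class in the runtime, i.e. the "writer" view.
        private final SketchEncoding runtimeEncoding;
        private final Map<String, SketchEncoding> bySchema = new ConcurrentHashMap<>();

        SketchAccessorData(String runtimeSchema) {
            this.runtimeEncoding = new SketchEncoding(runtimeSchema);
        }

        /** Each file reader passes in the schema found at the top of its file. */
        SketchEncoding getEncodingFor(String fileSchema) {
            return bySchema.computeIfAbsent(fileSchema, runtimeEncoding::forReadSchema);
        }
    }
}
```

In this sketch, two readers that open files with different schemas receive
different `SketchEncoding` instances, so any schema-specific initialization
happens once per schema rather than once per row.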
If the schema were instead passed to the read method, and the CustomEncoding
were shared across threads during a parallel read, then any initialization
for that schema would become a lot more complicated: it would have to
account for concurrency or incur unnecessary per-row recalculations.
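
For contrast, here is a minimal sketch of the shared-instance alternative,
under the same assumptions as before: `SharedEncodingSketch` and its `read`
signature are hypothetical, and the schema is a comma-separated string rather
than a real Avro schema.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is not Avro's API.
class SharedEncodingSketch {
    // One instance is shared by every reader thread, so any per-schema state
    // has to live in a thread-safe structure.
    private final Map<String, Integer> fieldCountBySchema = new ConcurrentHashMap<>();

    // Counts how often the per-schema state is consulted.
    final AtomicInteger lookups = new AtomicInteger();

    /** Called once per row; the schema arrives with every call. */
    String read(String row, String schema) {
        lookups.incrementAndGet(); // a lookup is paid on every row
        int fields = fieldCountBySchema.computeIfAbsent(schema, s -> s.split(",").length);
        return row + " (" + fields + " fields)";
    }
}
```

Here the encoding cannot assume a single schema, so each row pays for a
concurrent map lookup even though the schema never changes within one file.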
> CustomEncoding doesn't expose the read schema
> ---------------------------------------------
>
> Key: AVRO-3520
> URL: https://issues.apache.org/jira/browse/AVRO-3520
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.11.0
> Reporter: Colin
> Priority: Major
> Labels: pull-request-available
> Attachments: patchTest.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently it is not possible to detect a schema change when using
> `CustomEncoding<T>`.