[
https://issues.apache.org/jira/browse/AVRO-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Skraba updated AVRO-3005:
------------------------------
Resolution: Duplicate
Status: Resolved (was: Patch Available)
> Deserialization of string with > 256 characters fails
> -----------------------------------------------------
>
> Key: AVRO-3005
> URL: https://issues.apache.org/jira/browse/AVRO-3005
> Project: Apache Avro
> Issue Type: Bug
> Components: csharp
> Affects Versions: 1.10.1
> Reporter: Lucas Heimberg
> Priority: Major
> Attachments: AVRO-3005.patch
>
>
> Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256, i.e.
> when the StackallocThreshold is exceeded.
> This can be seen when serializing and subsequently deserializing a
> GenericRecord of schema
> {code:java}
> {
>   "type": "record",
>   "name": "Foo",
>   "fields": [
>     { "name": "x", "type": "string" }
>   ]
> }{code}
> with a field x containing a string of length > 256, as done in the test case
> Test(257):
> {code:java}
> public void Test(int n)
> {
>     var schema = (RecordSchema) Schema.Parse("{ \"type\":\"record\",\"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");
>
>     var datum = new GenericRecord(schema);
>     datum.Add("x", new String('x', n));
>     byte[] serialized;
>     using (var ms = new MemoryStream())
>     {
>         var enc = new BinaryEncoder(ms);
>         var writer = new GenericDatumWriter<GenericRecord>(schema);
>         writer.Write(datum, enc);
>         serialized = ms.ToArray();
>     }
>     using (var ms = new MemoryStream(serialized))
>     {
>         var dec = new BinaryDecoder(ms);
>         var deserialized = new GenericRecord(schema);
>         var reader = new GenericDatumReader<GenericRecord>(schema, schema);
>         reader.Read(deserialized, dec);
>         Assert.Equal(datum, deserialized);
>     }
> }{code}
> which yields the following exception:
> {code:java}
> Avro.AvroException
> End of stream reached
>    at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
>    at Avro.IO.BinaryDecoder.ReadString()
>    at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
>    at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
> {code}
> The reason seems to be the following: when a string of length <=
> StackallocThreshold (=256) is read, the buffer into which the string's
> content is read from the stream is allocated on the stack with exactly the
> length of the string. If the length is > StackallocThreshold, the buffer is
> obtained from ArrayPool<byte>.Shared.Rent(length), which returns a buffer of
> *minimum* length 'length', but possibly a larger one.
> The Read(Span<byte> buffer) method is used to read the string's content from
> the input stream. It always tries to read as many bytes from the input
> stream as the buffer is long, and in particular fails with the exception
> shown above when the stream does not hold enough data.
> Thus, if the string's expected length is > StackallocThreshold and the buffer
> obtained from ArrayPool<byte>.Shared.Rent(length) is larger than 'length',
> the Read method will either throw the above AvroException (when the string is
> the last element in the stream) or will consume parts of the following data
> items in the stream, in either case causing corruption.
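> For illustration (a minimal sketch, not part of the report or the patch),
> the pool's rounding behaviour can be observed directly; the default shared
> pool rounds requests up to the next bucket size:
> {code:java}
> using System;
> using System.Buffers;
>
> // Request one byte more than StackallocThreshold.
> byte[] rented = ArrayPool<byte>.Shared.Rent(257);
>
> // Only rented.Length >= 257 is guaranteed by the API contract; the shared
> // pool typically returns the next power-of-two bucket, e.g. 512 bytes.
> Console.WriteLine(rented.Length);
>
> ArrayPool<byte>.Shared.Return(rented);
> {code}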
> The provided patch turns the byte array returned by the ArrayPool into a
> Span<byte> of the correct length using Slice, instead of implicitly casting
> the whole array to Span<byte>.
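> A minimal sketch of the corrected pattern (variable names and the
> surrounding method are illustrative, not the exact patch):
> {code:java}
> using System;
> using System.Buffers;
> using System.Text;
>
> byte[] rentedBuffer = ArrayPool<byte>.Shared.Rent(length);
> try
> {
>     // Slice the rented array down to the requested length so that
>     // Read(Span<byte>) fills exactly 'length' bytes and no more.
>     Span<byte> buffer = rentedBuffer.AsSpan(0, length);
>     Read(buffer);
>     return Encoding.UTF8.GetString(buffer);
> }
> finally
> {
>     ArrayPool<byte>.Shared.Return(rentedBuffer);
> }
> {code}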
>
> Possibly related:
> [https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083]
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)