[ https://issues.apache.org/jira/browse/AVRO-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Skraba updated AVRO-3005:
------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

> Deserialization of string with > 256 characters fails
> -----------------------------------------------------
>
>                 Key: AVRO-3005
>                 URL: https://issues.apache.org/jira/browse/AVRO-3005
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: csharp
>    Affects Versions: 1.10.1
>            Reporter: Lucas Heimberg
>            Priority: Major
>         Attachments: AVRO-3005.patch
>
>
> Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256,
> i.e. when the StackallocThreshold is exceeded.
> This can be seen by serializing and subsequently deserializing a
> GenericRecord with the schema
> {code:json}
> {
>   "type": "record",
>   "name": "Foo",
>   "fields": [
>     { "name": "x", "type": "string" }
>   ]
> }{code}
> with a field x containing a string of length > 256, as done in the test case 
> Test(257):
> {code:c#}
> using System;
> using System.IO;
> using Avro;
> using Avro.Generic;
> using Avro.IO;
> using Xunit;
>
> public class AvroTests
> {
>     [Theory]
>     [InlineData(257)]
>     public void Test(int n)
>     {
>         var schema = (RecordSchema)Schema.Parse(
>             "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");
>
>         // Build a record whose string field contains n characters.
>         var datum = new GenericRecord(schema);
>         datum.Add("x", new String('x', n));
>
>         // Serialize the record into a byte array.
>         byte[] serialized;
>         using (var ms = new MemoryStream())
>         {
>             var enc = new BinaryEncoder(ms);
>             var writer = new GenericDatumWriter<GenericRecord>(schema);
>             writer.Write(datum, enc);
>             serialized = ms.ToArray();
>         }
>
>         // Deserialize it again; throws AvroException for n > 256.
>         using (var ms = new MemoryStream(serialized))
>         {
>             var dec = new BinaryDecoder(ms);
>             var deserialized = new GenericRecord(schema);
>             var reader = new GenericDatumReader<GenericRecord>(schema, schema);
>             deserialized = reader.Read(deserialized, dec);
>             Assert.Equal(datum, deserialized);
>         }
>     }
> }{code}
> which yields the following exception:
> {code}
> Avro.AvroException
> End of stream reached
>    at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
>    at Avro.IO.BinaryDecoder.ReadString()
>    at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
>    at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
>    at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
>    at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
> {code}
> The reason seems to be the following: when a string of length <=
> StackallocThreshold (= 256) is read, the buffer that the string content is
> read into is allocated on the stack with the exact length of the string. If
> the length is > StackallocThreshold, the buffer is instead obtained from
> ArrayPool<byte>.Shared.Rent(length), which returns a buffer of *at least*
> 'length' bytes, but possibly a larger one.
> The Read(Span<byte> buffer) method is used to read the string content from
> the input stream. It always tries to read as many bytes as the buffer is
> long, and throws the exception shown above when the stream runs out of data.
> Thus, if the expected string length is > StackallocThreshold and the buffer
> obtained from ArrayPool<byte>.Shared.Rent(length) is larger than 'length',
> Read will either throw the AvroException above (when the string is the last
> item in the stream) or silently consume bytes that belong to subsequent data
> items; either way the decoded stream is corrupted.
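> For illustration, a self-contained sketch of the pattern described above.
> The class and method names (DecoderSketch, ReadLongString) are hypothetical
> and the Read helper only mimics the exact-fill semantics of
> Avro.IO.BinaryDecoder.Read(Span`1); this is not the actual Avro.IO source:
> {code:c#}
> using System;
> using System.Buffers;
> using System.IO;
> using System.Text;
>
> class DecoderSketch
> {
>     private readonly Stream stream;
>     public DecoderSketch(Stream stream) { this.stream = stream; }
>
>     // Mimics BinaryDecoder.Read(Span<byte>): fill the whole buffer,
>     // or throw when the stream ends first.
>     private void Read(Span<byte> buffer)
>     {
>         while (!buffer.IsEmpty)
>         {
>             int n = stream.Read(buffer);
>             if (n <= 0) throw new EndOfStreamException("End of stream reached");
>             buffer = buffer.Slice(n);
>         }
>     }
>
>     // Flawed pattern: Rent() may return an array LONGER than 'length';
>     // the implicit byte[] -> Span<byte> conversion makes Read() try to
>     // fill rented.Length bytes instead of exactly 'length' bytes.
>     public string ReadLongString(int length)
>     {
>         byte[] rented = ArrayPool<byte>.Shared.Rent(length);
>         try
>         {
>             Read(rented);  // BUG: over-reads or hits end of stream
>             return Encoding.UTF8.GetString(rented, 0, length);
>         }
>         finally
>         {
>             ArrayPool<byte>.Shared.Return(rented);
>         }
>     }
> }
> {code}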
> The provided patch turns the byte array returned by the ArrayPool into a
> Span<byte> of the correct length using the Slice method, instead of
> implicitly casting the whole (possibly longer) array to Span<byte>.
>  
> Possibly related:
> [https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
