[
https://issues.apache.org/jira/browse/AVRO-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jackie Murphy resolved AVRO-1637.
---------------------------------
Resolution: Invalid
Closing as invalid: I can only reproduce this on avro 1.6.x, not on 1.7.x. Our
internal gem server apparently was not mirroring rubygems, and I didn't notice
that we were several versions behind the latest release. Apologies.
> Handling multibyte UTF-8 characters in Ruby
> -------------------------------------------
>
> Key: AVRO-1637
> URL: https://issues.apache.org/jira/browse/AVRO-1637
> Project: Avro
> Issue Type: Bug
> Reporter: Jackie Murphy
> Priority: Minor
>
> It looks like the Ruby implementation of Avro doesn't successfully round-trip
> UTF-8 encoded strings containing multibyte characters.
> Example:
> {code}
> require 'avro'
>
> # Serialize obj to Avro binary using the given schema; returns the raw bytes.
> def serialize(obj, schema)
>   buffer = StringIO.new
>   encoder = Avro::IO::BinaryEncoder.new(buffer)
>   datum_writer = Avro::IO::DatumWriter.new(schema)
>   datum_writer.write(obj, encoder)
>   buffer.seek(0)
>   buffer.read
> end
>
> # Decode Avro binary back into a Ruby object using the same schema.
> def deserialize(avro_obj, schema)
>   reader = StringIO.new(avro_obj)
>   decoder = Avro::IO::BinaryDecoder.new(reader)
>   datum_reader = Avro::IO::DatumReader.new(schema)
>   datum_reader.read(decoder)
> end
> {code}
> {code}
> > schema =
> > Avro::Schema.parse("{\"type\":\"record\",\"name\":\"Example\",\"fields\":[{\"name\":\"example_field\",\"type\":\"string\"},
> > {\"name\":\"other_field\",\"type\":\"string\"}]}")
> > deserialize(serialize({'example_field'=> 'héllö world',
> > 'other_field'=>'goodbye world'}, schema), schema)
> {"example_field"=>"h\xC3\xA9ll\xC3\xB6 wor", "other_field"=>"d\x1Agoodbye
> world"}
> {code}
> Note that it looks like it's computing the length of the first field
> incorrectly (length of string in characters rather than in bytes?), and the
> end of the first field spills into the second field.
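The character-versus-byte hypothesis above can be checked without Avro at all. The following is a minimal sketch, assuming the writer emitted `String#length` (characters) as the length prefix while the payload still held every UTF-8 byte; `truncated_by_char_count` is a hypothetical helper for illustration, not part of the avro gem.

```ruby
# The first field from the report, written with escapes so the example
# does not depend on the source file's encoding.
text = "h\u00E9ll\u00F6 world"

char_count = text.length    # number of characters
byte_count = text.bytesize  # number of UTF-8 bytes

# Hypothetical helper: what a reader recovers if the length prefix holds
# the character count but the payload holds all of the bytes.
def truncated_by_char_count(str)
  str.byteslice(0, str.length)
end

first_field = truncated_by_char_count(text)

# Bytes the reader never consumed; they are left in the stream and get
# interpreted as the start of the next field.
leftover = text.byteslice(text.length, text.bytesize - text.length)

puts char_count  # 11
puts byte_count  # 13
puts leftover    # ld
```

Under that assumption the first field stops two bytes short, which matches the truncated `example_field` value shown in the output above.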
> Also, if the bytes happen to line up especially unluckily, we
> can get an {{ArgumentError}}:
> {code}
> > deserialize(serialize({'example_field'=> '‘hello’ world',
> > 'other_field'=>'goodbye world'}, schema), schema)
> ArgumentError: negative length -56 given
> {code}
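The specific value -56 is consistent with the same hypothesis. Avro's binary encoding stores a string as a zigzag-encoded long length followed by the UTF-8 bytes, so a leftover payload byte that is read as a length prefix gets zigzag-decoded. The sketch below assumes a 13-character prefix was written for the 17-byte string, which would leave the reader positioned on the `o` of `world`; `zigzag_decode` is a hypothetical helper covering only single-byte values (no continuation bit), not the gem's decoder.

```ruby
# Zigzag-decode a value that fit in one varint byte (high bit clear).
def zigzag_decode(n)
  (n >> 1) ^ -(n & 1)
end

# First example: the stray byte read as the second field's length is 'l'.
from_l = zigzag_decode('l'.ord)  # 'l' is 0x6C (108)

# Second example: the stray byte is 'o', which has its low bit set and
# therefore decodes to a negative number.
from_o = zigzag_decode('o'.ord)  # 'o' is 0x6F (111)

# The \x1A visible in the first example's output is the real length
# prefix of 'goodbye world'.
real_prefix = zigzag_decode(0x1A)

puts from_l       # 54
puts from_o       # -56
puts real_prefix  # 13
```

An odd byte value always decodes to a negative length, so whether the failure shows up as silent field corruption or as an `ArgumentError` depends only on which byte happens to follow the truncated field.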
> This looks similar to a previous issue with the Perl implementation,
> AVRO-1517.