[ https://issues.apache.org/jira/browse/AVRO-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495425#comment-13495425 ]
Nicolas Fouché commented on AVRO-1134:
--------------------------------------

Here is my fix: https://github.com/nfo/avro/commit/f692322f40a3e0ce2a948d14bd1359694c9125f1 . Note that it's not compatible with Ruby 1.8.6.

{code:title=icantbelievethishasntbeenfixedalready.diff|borderStyle=solid}
diff --git a/lang/ruby/lib/avro/io.rb b/lang/ruby/lib/avro/io.rb
index b548b46..dbbd971 100644
--- a/lang/ruby/lib/avro/io.rb
+++ b/lang/ruby/lib/avro/io.rb
@@ -201,7 +201,7 @@ module Avro
       # Bytes are encoded as a long followed by that many bytes of data.
       def write_bytes(datum)
-        write_long(datum.size)
+        write_long(datum.bytesize)
         @writer.write(datum)
       end
{code}

It only fixes the writing phase, not the reading phase. When reading, it's up to the client to declare that strings are encoded in UTF-8. In Ruby 1.9+:

{code:borderStyle=solid}
string.force_encoding('UTF-8')
{code}

> Ruby datafile serialization fails with UTF-8 characters
> -------------------------------------------------------
>
>                 Key: AVRO-1134
>                 URL: https://issues.apache.org/jira/browse/AVRO-1134
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.1
>         Environment: Linux and Mac OS X tested, identical on both.
>            Reporter: Paul Dlug
>         Attachments: avro_utf8_test.rb
>
>
> When trying to deserialize a data file containing a string with UTF-8
> characters, the Ruby Avro client fails with a variety of errors (the error
> message varies with each run, see below). The attached script can be used to
> replicate this problem. Changing the type in the schema between bytes and
> string doesn't make a difference.
> {code}
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined method `unpack' for nil:NilClass (NoMethodError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined method `unpack' for nil:NilClass (NoMethodError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined method `unpack' for nil:NilClass (NoMethodError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': failed to allocate memory (NoMemoryError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:229:in `match_schemas': undefined method `type' for nil:NilClass (NoMethodError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:287:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:383:in `read_union'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': negative length -7638 given (ArgumentError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': negative length -50 given (ArgumentError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined method `unpack' for nil:NilClass (NoMethodError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': negative length -47 given (ArgumentError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:229:in `match_schemas': undefined method `type' for nil:NilClass (NoMethodError)
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:287:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:383:in `read_union'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block in each'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
> from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
> from avro_utf8_test.rb:29:in `<main>'
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
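
For readers of this thread, here is a minimal sketch of why the one-word change in the comment's diff (`size` to `bytesize`) matters and why the reader still has to call `force_encoding`. It does not use Avro: `frame` and `unframe` are hypothetical helpers invented for illustration, and they use a single-byte length prefix rather than Avro's variable-length long. The sample string is the "2×" value that appears as "2\xC3\x97" in the traces, and the sketch assumes Ruby 1.9 or later.

```ruby
require 'stringio'

# Hypothetical helper (not Avro's API): prefix the payload with one byte
# holding the given length, then append every byte of the payload.
def frame(datum, length)
  [length].pack('C') + datum.dup.force_encoding(Encoding::BINARY)
end

# Hypothetical reader: read the one-byte length, then that many bytes.
# Returns the payload plus whatever was left unread in the stream.
def unframe(bytes)
  io = StringIO.new(bytes)
  length = io.read(1).unpack('C').first
  payload = io.read(length)
  [payload, io.read]
end

datum = "2\u00D7" # "2×": two characters, three bytes in UTF-8

datum.size      # => 2 (characters)
datum.bytesize  # => 3 (bytes)

# Length written in characters: the reader stops one byte early, and the
# stray byte would be parsed as the start of the next field.
short, leftover = unframe(frame(datum, datum.size))

# Length written in bytes: the whole payload is consumed.
full, rest = unframe(frame(datum, datum.bytesize))

# Bytes read back carry no text encoding until the client declares one.
decoded = full.force_encoding('UTF-8')
```

In the first case one byte of the payload stays in the stream, so whatever is decoded next starts at the wrong offset. That is consistent with the error changing from run to run in the report, although the exact failure depends on the data that follows.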