[ 
https://issues.apache.org/jira/browse/AVRO-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495425#comment-13495425
 ] 

Nicolas Fouché commented on AVRO-1134:
--------------------------------------

Here is my fix: 
https://github.com/nfo/avro/commit/f692322f40a3e0ce2a948d14bd1359694c9125f1 . 
Note that it's not compatible with Ruby 1.8.6, which lacks String#bytesize 
(added in Ruby 1.8.7).

{code:title=icantbelievethishasntbeenfixedalready.diff|borderStyle=solid}
diff --git a/lang/ruby/lib/avro/io.rb b/lang/ruby/lib/avro/io.rb
index b548b46..dbbd971 100644
--- a/lang/ruby/lib/avro/io.rb
+++ b/lang/ruby/lib/avro/io.rb
@@ -201,7 +201,7 @@ module Avro
 
       # Bytes are encoded as a long followed by that many bytes of data.
       def write_bytes(datum)
-        write_long(datum.size)
+        write_long(datum.bytesize)
         @writer.write(datum)
       end
{code}
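
To illustrate why the length prefix matters, here is a minimal sketch, not Avro's actual encoder. The helpers `frame` and `read_frame` are hypothetical, and they use a single length byte where Avro writes a zig-zag varint long. With the character count as the prefix, the reader stops one byte short and then misreads the leftover byte as the next length, which would explain the varying errors in the report.

```ruby
require 'stringio'

# Hypothetical helper, not Avro's API: one length byte followed by the raw bytes.
# Avro writes the length as a zig-zag varint long; one byte keeps the sketch short.
def frame(datum, length)
  [length].pack('C') + datum.b
end

# Read one framed value back: the length byte first, then that many bytes.
def read_frame(io)
  length = io.read(1).unpack('C').first
  io.read(length)
end

datum = "2\u00D7" # "2×": two characters, three bytes in UTF-8
next_field = "ok"

# Prefix with the character count (the bug) versus the byte count (the fix).
buggy = StringIO.new(frame(datum, datum.size) + frame(next_field, next_field.bytesize))
fixed = StringIO.new(frame(datum, datum.bytesize) + frame(next_field, next_field.bytesize))

# Buggy stream: only two of the three bytes are consumed, so the leftover
# 0x97 byte is taken for the length of the next field.
buggy_first = read_frame(buggy)
buggy_second_length = buggy.read(1).unpack('C').first

# Fixed stream: both values come back intact.
fixed_first = read_frame(fixed)
fixed_second = read_frame(fixed)
```

In the buggy stream the second "length" is 0x97 (151), far more than the bytes that remain, so everything after the first multibyte string is read out of alignment.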

This only fixes the writing phase, not the reading phase. When reading, it is 
up to the client to declare that strings are encoded in UTF-8. In Ruby 1.9+:

{code:borderStyle=solid}
string.force_encoding('UTF-8')
{code}
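
A minimal sketch of what that call does on the reading side, assuming the bytes come from a binary stream (a StringIO stands in for the data file here). `force_encoding` only relabels the existing bytes and does not transcode, so it may be worth checking `valid_encoding?` afterwards.

```ruby
require 'stringio'

# Bytes read from a binary stream are tagged ASCII-8BIT (BINARY), whatever they contain.
raw = StringIO.new("2\u00D7".b).read(3)
raw_encoding = raw.encoding # Encoding::ASCII_8BIT
raw_size = raw.size         # 3, one "character" per byte

# force_encoding relabels the same bytes; no transcoding happens.
text = raw.dup.force_encoding('UTF-8')
text_size = text.size             # 2
text_valid = text.valid_encoding? # true

# A truncated multibyte sequence gets relabelled as well, so validate before trusting it.
broken = "2\xC3".b.force_encoding('UTF-8')
broken_valid = broken.valid_encoding? # false
```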
                
> Ruby datafile serialization fails with UTF-8 characters
> -------------------------------------------------------
>
>                 Key: AVRO-1134
>                 URL: https://issues.apache.org/jira/browse/AVRO-1134
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.1
>         Environment: Linux and Mac OS X tested, identical on both.
>            Reporter: Paul Dlug
>         Attachments: avro_utf8_test.rb
>
>
> When trying to deserialize a data file containing a string with UTF-8 
> characters, the Ruby Avro client fails with a variety of errors (the error 
> message varies with each run, see below). The attached script can be used to 
> reproduce this problem. Changing the type in the schema between bytes and 
> string doesn't make a difference.
> {code}
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined 
> method `unpack' for nil:NilClass (NoMethodError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined 
> method `unpack' for nil:NilClass (NoMethodError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined 
> method `unpack' for nil:NilClass (NoMethodError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': failed to 
> allocate memory (NoMemoryError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:229:in `match_schemas': 
> undefined method `type' for nil:NilClass (NoMethodError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:287:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:383:in `read_union'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': negative 
> length -7638 given (ArgumentError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': negative 
> length -50 given (ArgumentError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:46:in `byte!': undefined 
> method `unpack' for nil:NilClass (NoMethodError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:63:in `read_long'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:380:in `read_union'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read': negative 
> length -47 given (ArgumentError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:105:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:93:in `read_bytes'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:100:in `read_string'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:306:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> % ruby avro_utf8_test.rb
> {"id"=>"works", "data"=>"2x2"}
> {"id"=>"broken", "data"=>"2\xC3\x97"}
> vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:229:in `match_schemas': 
> undefined method `type' for nil:NilClass (NoMethodError)
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:287:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:383:in `read_union'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:316:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:391:in `block in 
> read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:389:in `read_record'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:317:in `read_data'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/io.rb:282:in `read'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:223:in `block 
> in each'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `loop'
>   from vendor/ruby/1.9.1/gems/avro-1.7.0/lib/avro/data_file.rb:211:in `each'
>   from avro_utf8_test.rb:29:in `<main>'
> {code}

