[ https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047936#comment-14047936 ]
Willem van Bergen commented on AVRO-1499:
-----------------------------------------
I think it fails because the library uses `size` instead of `bytesize`. In Ruby
1.9+, `size` returns the number of characters in a string, not the number of
bytes. This means that for a string containing multi-byte Unicode characters,
the length that gets written is too short.
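
To illustrate the mismatch, here is a minimal sketch. The sample string and the `frame`/`unframe` helpers are hypothetical and are not part of the avro gem; the one-byte length prefix is a simplified stand-in for Avro's variable-length long. It assumes Ruby 2.0+ (for `String#b`).

```ruby
# Hypothetical sample value: 4 characters, but 5 bytes once encoded as UTF-8.
name = "caf\u00e9"

chars = name.size      # character count on Ruby 1.9+
bytes = name.bytesize  # number of bytes that actually end up in the file

# Simplified stand-in for a length-prefixed string: a one-byte length
# followed by the raw bytes of the string.
def frame(str, length)
  [length].pack("C") + str.b
end

# Read the length prefix, then exactly that many bytes.
def unframe(data)
  length = data.getbyte(0)
  data.byteslice(1, length)
end

bad  = unframe(frame(name, chars))  # prefix from size: reader stops one byte early
good = unframe(frame(name, bytes))  # prefix from bytesize: reader gets every byte

puts "size=#{chars} bytesize=#{bytes}"
puts "prefix from size recovers #{bad.bytesize} of #{bytes} bytes"
puts "prefix from bytesize recovers #{good.bytesize} of #{bytes} bytes"
```

With the shorter prefix, the leftover byte would be interpreted as the start of the next field, which is presumably how the rest of the record gets misread.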
I will attach a patch that uses `bytesize` instead, and aliases `bytesize` to
`size` in Ruby 1.8.
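
As a rough idea of what such a compatibility shim could look like (this is only a sketch under my own assumptions, not the contents of the attached patch):

```ruby
# Define String#bytesize only where it is missing (older Ruby 1.8 releases),
# where size already counts bytes. On newer Rubies this is a no-op.
class String
  alias_method :bytesize, :size unless method_defined?(:bytesize)
end

puts "abc".bytesize
```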
> Ruby 2+ Writes Invalid avro files using the avro gem
> ----------------------------------------------------
>
> Key: AVRO-1499
> URL: https://issues.apache.org/jira/browse/AVRO-1499
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.5
> Reporter: Michael Ries
> Assignee: Martin Kleppmann
> Labels: ruby
> Fix For: 1.7.7
>
> Attachments: AVRO-1499.patch
>
>
> The rubygem writes corrupted avro files under ruby 2.0.0 and ruby 2.1.1. It
> appears to work correctly under jruby-1.7.10 and ruby 1.9.3.
> Here is a reproducible example:
> ```ruby
> require 'avro'
>
> data = [
> {"guid"=>"144045de-eb44-dd1b-d9af-6c8b5d41a96e",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome
> Bank", "created_at"=>1390617818, "updated_at"=>1398180288, "deleted_at"=>nil},
> {"guid"=>"51e06057-14d2-7527-81fa-b07dba0a263b",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Student Loans
> R' Us", "created_at"=>1386178342, "updated_at"=>1398180286,
> "deleted_at"=>nil},
> {"guid"=>"b4d1d99f-4351-d0e7-221c-a3fae08716bc",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome
> Bank", "created_at"=>1390617026, "updated_at"=>1398180288, "deleted_at"=>nil},
> {"guid"=>"084638fa-a78d-bbdd-e075-7c9c957a9b46",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome
> Bank", "created_at"=>1390617138, "updated_at"=>1398180288, "deleted_at"=>nil},
> {"guid"=>"79287c76-4e8f-0a21-7569-a2bcdc2b2f4d",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome
> Bank", "created_at"=>1390617135, "updated_at"=>1398180288, "deleted_at"=>nil},
> {"guid"=>"3bcc26b2-7d3b-6c4d-cb27-4eb1574b3c20",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Cayman Islands
> Bank", "created_at"=>1386902345, "updated_at"=>1398180288, "deleted_at"=>nil},
> {"guid"=>"75e1e56c-7611-4030-d002-afa2af70e5a1",
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome
> Bank", "created_at"=>1390617427, "updated_at"=>1398180288, "deleted_at"=>nil},
> ]
>
> member_schema = <<-SCHEMA
> {"namespace": "md.data_logs",
> "type": "record",
> "name": "Member",
> "fields": [
> {"name": "guid", "type": "string"},
> {"name": "user_guid", "type": "string"},
> {"name": "name", "type": ["string","null"]},
> {"name": "created_at", "type":"long"},
> {"name": "updated_at", "type":"long"},
> {"name": "deleted_at", "type":["long","null"]}
> ]
> }
> SCHEMA
> filepath = "./members.avro"
> File.unlink(filepath) if File.exist?(filepath)
>
> Avro::DataFile.open(filepath, "w", member_schema) do |dw|
> data.each do |entry|
> dw << entry
> end
> end
>
>
> entries = []
> Avro::DataFile.open(filepath, "r") do |reader|
> reader.each do |entry|
> entries << entry
> end
> end
>
> puts "Here is the data I wrote into the file:"
> data.each{|e| p e }
> print "\n\n\n\n"
>
> puts "Here is the data I read from the file:"
> entries.each{|e| p e }
> ```
> Under ruby 2+ it fails with the message "undefined method 'unpack' for
> nil:NilClass (NoMethodError)". I have also tested that the rubygem can
> correctly read avro files written by the java client, but the java client
> fails to read files written by the ruby client, so the issue is definitely in
> how the rubygem is trying to write the binary file.