[ 
https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann updated AVRO-1499:
-----------------------------------

    Attachment: AVRO-1499-3.patch

Thanks for your patch, [~wvanbergen]. I've attached a new patch which merges 
our changes together:

- Removing the monkeypatching of Enumerable is a good idea, but it's a separate 
issue, so I've split it out into AVRO-1536.
- Given that we've just removed a monkeypatch on a core module, I'm not so keen 
to add a new one for String#bytesize. I've changed it to check for the presence 
of String#bytesize at the point where it's needed.
- I've retained my change to set the encoding of the buffer to BINARY. That by 
itself is actually sufficient, because String#size returns the number of bytes 
if the encoding is set to binary. However, I've also kept it using #bytesize to 
make clear what's going on.

I've tested it in a broad range of Ruby versions. I'll commit this soon unless 
there are objections.

> Ruby 2+ Writes Invalid avro files using the avro gem
> ----------------------------------------------------
>
>                 Key: AVRO-1499
>                 URL: https://issues.apache.org/jira/browse/AVRO-1499
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.5
>            Reporter: Michael Ries
>            Assignee: Martin Kleppmann
>              Labels: ruby
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1499-2.patch, AVRO-1499-3.patch, AVRO-1499.patch
>
>
> The rubygem writes corrupted avro files under ruby 2.0.0 and ruby 2.1.1. It 
> appears to work correctly under jruby-1.7.10 and ruby 1.9.3.
> Here is a reproducible:
> ```ruby
> require 'avro'
>  
> data = [
>   {"guid"=>"144045de-eb44-dd1b-d9af-6c8b5d41a96e", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617818, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"51e06057-14d2-7527-81fa-b07dba0a263b", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Student Loans 
> R' Us", "created_at"=>1386178342, "updated_at"=>1398180286, 
> "deleted_at"=>nil},
>   {"guid"=>"b4d1d99f-4351-d0e7-221c-a3fae08716bc", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617026, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"084638fa-a78d-bbdd-e075-7c9c957a9b46", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617138, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"79287c76-4e8f-0a21-7569-a2bcdc2b2f4d", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617135, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"3bcc26b2-7d3b-6c4d-cb27-4eb1574b3c20", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Cayman Islands 
> Bank", "created_at"=>1386902345, "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"75e1e56c-7611-4030-d002-afa2af70e5a1", 
> "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome 
> Bank", "created_at"=>1390617427, "updated_at"=>1398180288, "deleted_at"=>nil},
> ]
>  
> member_schema = <<-SCHEMA
> {"namespace": "md.data_logs",
>  "type": "record",
>  "name": "Member",
>  "fields": [
>      {"name": "guid", "type": "string"},
>      {"name": "user_guid", "type": "string"},
>      {"name": "name", "type": ["string","null"]},
>      {"name": "created_at", "type":"long"},
>      {"name": "updated_at", "type":"long"},
>      {"name": "deleted_at", "type":["long","null"]}
>  ]
> }
> SCHEMA
> filepath = "./members.avro"
> File.unlink(filepath) if File.exists?(filepath)
>  
> Avro::DataFile.open(filepath, "w", member_schema) do |dw|
>   data.each do |entry|
>     dw << entry
>   end
> end
>  
>  
> entries = []
> Avro::DataFile.open(filepath, "r") do |reader|
>   reader.each do |entry|
>     entries << entry
>   end
> end
>  
> puts "Here is the data I wrote into the file:"
> data.each{|e| p e }
> print "\n\n\n\n"
>  
> puts "Here is the data I read from the file:"
> entries.each{|e| p e }
> ```
> Under ruby 2+ it fails with the message "undefined method 'unpack' for 
> nil:NilClass (NoMethodError)". I have also tested that the rubygem can 
> correctly read avro files written by the java client, but the java client 
> fails to read files written by the ruby client, so the issue is definitely in 
> how the rubygem is trying to write the binary file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to