[ https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated AVRO-1499:
-----------------------------------

    Attachment: AVRO-1499.patch

This is due to Ruby 2.0 changing the default source file encoding from
US-ASCII to UTF-8. When creating the in-memory buffer for a data file block,
the Avro code was implicitly applying that default encoding to the buffer.
Later, when the block was written to file, its length was computed incorrectly
under Ruby 2.0, because the computation counted UTF-8 characters rather than
bytes. This caused the written data file to be corrupt.
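To illustrate the character-versus-byte mismatch, here is a standalone sketch (not the Avro gem's code) in which the same three bytes report a different `size` depending on the string's encoding. The byte values are made up for illustration. The comment above doesn't name the method used to take the length, so treating it as a character-counting call like `String#size` is an assumption.

```ruby
# Illustrative bytes only (not taken from a real Avro file): 0x06 is a
# single ASCII-range byte, and 0xC3 0xA9 happens to be the UTF-8 encoding of "é".
raw = [0x06, 0xC3, 0xA9].pack('C*')   # binary (ASCII-8BIT) string

as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_binary = raw.dup.force_encoding(Encoding::BINARY)

# String#size counts characters in the string's encoding;
# String#bytesize always counts raw bytes.
puts as_utf8.size       # 2 ("\x06" and "é")
puts as_utf8.bytesize   # 3
puts as_binary.size     # 3
```

A length prefix taken from `size` on the UTF-8 buffer would claim 2 while 3 bytes actually follow. That is enough to misalign a reader for the rest of the file.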

Fixed by forcing the block buffer to be binary, as it should be. Attaching a 
one-line patch which does this.
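Here is a minimal sketch of the kind of change described, assuming the block buffer is a `StringIO`. The patch itself isn't reproduced in this message, so this is not the actual diff. `buffer` is a hypothetical name. `StringIO#set_encoding` is the standard-library call used here to make the buffer binary.

```ruby
require 'stringio'

# Hypothetical stand-in for the writer's in-memory block buffer. The backing
# string starts out UTF-8, as a string literal would under Ruby 2.0+.
buffer = StringIO.new(String.new('', encoding: Encoding::UTF_8), 'w')

# Force the buffer to be binary before writing to it. On a writable
# StringIO, set_encoding also re-tags the backing string's encoding.
buffer.set_encoding(Encoding::BINARY)

# Write some illustrative encoded bytes (0xC3 0xA9 would be "é" in UTF-8).
buffer.write([0x06, 0xC3, 0xA9].pack('C*'))

block = buffer.string
puts block.encoding == Encoding::BINARY   # true
puts block.size                           # 3, now equal to the byte count
puts block.bytesize                       # 3
```

An alternative would be to compute lengths with `String#bytesize`, which counts bytes regardless of encoding. Making the buffer binary instead fixes the problem at its source, so every length taken from the buffer is a byte count.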

No new tests are needed, as several existing tests already fail under Ruby 2.0.
However, that doesn't help if nobody runs the tests under Ruby 2.0. I created
AVRO-1515 to discuss that.

> Ruby 2+ Writes Invalid avro files using the avro gem
> ----------------------------------------------------
>
>                 Key: AVRO-1499
>                 URL: https://issues.apache.org/jira/browse/AVRO-1499
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.5
>            Reporter: Michael Ries
>            Assignee: Martin Kleppmann
>              Labels: ruby
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1499.patch
>
>
> The Ruby gem writes corrupted Avro files under Ruby 2.0.0 and Ruby 2.1.1. It
> appears to work correctly under JRuby 1.7.10 and Ruby 1.9.3.
> Here is a reproducible example:
> ```ruby
> require 'avro'
>  
> data = [
>   {"guid"=>"144045de-eb44-dd1b-d9af-6c8b5d41a96e",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank", "created_at"=>1390617818,
>    "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"51e06057-14d2-7527-81fa-b07dba0a263b",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"Student Loans R' Us", "created_at"=>1386178342,
>    "updated_at"=>1398180286, "deleted_at"=>nil},
>   {"guid"=>"b4d1d99f-4351-d0e7-221c-a3fae08716bc",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank", "created_at"=>1390617026,
>    "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"084638fa-a78d-bbdd-e075-7c9c957a9b46",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank", "created_at"=>1390617138,
>    "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"79287c76-4e8f-0a21-7569-a2bcdc2b2f4d",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank", "created_at"=>1390617135,
>    "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"3bcc26b2-7d3b-6c4d-cb27-4eb1574b3c20",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"Cayman Islands Bank", "created_at"=>1386902345,
>    "updated_at"=>1398180288, "deleted_at"=>nil},
>   {"guid"=>"75e1e56c-7611-4030-d002-afa2af70e5a1",
>    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119",
>    "name"=>"My Awesome Bank", "created_at"=>1390617427,
>    "updated_at"=>1398180288, "deleted_at"=>nil},
> ]
>  
> member_schema = <<-SCHEMA
> {"namespace": "md.data_logs",
>  "type": "record",
>  "name": "Member",
>  "fields": [
>      {"name": "guid", "type": "string"},
>      {"name": "user_guid", "type": "string"},
>      {"name": "name", "type": ["string","null"]},
>      {"name": "created_at", "type":"long"},
>      {"name": "updated_at", "type":"long"},
>      {"name": "deleted_at", "type":["long","null"]}
>  ]
> }
> SCHEMA
> filepath = "./members.avro"
> File.unlink(filepath) if File.exist?(filepath)
>  
> Avro::DataFile.open(filepath, "w", member_schema) do |dw|
>   data.each do |entry|
>     dw << entry
>   end
> end
>  
>  
> entries = []
> Avro::DataFile.open(filepath, "r") do |reader|
>   reader.each do |entry|
>     entries << entry
>   end
> end
>  
> puts "Here is the data I wrote into the file:"
> data.each{|e| p e }
> print "\n\n\n\n"
>  
> puts "Here is the data I read from the file:"
> entries.each{|e| p e }
> ```
> Under Ruby 2+ it fails with the message "undefined method 'unpack' for
> nil:NilClass (NoMethodError)". I have also verified that the Ruby gem can
> correctly read Avro files written by the Java client, but the Java client
> fails to read files written by the Ruby client, so the issue must be in how
> the Ruby gem writes the binary file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
