[ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502001#comment-13502001 ]
Tophe Vigny commented on AVRO-1206:
-----------------------------------

Hi Doug,

You are using Ruby 1.8.x (the oldest branch). Try with Ruby 1.9.x or later (the official branch); you can use rvm (Ruby Version Manager) to install multiple Ruby versions.

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.8.7
Using /home/Tophe/.rvm/gems/ruby-1.8.7-p371
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/work/svn_1/trunk/lang/ruby/Rakefile:19: warning: already initialized constant VERSION
/home/Tophe/.rvm/rubies/ruby-1.8.7-p371/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Loaded suite /home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
................................
Finished in 0.536805 seconds.

32 tests, 710 assertions, 0 failures, 0 errors

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.9.3
Using /home/Tophe/.rvm/gems/ruby-1.9.3-p327
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Run options:

# Running tests:

...F............................

Finished tests in 0.212220s, 150.7870 tests/s, 3345.5875 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:152]:
<"家"> expected but was
<"\xE5\xAE\xB6">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!
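
The failure above looks like an encoding-tag mismatch rather than corrupted data: \xE5\xAE\xB6 is exactly the UTF-8 byte sequence of "家". A minimal sketch of the effect, independent of Avro and assuming Ruby 1.9 or later (the helper name tag_as_utf8 is hypothetical, not part of the avro gem):

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: re-tag bytes read from a binary stream as UTF-8
# without changing the bytes themselves.
def tag_as_utf8(raw)
  raw.dup.force_encoding('UTF-8')
end

expected = "家"
# Simulate what a reader working on a binary stream hands back:
# same bytes, but tagged as binary (ASCII-8BIT).
raw = expected.dup.force_encoding('BINARY')

puts raw.bytes.to_a == expected.bytes.to_a  # the bytes are identical
puts raw == expected                        # but the strings compare unequal
puts tag_as_utf8(raw) == expected           # equal again once re-tagged
```

This prints true, false, true: two non-ASCII strings with different encoding tags are never equal, even when their bytes match.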
Apply this modification:

Index: test/test_datafile.rb
===================================================================
--- test/test_datafile.rb (revision 1410649)
+++ test/test_datafile.rb (working copy)
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements. See the NOTICE file
 # distributed with this work for additional information
@@ -140,4 +141,17 @@
       assert_equal(block_count+1, dw.block_count)
     end
   end
+  def test_utf8
+    datafile = Avro::DataFile::open('data.avr', 'w', '"string"')
+    datafile << "家"
+    datafile.close
+
+    datafile = Avro::DataFile.open('data.avr')
+    datafile.each do |s|
+      (rmaj,rmin,rlast) = RUBY_VERSION.split(".").map {|a| a.to_i}
+      if rmaj <2 && rmin < 9
+        assert_equal "家", s
+      else
+        assert_equal "家", s.force_encoding('UTF-8')
+      end
+    end
+    datafile.close
+  end
+
 end

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Run options:

# Running tests:

................................

Finished tests in 0.166176s, 192.5669 tests/s, 4272.5791 assertions/s.
32 tests, 710 assertions, 0 failures, 0 errors, 0 skips

Now change write_bytes to:

    def write_bytes(datum)
      write_long(datum.size)
      @writer.write(datum)
    end

and run the tests in 1.9.3:

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Run options:

# Running tests:

...F............................

Finished tests in 0.186894s, 171.2203 tests/s, 3798.9507 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:156]:
<"家"> expected but was
<"\xE5">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!

The test fails in 1.9.3, but not in 1.8.7:

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.8.7
Using /home/Tophe/.rvm/gems/ruby-1.8.7-p371
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/work/svn_1/trunk/lang/ruby/Rakefile:19: warning: already initialized constant VERSION
/home/Tophe/.rvm/rubies/ruby-1.8.7-p371/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Loaded suite /home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
................................
Finished in 0.379195 seconds.

32 tests, 710 assertions, 0 failures, 0 errors

It seems that String#size returns the character count in Ruby 1.9 and later, not the byte count as in Ruby before 1.9. The patch corrects that and works for all Rubies.

It can surely work with JRuby too, but yajl needs to be removed. Perhaps Ruby's json library can do the job?
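
The size versus byte count difference described above can be reproduced in isolation. The following is only a sketch under stated assumptions: round_trip is a hypothetical helper, the one-byte length prefix stands in for Avro's real long encoding, and String#bytesize is one way to get a byte count on both 1.8.7 and 1.9 (whether the attached patch uses exactly that call is an assumption).

```ruby
# -*- coding: utf-8 -*-
require 'stringio'

# Hypothetical, simplified length-prefixed round trip. The single-byte
# prefix is a stand-in for Avro's variable-length long, not the real format.
def round_trip(datum, length)
  io = StringIO.new
  io.write([length].pack('C'))      # length prefix
  io.write(datum)                   # payload bytes
  io.rewind
  n = io.read(1).unpack('C').first  # read the prefix back
  io.read(n)                        # read exactly n bytes of payload
end

s = "家"
puts s.size      # character count on Ruby 1.9 and later
puts s.bytesize  # byte count on every Ruby that has bytesize

truncated = round_trip(s, s.size)      # prefix too small on Ruby >= 1.9
complete  = round_trip(s, s.bytesize)  # prefix matches the payload

puts truncated.bytes.to_a.inspect
puts complete.bytes.to_a.inspect
```

On Ruby 1.9 and later the size-based prefix brings back a single byte (0xE5), which matches the <"\xE5"> failure above; the bytesize-based prefix brings back all three bytes.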
And then we can use Avro in JRuby with the avro gem. Alternatively, yajl can be optional: if the require works it is used, and if yajl is not present the code can fall back to JSON.load and JSON.dump.

> utf-8 serialisation problems
> -----------------------------
>
> Key: AVRO-1206
> URL: https://issues.apache.org/jira/browse/AVRO-1206
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.2
> Environment: ruby-1.9.3p194, avro gem 1.7.2.
> Reporter: Tophe Vigny
> Attachments: AVRO-1206.patch
>
>
> Some serialized UTF-8 characters like "家" cannot be read later; Avro breaks with
> /gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:230:in `match_schemas': undefined method `type' for nil:NilClass (NoMethodError)
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:288:in `read_data'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:384:in `read_union'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:317:in `read_data'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:392:in `block in read_record'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in `each'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in `read_record'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:318:in `read_data'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:283:in `read'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:223:in `block in each'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in `loop'
> from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in `each'
> from avr_err_example.rb:42:in `block in <main>'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira