[ https://issues.apache.org/jira/browse/AVRO-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502001#comment-13502001 ]

Tophe Vigny commented on AVRO-1206:
-----------------------------------

Hi Doug,

you are using Ruby 1.8.x (the older branch); try Ruby 1.9.x or later (the current official branch). You can use rvm (Ruby Version Manager) to install several Ruby versions side by side.

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.8.7
Using /home/Tophe/.rvm/gems/ruby-1.8.7-p371
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/work/svn_1/trunk/lang/ruby/Rakefile:19: warning: already initialized constant VERSION
/home/Tophe/.rvm/rubies/ruby-1.8.7-p371/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Loaded suite /home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
................................
Finished in 0.536805 seconds.

32 tests, 710 assertions, 0 failures, 0 errors


Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.9.3
Using /home/Tophe/.rvm/gems/ruby-1.9.3-p327
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Run options: 

# Running tests:

...F............................

Finished tests in 0.212220s, 150.7870 tests/s, 3345.5875 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:152]:
<"家"> expected but was
<"\xE5\xAE\xB6">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!

Apply this modification:

Index: test/test_datafile.rb
===================================================================
--- test/test_datafile.rb       (revision 1410649)
+++ test/test_datafile.rb       (working copy)
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -140,4 +141,17 @@
       assert_equal(block_count+1, dw.block_count)
     end
   end
+  def test_utf8
+    datafile = Avro::DataFile.open('data.avr', 'w', '"string"')
+    datafile << "家"
+    datafile.close
+    
+    datafile = Avro::DataFile.open('data.avr')
+    datafile.each do |s|
+      (rmaj,rmin,rlast) = RUBY_VERSION.split(".").map {|a| a.to_i}
+      if rmaj <2 && rmin < 9
+        assert_equal "家", s
+      else
+        assert_equal "家", s.force_encoding('UTF-8')
+      end
+    end
+    datafile.close
+    end
+  end
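
Why the test needs `force_encoding` on 1.9: the `<"\xE5\xAE\xB6">` failure above suggests the decoder returns the correct three bytes but tagged as binary (ASCII-8BIT), and Ruby 1.9+ does not treat a binary string containing non-ASCII bytes as equal to a UTF-8 literal. A minimal sketch under that assumption; `as_utf8` is a hypothetical helper (not part of the avro gem) that uses a `respond_to?` check instead of parsing `RUBY_VERSION`:

```ruby
# -*- coding: utf-8 -*-
expected = "家"                                          # UTF-8 literal, 3 bytes
# Assumption: simulate the decoder's output as the same bytes tagged binary.
decoded = expected.dup.force_encoding(Encoding::BINARY)

puts decoded == expected                      # false: incompatible encodings
puts decoded.bytes.to_a == expected.bytes.to_a  # true: the bytes are intact

# Hypothetical helper: re-tag decoded string bytes as UTF-8. Ruby 1.8 strings
# have no force_encoding method, so they are returned unchanged there.
def as_utf8(s)
  s.respond_to?(:force_encoding) ? s.dup.force_encoding('UTF-8') : s
end

puts as_utf8(decoded) == expected             # true
```

Checking for the method rather than the version number keeps one code path for 1.8 and 1.9+ without comparing version components.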

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Run options: 

# Running tests:

................................

Finished tests in 0.166176s, 192.5669 tests/s, 4272.5791 assertions/s.

32 tests, 710 assertions, 0 failures, 0 errors, 0 skips

Now change `write_bytes` back to the unpatched version, which writes `datum.size` as the length:

      def write_bytes(datum)
        write_long(datum.size)
        @writer.write(datum)
      end

and run the tests on 1.9.3:
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/.rvm/rubies/ruby-1.9.3-p327/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.9.3-p327@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Run options: 

# Running tests:

...F............................

Finished tests in 0.186894s, 171.2203 tests/s, 3798.9507 assertions/s.

  1) Failure:
test_utf8(TestDataFile) [/home/Tophe/work/svn_1/trunk/lang/ruby/test/test_datafile.rb:156]:
<"家"> expected but was
<"\xE5">.

32 tests, 710 assertions, 1 failures, 0 errors, 0 skips
rake aborted!

but the test still passes on 1.8.7:

Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rvm use 1.8.7
Using /home/Tophe/.rvm/gems/ruby-1.8.7-p371
Tophe@info3:~/work/svn_1/trunk/lang/ruby$ rake test
/home/Tophe/work/svn_1/trunk/lang/ruby/Rakefile:19: warning: already initialized constant VERSION
/home/Tophe/.rvm/rubies/ruby-1.8.7-p371/bin/ruby -I"lib:ext:bin:test" -I"/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib" "/home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader.rb" "test/test_socket_transport.rb" "test/test_io.rb" "test/test_datafile.rb" "test/test_help.rb" "test/test_protocol.rb"
Loaded suite /home/Tophe/.rvm/gems/ruby-1.8.7-p371@global/gems/rake-10.0.2/lib/rake/rake_test_loader
Started
................................
Finished in 0.379195 seconds.

32 tests, 710 assertions, 0 failures, 0 errors

It seems that `String#size` returns the character count in Ruby 1.9 and later, not the byte count as in Ruby 1.8, so for multibyte UTF-8 strings the length written by `write_bytes` is too small. The patch corrects that and works on all Ruby versions.
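
To see why the length prefix matters, here is a minimal sketch of a length-prefixed string encoding, modelled loosely on Avro's binary format (a zig-zag varint length followed by the raw bytes). `write_long`, `read_long` and `round_trip` are illustrative stand-ins written for this example, not the avro gem's encoder/decoder API. Writing `datum.size` as the length reproduces the truncated `"\xE5"` from the failing run above; writing `datum.bytesize` keeps all three bytes.

```ruby
# -*- coding: utf-8 -*-
require 'stringio'

# Illustrative stand-ins, loosely modelled on Avro's zig-zag varint longs.
# Not the avro gem's API.
def write_long(io, n)
  n = (n << 1) ^ (n >> 63)                 # zig-zag encode
  while (n & ~0x7F) != 0
    io.write(((n & 0x7F) | 0x80).chr)      # 7 payload bits + continuation bit
    n >>= 7
  end
  io.write(n.chr)
end

def read_long(io)
  b = io.readbyte
  n = b & 0x7F
  shift = 7
  while (b & 0x80) != 0
    b = io.readbyte
    n |= (b & 0x7F) << shift
    shift += 7
  end
  (n >> 1) ^ -(n & 1)                      # zig-zag decode
end

# Writes a length-prefixed string, then reads it back from the same buffer.
def round_trip(datum, use_bytesize)
  io = StringIO.new(String.new)            # String.new is an empty binary string
  write_long(io, use_bytesize ? datum.bytesize : datum.size)
  io.write(datum)
  io.rewind
  io.read(read_long(io))                   # read(n) returns a binary string
end

puts "家".size                  # character count on Ruby >= 1.9
puts "家".bytesize              # byte count in UTF-8
p round_trip("家", false)       # length 1: only the first byte survives
p round_trip("家", true)        # length 3: all bytes survive
```

The reader trusts the prefix, so a character count silently drops bytes; with `bytesize` the decoded bytes are complete and only need re-tagging as UTF-8.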
It could probably also work on JRuby, but that would require removing the yajl dependency; perhaps the standard Ruby `json` library can do the job, and then we could use Avro on JRuby through the avro gem. Alternatively, yajl could be made optional: use it if the `require` succeeds, and fall back to `JSON.load`/`JSON.dump` if it is not present.
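
The optional-yajl idea could look roughly like this. A sketch only: `JsonBackend` is a hypothetical module name, and it assumes yajl-ruby's `Yajl::Parser.parse` and `Yajl::Encoder.encode` when the gem is installed, falling back to Ruby's bundled `json` library otherwise. It uses `JSON.parse` rather than `JSON.load`, since `load` is intended for trusted input.

```ruby
# Hypothetical module, not part of the avro gem: prefer yajl when it can be
# required, otherwise fall back to Ruby's bundled json library.
module JsonBackend
  begin
    require 'yajl'

    NAME = 'yajl'

    def self.load(str)
      Yajl::Parser.parse(str)
    end

    def self.dump(obj)
      Yajl::Encoder.encode(obj)
    end
  rescue LoadError
    require 'json'

    NAME = 'json'

    def self.load(str)
      JSON.parse(str)
    end

    def self.dump(obj)
      JSON.dump(obj)
    end
  end
end

schema = JsonBackend.load('{"type": "string"}')
puts JsonBackend::NAME
puts JsonBackend.dump(schema)
```

Callers would only use `JsonBackend.load`/`dump`, so the C-extension dependency becomes an optimisation rather than a requirement.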




                
> utf-8 serialisation problems 
> -----------------------------
>
>                 Key: AVRO-1206
>                 URL: https://issues.apache.org/jira/browse/AVRO-1206
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.2
>         Environment: ruby-1.9.3p194, avro gem 1.7.2.
>            Reporter: Tophe Vigny
>         Attachments: AVRO-1206.patch
>
>
> Some serialized UTF-8 characters such as "家" cannot be read later; Avro breaks with:
> /gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:230:in `match_schemas': undefined method `type' for nil:NilClass (NoMethodError)
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:288:in `read_data'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:384:in `read_union'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:317:in `read_data'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:392:in `block in read_record'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in `each'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:390:in `read_record'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:318:in `read_data'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/io.rb:283:in `read'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:223:in `block in each'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in `loop'
>       from /home/Tophe/.rvm/gems/ruby-1.9.3-p194/gems/avro-1.7.2/lib/avro/data_file.rb:211:in `each'
>       from avr_err_example.rb:42:in `block in <main>'

