[ https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110769#comment-15110769 ]
ASF subversion and git services commented on AVRO-1783:
-------------------------------------------------------
Commit 1725988 from [~martinkl] in branch 'avro/trunk'
[ https://svn.apache.org/r1725988 ]
AVRO-1783. Ruby: Ensure correct binary encoding for byte strings.
> Gracefully handle strings with wrong character encoding
> -------------------------------------------------------
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.7
> Reporter: Martin Kleppmann
> Assignee: Martin Kleppmann
> Attachments: AVRO-1783-2.patch, AVRO-1783.patch, AVRO-1783.stack.text
>
>
> In the [vote thread for Avro 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E],
> [~busbey] noticed that [phunt's avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq" is not an example of schema {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
> write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
> write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
> each at org/jruby/RubyArray.java:1613
> write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
> write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
> write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
> write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
> request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
> request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
> (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart
> works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However,
> [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and
> in this particular version of JRuby I was able to reproduce the issue.
>
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3
> returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a
> binary-encoded string. {{Schema.validate}} checks that the string is suitable
> for writing as a datum for a {{fixed}} type by calling {{#size}}, which counts
> characters rather than bytes. In this case, although the MD5 digest of the
> schema is a 16-byte string, if you interpret it as a UTF-8 encoded string, it
> consists of only 13 characters (i.e. some byte sequences are interpreted as
> multibyte characters), so the size check fails.
>
> Rather than trying to divine why JRuby is being weird here, I think this is
> an opportunity to fix Avro's handling of strings to make it robust against
> unexpected encodings.
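
The character-versus-byte mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the code from the attached patches: the 16-byte value is made up for illustration (it is not the digest from the stack trace), and {{fixed_size_ok?}} is a hypothetical helper rather than Avro's actual validation method.

```ruby
# Hypothetical 16-byte value: three valid two-byte UTF-8 sequences (0xC3 0xA9)
# followed by ten ASCII bytes. It is NOT the MD5 digest from the report.
raw = "\xC3\xA9\xC3\xA9\xC3\xA9abcdefghij"

# Same bytes, two different encoding labels.
as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_binary = raw.dup.force_encoding(Encoding::BINARY)

puts as_utf8.bytesize   # 16 (bytes, independent of the encoding label)
puts as_utf8.size       # 13 (characters: each 0xC3 0xA9 pair counts once)
puts as_binary.size     # 16 (in a binary string every byte is one character)

# Hypothetical check for a `fixed` datum that compares bytes instead of
# characters, so the result does not depend on the string's encoding label.
def fixed_size_ok?(datum, expected_size)
  datum.is_a?(String) && datum.bytesize == expected_size
end

puts fixed_size_ok?(as_utf8, 16)    # true
puts fixed_size_ok?(as_binary, 16)  # true
```

Comparing with {{#bytesize}}, or re-labelling the string as binary before checking {{#size}}, are two possible ways to make such a check independent of the encoding; which of these the actual fix uses is not stated here.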
--
This message was sent by Atlassian JIRA (v6.3.4#6332)