[ https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Kleppmann updated AVRO-1783:
-----------------------------------
Attachment: AVRO-1783-2.patch
[~busbey]: I reproduced your issue on JRuby 1.7.3. It looks like another case of
wrong character encoding (using UTF-8 strings where binary strings should be
used), this time in the IPC module. This is not a new bug; it appears to have
been in the IPC code for as long as that code has existed, but we might as well
fix it.
I've attached a v2 patch that also forces all the buffers in the IPC module to
be binary. This patch makes phunt/avro-rpc-quickstart work in all versions of
Ruby that I tested (jruby-1.7.3, jruby-1.7.23, ruby-1.9.3-p484, ruby-2.1.4,
ruby-2.2.3). The unit tests also still pass in all those versions.
Note the new patch uses String#force_encoding, which was introduced in Ruby
1.9, so this patch is not trying to be compatible with Ruby 1.8.7. We could
make it compatible with 1.8.7, at the cost of uglier code, but assuming we're
dropping support for 1.8.7 anyway (AVRO-1785), I think we can keep it as-is.
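As a rough illustration of what forcing a buffer to binary means (a sketch only, not the contents of the attached patch): String#force_encoding changes the encoding tag on a string without touching its bytes. The helper name to_binary below is hypothetical and does not exist in Avro.

```ruby
# Hypothetical helper (not part of Avro): return a copy of +str+ tagged as
# binary. force_encoding relabels the string; the underlying bytes are unchanged.
def to_binary(str)
  str.dup.force_encoding('BINARY')
end

utf8 = "caf\u00E9"        # 4 characters, 5 bytes when encoded as UTF-8
bin  = to_binary(utf8)

puts utf8.size                        # 4 (characters)
puts bin.size                         # 5 (bytes, since binary has no multibyte characters)
puts utf8.bytes == bin.bytes          # true
puts bin.encoding == Encoding::BINARY # true
```

Because the helper works on a copy, the caller's string keeps its original encoding.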
> Gracefully handle strings with wrong character encoding
> -------------------------------------------------------
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.7
> Reporter: Martin Kleppmann
> Attachments: AVRO-1783-2.patch, AVRO-1783.patch, AVRO-1783.stack.text
>
>
> In the [vote thread for Avro
> 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E],
> [~busbey] noticed that [phunt's
> avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq" is not an example of schema {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
> write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
> write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
> each at org/jruby/RubyArray.java:1613
> write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
> write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
> write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
> write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
> request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
> request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
> (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart
> works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However,
> [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and
> in this particular version of JRuby I was able to reproduce the issue.
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3
> returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a
> binary-encoded string. {{Schema.validate}} checks that the string is suitable
> for writing as a datum for a {{fixed}} type by calling {{#size}}, which counts
> characters rather than bytes. In this case, although the MD5 digest of the
> schema is a 16-byte string, if you interpret it as a UTF-8 encoded string, it
> consists of only 13 characters (i.e. some byte sequences are interpreted as
> multibyte characters).
> Rather than trying to divine why JRuby is being weird here, I think this is
> an opportunity to fix Avro's handling of strings to make it robust against
> unexpected encodings.
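The character-versus-byte mismatch described in the issue can be sketched as follows. The 16 bytes here are made up for illustration (they are not the actual schema digest) and were chosen so that they form valid UTF-8 with 13 characters. The method fixed_size_ok? is a hypothetical byte-based check, not Avro's actual validation code.

```ruby
# Hypothetical check (not Avro's code): compare a fixed datum by byte count,
# so that the result does not depend on the string's encoding tag.
def fixed_size_ok?(datum, expected_size)
  datum.is_a?(String) && datum.bytesize == expected_size
end

# 16 illustrative bytes that happen to be valid UTF-8:
# three 2-byte sequences followed by ten ASCII bytes.
digest_like = "\xC3\xA9\xC3\xA9\xC3\xA9abcdefghij".dup.force_encoding('BINARY')
mislabelled = digest_like.dup.force_encoding('UTF-8')

puts digest_like.size                 # 16 (binary: one character per byte)
puts mislabelled.size                 # 13 (UTF-8: 3 two-byte characters + 10 ASCII)
puts mislabelled.bytesize             # 16 (byte count ignores the encoding tag)
puts fixed_size_ok?(mislabelled, 16)  # true
```

A check based on #size would reject the mislabelled string against a 16-byte fixed schema, while one based on #bytesize accepts it regardless of how the string is tagged.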