Repository: avro
Updated Branches:
  refs/heads/master 4c992a587 -> 832512edc

AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.

Java and other implementations require this CRC32 checksum of the
uncompressed content in order to read the data. This implements the
checksum, with backward compatibility for files written by old versions
of avro-ruby. If the checksum doesn't match, or if decompression fails
after the last 4 bytes are stripped as the checksum, avro-ruby
decompresses the entire incoming buffer and passes it on, assuming that
the file was written by an old version of avro-ruby.

Closes #121.

Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/832512ed
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/832512ed
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/832512ed

Branch: refs/heads/master
Commit: 832512edcd7591c238c35b5a479e15ac0709e4cb
Parents: 4c992a5
Author: Ryan Blue <[email protected]>
Authored: Sat Sep 10 15:57:30 2016 -0700
Committer: Ryan Blue <[email protected]>
Committed: Mon Sep 12 09:08:07 2016 -0700

----------------------------------------------------------------------
 CHANGES.txt                     |  3 +++
 lang/ruby/lib/avro/data_file.rb | 19 ++++++++++++++++++-
 lang/ruby/test/test_io.rb       | 11 +++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index b88e798..253c356 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -51,6 +51,9 @@ Trunk (not yet released)
 
     AVRO-1908: Fix TestSpecificCompiler reference to private method. (blue)
 
+    AVRO-1873: Ruby: Add CRC32 checksum to Snappy-compressed blocks.
+    (blue)
+
 Avro 1.8.1 (14 May 2016)
 
   INCOMPATIBLE CHANGES

http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/lang/ruby/lib/avro/data_file.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/lib/avro/data_file.rb b/lang/ruby/lib/avro/data_file.rb
index c27c2dc..e465055 100644
--- a/lang/ruby/lib/avro/data_file.rb
+++ b/lang/ruby/lib/avro/data_file.rb
@@ -338,12 +338,29 @@ module Avro
 
       def decompress(data)
         load_snappy!
+        crc32 = data.slice(-4..-1).unpack('N').first
+        uncompressed = Snappy.inflate(data.slice(0..-5))
+
+        if crc32 == Zlib.crc32(uncompressed)
+          uncompressed
+        else
+          # older versions of avro-ruby didn't write the checksum, so if it
+          # doesn't match this must assume that it wasn't there and return
+          # the entire payload uncompressed.
+          Snappy.inflate(data)
+        end
+      rescue Snappy::Error
+        # older versions of avro-ruby didn't write the checksum, so removing
+        # the last 4 bytes may cause Snappy to fail. recover by assuming the
+        # payload is from an older file and uncompress the entire buffer.
         Snappy.inflate(data)
       end
 
       def compress(data)
         load_snappy!
-        Snappy.deflate(data)
+        crc32 = Zlib.crc32(data)
+        compressed = Snappy.deflate(data)
+        [compressed, crc32].pack('a*N')
       end
 
       private

http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/lang/ruby/test/test_io.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/test/test_io.rb b/lang/ruby/test/test_io.rb
index 153cb94..09d725d 100644
--- a/lang/ruby/test/test_io.rb
+++ b/lang/ruby/test/test_io.rb
@@ -340,6 +340,17 @@ EOS
       assert_equal(incorrect, 0)
     end
   end
+
+  def test_snappy_backward_compat
+    # a snappy-compressed block payload without the checksum
+    # this has no back-references, just one literal so the last 9
+    # bytes are the uncompressed payload.
+    old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+    uncompressed_bytes = "\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+    snappy = Avro::DataFile::SnappyCodec.new
+    assert_equal(uncompressed_bytes, snappy.decompress(old_snappy_bytes))
+  end
+
   private
 
   def check_no_default(schema_json)
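
The block framing and the backward-compatible fallback described in the commit message can be sketched in isolation. This is a minimal illustration, not the avro-ruby code: `ChecksummedBlock` is a hypothetical module name, and `Zlib::Deflate` / `Zlib::Inflate` stand in for `Snappy.deflate` / `Snappy.inflate` (an assumption made only so the sketch runs with the Ruby standard library and no extra gem). The trailer layout, a 4-byte big-endian CRC32 of the uncompressed data, follows the `pack('a*N')` call in the diff.

```ruby
require 'zlib'

# Hypothetical helper, not part of avro-ruby. Zlib's deflate is used as a
# stand-in compressor; the real codec in the diff uses Snappy.
module ChecksummedBlock
  CHECKSUM_SIZE = 4 # bytes in the big-endian CRC32 trailer

  # Compress the data and append the CRC32 of the *uncompressed* bytes.
  def self.encode(data)
    compressed = Zlib::Deflate.deflate(data)
    [compressed, Zlib.crc32(data)].pack('a*N')
  end

  # Try the checksummed layout first; fall back to treating the whole
  # buffer as compressed data (the legacy layout without a trailer).
  def self.decode(block)
    # Too short to carry a trailer: treat as a legacy block.
    return Zlib::Inflate.inflate(block) if block.bytesize <= CHECKSUM_SIZE

    expected = block.byteslice(-CHECKSUM_SIZE, CHECKSUM_SIZE).unpack('N').first
    body = block.byteslice(0, block.bytesize - CHECKSUM_SIZE)
    uncompressed = Zlib::Inflate.inflate(body)
    return uncompressed if Zlib.crc32(uncompressed) == expected

    # Checksum mismatch: assume no trailer was written.
    Zlib::Inflate.inflate(block)
  rescue Zlib::Error
    # Stripping 4 bytes from a legacy block can corrupt the stream;
    # recover by decompressing the entire buffer.
    Zlib::Inflate.inflate(block)
  end
end

payload   = 'green'.b
new_block = ChecksummedBlock.encode(payload)   # compressed bytes + CRC32
old_block = Zlib::Deflate.deflate(payload)     # legacy layout, no trailer

puts ChecksummedBlock.decode(new_block) == payload
puts ChecksummedBlock.decode(old_block) == payload
```

Both layouts should decode to the same payload. One trade-off of this kind of fallback is that a block with a genuinely corrupted checksum cannot be told apart from a legacy block, so corruption in the trailer surfaces only if the full-buffer decompression also fails.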
