[ https://issues.apache.org/jira/browse/AVRO-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112200#comment-13112200 ]

Tom White commented on AVRO-892:
--------------------------------

This looks like the right fix to me. Thanks for reporting it! 

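The mask was already applied on the read path, when comparing checksums, but 
was missed on the write path. See also the note at 
http://docs.python.org/library/binascii.html#binascii.crc32, which recommends 
crc32(data) & 0xffffffff to get the same numeric value across all Python 
versions and platforms.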
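A minimal sketch of the failure and the fix, written for Python 3. Python 3's zlib.crc32 always returns an unsigned value, so the hypothetical helper signed_crc32 reproduces the signed result that Python 2.6 and 2.7 return. STRUCT_CRC32 is assumed to be struct.Struct('>I'), matching the "4-byte, big-endian" docstring in the patch below; this is an illustration, not Avro's code.

```python
import struct
import zlib

# Assumed to match Avro's STRUCT_CRC32: a 4-byte, big-endian unsigned int.
STRUCT_CRC32 = struct.Struct('>I')


def signed_crc32(data):
    """Hypothetical helper: emulate a crc32() that returns a signed 32-bit
    value, as Python 2.6 and 2.7 do."""
    value = zlib.crc32(data) & 0xffffffff
    return value - (1 << 32) if value >= (1 << 31) else value


# Pick an input whose CRC32 has the high bit set, so the signed value is < 0.
data = next(b'%d' % i for i in range(1000) if signed_crc32(b'%d' % i) < 0)

# Unmasked, as on the write path before the fix: pack() rejects the value
# (the struct.error message wording differs between Python versions).
try:
    STRUCT_CRC32.pack(signed_crc32(data))
    unmasked_failed = False
except struct.error:
    unmasked_failed = True

# Masked, as in the patch: the same 32 bits, reinterpreted as unsigned.
packed = STRUCT_CRC32.pack(signed_crc32(data) & 0xffffffff)
assert STRUCT_CRC32.unpack(packed)[0] == zlib.crc32(data)
```

The mask does not change the checksum bits; it only moves a negative Python integer into the range [0, 2**32) that the 'I' format code accepts.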

It would be good to have an interoperability test for this that uses larger 
volumes of data than the testing I did in AVRO-866.
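A sketch of the kind of larger-volume check suggested above, under stated assumptions: it checksums a few hundred random blocks of up to 64 KiB each, packs each checksum the way the fixed write path does and verifies it the way the read path does, using the same emulated signed crc32. All helper names (signed_crc32, write_crc32, check_crc32, round_trip_blocks) are hypothetical and the sketch uses only the standard library, not Avro's test suite. A real interoperability test would also have another Avro implementation read the resulting data file, which this sketch does not attempt.

```python
import random
import struct
import zlib

# Assumed to match Avro's STRUCT_CRC32: 4-byte, big-endian, unsigned.
STRUCT_CRC32 = struct.Struct('>I')


def signed_crc32(data):
    """Hypothetical helper: emulate a crc32() that returns a signed value."""
    value = zlib.crc32(data) & 0xffffffff
    return value - (1 << 32) if value >= (1 << 31) else value


def write_crc32(data):
    """Write path with the fix applied: mask before packing."""
    return STRUCT_CRC32.pack(signed_crc32(data) & 0xffffffff)


def check_crc32(data, stored):
    """Read path: compare the stored value against the masked checksum."""
    return STRUCT_CRC32.unpack(stored)[0] == (signed_crc32(data) & 0xffffffff)


def round_trip_blocks(num_blocks=200, max_size=64 * 1024, seed=892):
    """Checksum many random blocks; return (blocks checked, negative CRCs)."""
    rng = random.Random(seed)
    negatives = 0
    for _ in range(num_blocks):
        size = rng.randint(1, max_size)
        block = rng.getrandbits(8 * size).to_bytes(size, 'little')
        if signed_crc32(block) < 0:
            negatives += 1  # blocks the unmasked write path would reject
        stored = write_crc32(block)
        assert len(stored) == 4
        assert check_crc32(block, stored)
    return num_blocks, negatives


if __name__ == '__main__':
    checked, negatives = round_trip_blocks()
    print(checked, 'blocks checked,', negatives, 'with a negative signed CRC32')
```

Since roughly half of all CRC32 values have the high bit set, a run of this size is expected to hit the negative case many times, which the small-volume testing mentioned above may have missed.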



> Python snappy error: "integer out of range for 'I' format code"
> ---------------------------------------------------------------
>
>                 Key: AVRO-892
>                 URL: https://issues.apache.org/jira/browse/AVRO-892
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.5.4
>         Environment: Linux michaelc 2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
> Ubuntu 11.04
> Python 2.7.1+ (ubuntu stock version)
> avro-1.5.4-py2.7.egg
> snappy-1.0.4 (c library)
> python-snappy-0.3.2
>            Reporter: Michael Cooper
>
> The Python library for Avro fails to write some blocks when used with snappy compression.
> The error is:
> {code}
> Traceback (most recent call last):
>   File "tools/json_to_avro.py", line 74, in <module>
>     writer.append(line)
>   File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 185, in append
>     self._write_block()
>   File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 169, in _write_block
>     self.encoder.write_crc32(uncompressed_data)
>   File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/io.py", line 364, in write_crc32
>     self.write(STRUCT_CRC32.pack(crc32(bytes)));
> struct.error: integer out of range for 'I' format code
> {code}
> From my investigation, crc32(bytes) returns negative integers for some inputs, 
> which the 'I' format code rejects, so masking the output with 0xffffffff 
> seems to fix the issue.
> This fix appears to work from my limited testing:
> {code}
> --- io.old.py 2011-09-21 14:32:38.992544680 +1000
> +++ io.py     2011-09-21 14:33:11.492544686 +1000
> @@ -360,7 +360,7 @@
>      """
>      A 4-byte, big-endian CRC32 checksum
>      """
> -    self.write(STRUCT_CRC32.pack(crc32(bytes)));
> +    self.write(STRUCT_CRC32.pack(crc32(bytes) & 0xffffffff));
>  
>  #
>  # DatumReader/Writer
> {code}
