[ 
https://issues.apache.org/jira/browse/AVRO-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spencer Williams updated AVRO-3480:
-----------------------------------
    Summary: Avro files with multiple "blocks" fail to deserialize when using 
the DEFLATE codec (throwing an error instead)  (was: Avro files with multiple 
"blocks" fail to deserialize from a file when using the DEFLATE codec)

> Avro files with multiple "blocks" fail to deserialize when using the DEFLATE 
> codec (throwing an error instead)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-3480
>                 URL: https://issues.apache.org/jira/browse/AVRO-3480
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: php
>    Affects Versions: 1.11.0
>            Reporter: Spencer Williams
>            Priority: Critical
>         Attachments: repro_java_create_problematic_avro_file.zip, test.avro
>
>
> When deserializing in PHP a file that uses the DEFLATE codec and contains a 
> large number of records (see the attached example file with 20,000 records), 
> the `$decoder` instance advances through the file incorrectly and 
> eventually yields an empty string, which is passed into `gzinflate(...)` on 
> this line: 
> [https://github.com/apache/avro/blob/a6f13b269a359d3839e55a75e0662d834d76992c/lang/php/lib/DataFile/AvroDataIOReader.php#L176]
>  
> ...which raises a PHP error. Notably, not all records have been 
> deserialized when this happens, so the failure appears to be related to 
> the file containing multiple "blocks".
> I've attached a file that meets this condition, and also a quick Kotlin 
> project using the official Java library that I used to generate the file.
> The PHP code in question to reproduce this behavior is pretty standard, 
> lifted directly from the provided {{examples/write_read.php}} file:
>  
> {{<?php}}
> {{if (count($argv) < 2) {}}
> {{    echo "USAGE: php main.php FILENAME";}}
> {{    exit(1);}}
> {{}}}
> {{$filename = $argv[1];}}
> {{require_once __DIR__ . '/../vendor/avro-php-1.11.0/lib/autoload.php';}}
> {{use Apache\Avro\DataFile\AvroDataIO;}}
> {{$data_reader = AvroDataIO::openFile($filename);}}
> {{echo "Reading from $filename:\n";}}
> {{foreach ($data_reader->data() as $datum) {}}
> {{    echo var_export($datum, true) . "\n";}}
> {{}}}
> {{$data_reader->close();}}
>  
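
For context on the "blocks" mentioned in the description: per the Avro specification, an object container file stores its records as a sequence of blocks, each holding a record count, the byte size of the (possibly compressed) data, the data itself, and a 16-byte sync marker. With the DEFLATE codec the data is a raw RFC 1951 stream, which is the format `gzinflate` expects. The following is a minimal standalone sketch of that layout, not the Avro PHP library's code: it omits the file header, assumes a plain "string" schema, and the function names (`encodeLong`, `decodeLong`, `writeBlock`, `readBlocks`) are hypothetical.

```php
<?php
// Hypothetical helper: encode a small non-negative integer as an Avro long
// (zigzag, then 7 bits per byte with a continuation bit). Assumes 0 <= $n < 2^62.
function encodeLong(int $n): string
{
    $z = $n << 1; // zigzag of a non-negative value is simply 2n
    $out = '';
    while ($z > 0x7F) {
        $out .= chr(($z & 0x7F) | 0x80);
        $z >>= 7;
    }
    return $out . chr($z);
}

// Hypothetical helper: decode an Avro long starting at $pos and advance $pos.
function decodeLong(string $buf, int &$pos): int
{
    $z = 0;
    $shift = 0;
    do {
        $byte = ord($buf[$pos++]);
        $z |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return ($z >> 1) ^ -($z & 1); // undo zigzag
}

// Build one block: record count, compressed size, DEFLATE data, sync marker.
function writeBlock(array $records, string $sync): string
{
    $payload = '';
    foreach ($records as $r) {
        $payload .= encodeLong(strlen($r)) . $r; // Avro string: length + bytes
    }
    $compressed = gzdeflate($payload); // raw DEFLATE, no zlib header or checksum
    return encodeLong(count($records))
        . encodeLong(strlen($compressed))
        . $compressed
        . $sync;
}

// Read every block in sequence, inflating each block's data separately.
function readBlocks(string $body, string $sync): array
{
    $pos = 0;
    $records = [];
    while ($pos < strlen($body)) {
        $count = decodeLong($body, $pos);
        $size = decodeLong($body, $pos);
        $compressed = substr($body, $pos, $size);
        $pos += $size;
        if (substr($body, $pos, 16) !== $sync) {
            throw new RuntimeException("Sync marker mismatch at offset $pos");
        }
        $pos += 16;
        $payload = gzinflate($compressed);
        $p = 0;
        for ($i = 0; $i < $count; $i++) {
            $len = decodeLong($payload, $p);
            $records[] = substr($payload, $p, $len);
            $p += $len;
        }
    }
    return $records;
}

$sync = str_repeat("\x2A", 16); // placeholder marker; real files use random bytes
$body = writeBlock(['a', 'bb'], $sync) . writeBlock(['ccc'], $sync);
$records = readBlocks($body, $sync);
echo count($records), "\n";
```

A reader that computes a block boundary incorrectly would hand `gzinflate` the wrong bytes (possibly an empty string), which is consistent with the symptom reported above, though this sketch does not establish the actual cause.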



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
