[ https://issues.apache.org/jira/browse/AVRO-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517508#comment-17517508 ]
Spencer Williams commented on AVRO-3480:
----------------------------------------
After further testing, this also appears to happen with files that use the
zstd, bzip2, and snappy codecs. The issue seems to be in how the decoder
"advances" through the file _any_ time compression is involved.
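For reference, here is a rough sketch of how I understand a reader is supposed to step through the data blocks of an Avro container file: each block is a record count (long), the byte size of the compressed payload (long), the payload itself, and a 16-byte sync marker. This is not the avro-php code and not a confirmed fix. The function names are hypothetical, the file header is left out, and the records are assumed to be plain Avro strings to keep it short.

```php
<?php
// Illustration only: hypothetical helpers, not the avro-php API.

// Encode an integer as an Avro long (zigzag, then base-128 varint).
function encodeLong(int $n): string {
    $z = ($n << 1) ^ ($n >> 63);
    $out = '';
    while (($z & ~0x7F) !== 0) {
        $out .= chr(($z & 0x7F) | 0x80);
        $z = ($z >> 7) & 0x01FFFFFFFFFFFFFF; // mask makes the shift logical
    }
    return $out . chr($z);
}

// Decode an Avro long starting at $pos, advancing $pos past it.
function decodeLong(string $buf, int &$pos): int {
    $z = 0;
    $shift = 0;
    do {
        if ($pos >= strlen($buf)) {
            throw new RuntimeException('Unexpected end of data');
        }
        $byte = ord($buf[$pos++]);
        $z |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return (($z >> 1) & 0x7FFFFFFFFFFFFFFF) ^ -($z & 1);
}

// Build one block: count, compressed size, raw-DEFLATE payload, sync marker.
function writeBlock(array $strings, string $sync): string {
    $raw = '';
    foreach ($strings as $s) {
        $raw .= encodeLong(strlen($s)) . $s; // Avro string: length + bytes
    }
    $payload = gzdeflate($raw);
    return encodeLong(count($strings)) . encodeLong(strlen($payload)) . $payload . $sync;
}

// Read every block, advancing by the stored compressed size each time.
function readBlocks(string $body, string $sync): array {
    $pos = 0;
    $out = [];
    while ($pos < strlen($body)) { // stop at end of data; never inflate ''
        $count = decodeLong($body, $pos);
        $size = decodeLong($body, $pos);
        $data = gzinflate(substr($body, $pos, $size));
        if ($data === false) {
            throw new RuntimeException('Could not inflate block');
        }
        $pos += $size; // skip the compressed bytes, not the inflated length
        if (substr($body, $pos, 16) !== $sync) {
            throw new RuntimeException('Sync marker mismatch');
        }
        $pos += 16;
        $dpos = 0;
        for ($i = 0; $i < $count; $i++) {
            $len = decodeLong($data, $dpos);
            $out[] = substr($data, $dpos, $len);
            $dpos += $len;
        }
    }
    return $out;
}

$sync = str_repeat("\x2A", 16);
$body = writeBlock(['a', 'bb'], $sync) . writeBlock(['ccc'], $sync);
$records = readBlocks($body, $sync);
```

The two points that seem relevant here: the position moves forward by the compressed size stored in the block header, and the loop checks for end of data before it tries to inflate anything.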
> Avro files with multiple "blocks" fail to deserialize when using the DEFLATE
> codec (throwing an error instead)
> --------------------------------------------------------------------------------------------------------------
>
> Key: AVRO-3480
> URL: https://issues.apache.org/jira/browse/AVRO-3480
> Project: Apache Avro
> Issue Type: Bug
> Components: php
> Affects Versions: 1.11.0
> Reporter: Spencer Williams
> Priority: Critical
> Attachments: repro_java_create_problematic_avro_file.zip, test.avro
>
>
> When attempting in PHP to deserialize a file containing a large number of
> records (see example file attached – 20,000 records) that uses the DEFLATE
> codec, the `$decoder` instance advances through the file incorrectly,
> eventually yielding an empty string that is passed into `gzinflate(...)` on
> this line:
> [https://github.com/apache/avro/blob/a6f13b269a359d3839e55a75e0662d834d76992c/lang/php/lib/DataFile/AvroDataIOReader.php#L176]
>
> ...resulting in a PHP error being raised. Notably, at the time when this
> happens, not all records have been deserialized, so it seems that this is
> related to there being multiple "blocks" in the file.
> I've attached a file that meets this condition, and also a quick Kotlin
> project using the official Java library that I used to generate the file.
> The PHP code in question to reproduce this behavior is pretty standard,
> lifted directly from the provided {{examples/write_read.php}} file:
>
> {{<?php}}
> {{if (count($argv) < 2) {}}
> {{ echo "USAGE: php main.php FILENAME";}}
> {{ exit(1);}}
> {{}}}
> {{$filename = $argv[1];}}
> {{require_once __DIR__ . '/../vendor/avro-php-1.11.0/lib/autoload.php';}}
> {{use Apache\Avro\DataFile\AvroDataIO;}}
> {{$data_reader = AvroDataIO::openFile($filename);}}
> {{echo "Reading from $filename:\n";}}
> {{foreach ($data_reader->data() as $datum) {}}
> {{ echo var_export($datum, true) . "\n";}}
> {{}}}
> {{$data_reader->close();}}
>