[
https://issues.apache.org/jira/browse/AVRO-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Spencer Williams updated AVRO-3480:
-----------------------------------
Summary: Avro files with multiple "blocks" fail to deserialize when using
the DEFLATE codec (throwing an error instead) (was: Avro files with multiple
"blocks" fail to deserialize from a file when using the DEFLATE codec)
> Avro files with multiple "blocks" fail to deserialize when using the DEFLATE
> codec (throwing an error instead)
> --------------------------------------------------------------------------------------------------------------
>
> Key: AVRO-3480
> URL: https://issues.apache.org/jira/browse/AVRO-3480
> Project: Apache Avro
> Issue Type: Bug
> Components: php
> Affects Versions: 1.11.0
> Reporter: Spencer Williams
> Priority: Critical
> Attachments: repro_java_create_problematic_avro_file.zip, test.avro
>
>
> When deserializing in PHP a file that uses the DEFLATE codec and contains a
> large number of records (see the attached example file, 20,000 records), the
> `$decoder` instance advances through the file incorrectly and eventually
> yields an empty string, which is passed into `gzinflate(...)` on this line:
> [https://github.com/apache/avro/blob/a6f13b269a359d3839e55a75e0662d834d76992c/lang/php/lib/DataFile/AvroDataIOReader.php#L176]
>
> ...which raises a PHP error. Notably, not all records have been
> deserialized when this happens, so the failure appears to be related to the
> file containing multiple "blocks".
> I've attached a file that meets this condition, along with a small Kotlin
> project that generates it using the official Java library.
> The PHP code that reproduces this behavior is standard, lifted directly
> from the provided {{examples/write_read.php}} file:
>
> {{<?php}}
> {{if (count($argv) < 2) {}}
> {{ echo "USAGE: php main.php FILENAME";}}
> {{ exit(1);}}
> {{}}}
> {{$filename = $argv[1];}}
> {{require_once __DIR__ . '/../vendor/avro-php-1.11.0/lib/autoload.php';}}
> {{use Apache\Avro\DataFile\AvroDataIO;}}
> {{$data_reader = AvroDataIO::openFile($filename);}}
> {{echo "Reading from $filename:\n";}}
> {{foreach ($data_reader->data() as $datum) {}}
> {{ echo var_export($datum, true) . "\n";}}
> {{}}}
> {{$data_reader->close();}}
>
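To check whether the block boundaries in a file such as the attached test.avro are themselves well-formed, independently of the library's decoder, a standalone walker can help. The following is a minimal sketch, not the library's implementation. It assumes the object container layout from the Avro specification (magic bytes, metadata map, 16-byte sync marker, then blocks of record count, compressed size, data, and sync marker), assumes every block uses the DEFLATE codec, and reads the whole file into memory. The function names `readAvroLong`, `skipAvroHeader`, and `inflateAvroBlocks` are hypothetical and do not exist in the Avro PHP library.

```php
<?php
// Hypothetical helper: decode one zigzag-encoded variable-length long
// starting at $pos, advancing $pos. Adequate for counts and sizes; it does
// not handle values that need the full 64-bit range.
function readAvroLong(string $buf, int &$pos): int
{
    $shift = 0;
    $raw = 0;
    do {
        if ($pos >= strlen($buf)) {
            throw new RuntimeException("Unexpected end of data at offset $pos");
        }
        $byte = ord($buf[$pos++]);
        $raw |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    // Undo zigzag encoding.
    return ($raw >> 1) ^ -($raw & 1);
}

// Hypothetical helper: skip the magic bytes and the metadata map, then
// return the 16-byte sync marker that terminates the header.
function skipAvroHeader(string $buf, int &$pos): string
{
    if (substr($buf, 0, 4) !== "Obj\x01") {
        throw new RuntimeException('Not an Avro object container file');
    }
    $pos = 4;
    while (($count = readAvroLong($buf, $pos)) !== 0) {
        if ($count < 0) {
            // A negative count is followed by the map block's byte size.
            readAvroLong($buf, $pos);
            $count = -$count;
        }
        for ($i = 0; $i < $count; $i++) {
            $keyLen = readAvroLong($buf, $pos);
            $pos += $keyLen;
            $valueLen = readAvroLong($buf, $pos);
            $pos += $valueLen;
        }
    }
    $sync = substr($buf, $pos, 16);
    $pos += 16;
    return $sync;
}

// Hypothetical helper: inflate every block, failing with a descriptive
// exception instead of handing an empty or truncated string to gzinflate().
function inflateAvroBlocks(string $buf): array
{
    $pos = 0;
    $sync = skipAvroHeader($buf, $pos);
    $blocks = [];
    while ($pos < strlen($buf)) {
        $count = readAvroLong($buf, $pos);
        $size = readAvroLong($buf, $pos);
        if ($size <= 0 || $pos + $size + 16 > strlen($buf)) {
            throw new RuntimeException("Implausible block size $size at offset $pos");
        }
        $data = @gzinflate(substr($buf, $pos, $size));
        if ($data === false) {
            throw new RuntimeException("Block at offset $pos is not valid DEFLATE data");
        }
        $pos += $size;
        if (substr($buf, $pos, 16) !== $sync) {
            throw new RuntimeException("Sync marker mismatch at offset $pos");
        }
        $pos += 16;
        $blocks[] = ['count' => $count, 'bytes' => $data];
    }
    return $blocks;
}

// Usage sketch: php walk_blocks.php test.avro
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $raw = file_get_contents($argv[1]);
    if ($raw === false) {
        fwrite(STDERR, "Cannot read {$argv[1]}\n");
        exit(1);
    }
    foreach (inflateAvroBlocks($raw) as $i => $block) {
        printf("block %d: %d records, %d inflated bytes\n", $i, $block['count'], strlen($block['bytes']));
    }
}
```

If this walker reads every block of a file cleanly while the library's reader still fails partway through, that would suggest the problem lies in how the reader advances between blocks rather than in the file itself. The sketch does not by itself identify the root cause.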