[ https://issues.apache.org/jira/browse/AVRO-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17969551#comment-17969551 ]

ASF subversion and git services commented on AVRO-4090:
-------------------------------------------------------

Commit 5324d94ebe2ca7145ef5d81aa01590cedbc46fb2 in avro's branch 
refs/heads/branch-1.12 from Thiago Romão Barcala
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=5324d94ebe ]

AVRO-4090: Avoid repeating data validation (#3241)

(cherry picked from commit d5d5466d8d8a36fcbecbc924515174638f7ad515)


> PHP data is validated multiple times for nested schemas
> -------------------------------------------------------
>
>                 Key: AVRO-4090
>                 URL: https://issues.apache.org/jira/browse/AVRO-4090
>             Project: Apache Avro
>          Issue Type: Improvement
>            Reporter: Thiago Romão Barcala
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Consider the test script below:
> {code:php}
> <?php
> use Apache\Avro\Datum\AvroIOBinaryEncoder;
> use Apache\Avro\Datum\AvroIODatumWriter;
> use Apache\Avro\IO\AvroStringIO;
> use Apache\Avro\Schema\AvroSchema;
> require_once 'vendor/autoload.php';
> $writer = new AvroIODatumWriter();
> $schemaJson = <<<'JSON'
>     {
>         "type": "record",
>         "name": "A",
>         "fields": [
>             {
>                 "name": "a",
>                 "type": {
>                     "type": "record",
>                     "name": "B",
>                     "fields": [
>                         {
>                             "name": "b",
>                             "type": {
>                                 "type": "record",
>                                 "name": "C",
>                                 "fields": [
>                                     {
>                                         "name": "c",
>                                         "type": {
>                                             "type": "record",
>                                             "name": "D",
>                                             "fields": [
>                                                 {
>                                                     "name": "d",
>                                                     "type": {
>                                                         "type": "record",
>                                                         "name": "E",
>                                                         "fields": [
>                                                             {
>                                                                 "name": "e",
>                                                                 "type": "string"
>                                                             }
>                                                         ]
>                                                     }
>                                                 }
>                                             ]
>                                         }
>                                     }
>                                 ]
>                             }
>                         }
>                     ]
>                 }
>             }
>         ]
>     }
>     JSON
>     ;
> $data = ['a' => ['b' => ['c' => ['d' => ['e' => 'value']]]]];
> $schema = AvroSchema::parse($schemaJson);
> $io = new AvroStringIO();
> $writer->writeData($schema, $data, new AvroIOBinaryEncoder($io));
> var_dump($io->__toString());
> {code}
> Running the script above with the command line below and inspecting the 
> Xdebug profiler output shows that the method AvroSchema::isValidDatum is 
> called 21 times:
> {code:bash}
> php -dxdebug.start_with_request=true -dxdebug.mode=profile \
>     -dxdebug.output_dir=$(pwd) test.php
> {code}
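> The validation should be called only 6 times, though: once for each of the 
> five records (A to E) and once for the string value. The extra calls happen 
> because writeData is called recursively for every field of a record, and each 
> writeData call validates the entire data graph below it. Every level therefore 
> re-validates its whole subtree, giving 6 + 5 + 4 + 3 + 2 + 1 = 21 calls.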
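
The 21-versus-6 count can be reproduced with a small, self-contained model. The sketch below is a hypothetical toy, not the Apache Avro PHP API: toyIsValidDatum and toyWriteData are illustrative names, and a schema is just a nested PHP array. It compares a writer in which every recursive call re-validates its subtree with one that validates the whole graph once at the top-level call and skips validation in the per-field recursion. That is one possible way to avoid the repetition; the actual change in #3241 may be structured differently.

```php
<?php
declare(strict_types=1);

// Hypothetical toy model, NOT the Apache Avro PHP API. A schema is either the
// string 'string' or ['fields' => [fieldName => fieldSchema, ...]] for a record.

/** Recursively checks $datum against $schema; every visited node increments $calls. */
function toyIsValidDatum($schema, $datum, int &$calls): bool
{
    $calls++;
    if ($schema === 'string') {
        return is_string($datum);
    }
    if (!is_array($datum)) {
        return false;
    }
    foreach ($schema['fields'] as $name => $fieldSchema) {
        if (!array_key_exists($name, $datum)
            || !toyIsValidDatum($fieldSchema, $datum[$name], $calls)) {
            return false;
        }
    }
    return true;
}

/**
 * Appends the leaf values of $datum to $out. If $validate is true, the whole
 * subtree is validated first; the per-field recursive calls receive
 * $validateChildren for both flags.
 */
function toyWriteData($schema, $datum, array &$out, int &$calls,
                      bool $validate, bool $validateChildren): void
{
    if ($validate && !toyIsValidDatum($schema, $datum, $calls)) {
        throw new InvalidArgumentException('Datum does not match schema');
    }
    if ($schema === 'string') {
        $out[] = $datum;
        return;
    }
    foreach ($schema['fields'] as $name => $fieldSchema) {
        toyWriteData($fieldSchema, $datum[$name], $out, $calls,
                     $validateChildren, $validateChildren);
    }
}

// Same shape as the issue's schema: records A > B > C > D > E, then a string.
$schema = 'string';
foreach (['e', 'd', 'c', 'b', 'a'] as $field) {
    $schema = ['fields' => [$field => $schema]];
}
$data = ['a' => ['b' => ['c' => ['d' => ['e' => 'value']]]]];

// Naive: every recursive write re-validates its own subtree.
$naiveOut = [];
$naiveCalls = 0;
toyWriteData($schema, $data, $naiveOut, $naiveCalls, true, true);

// Validate once at the top level, then write the fields without re-checking.
$onceOut = [];
$onceCalls = 0;
toyWriteData($schema, $data, $onceOut, $onceCalls, true, false);

printf("naive: %d calls, validate-once: %d calls\n", $naiveCalls, $onceCalls);
```

Running it reports 21 validation calls for the naive mode and 6 for the validate-once mode, while both modes write the same output.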



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
