[ 
https://issues.apache.org/jira/browse/AVRO-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684076#comment-16684076
 ] 

Thiruvalluvan M. G. edited comment on AVRO-2010 at 11/12/18 5:12 PM:
---------------------------------------------------------------------

Very very interesting!

Here is an even simpler use case:
{code:java}
ArraySchema a0 = ArraySchema(ArraySchema(NullSchema()));
avro::ValidSchema vs0(a0);
vs0.toJson(std::cout);{code}
It produces:
{code:java}
{
    "type": "array",
    "items": "null"
}{code}
as against reasonable expectation:
{code:java}
{
    "type": "array",
    "items": {
        "type": "array",
        "items": "null"
    }
}{code}
What is happening? For C++ {{ArraySchema(ArraySchema(...)}} is copy 
construction and the language has a feature called [copy 
elision|https://en.cppreference.com/w/cpp/language/copy_elision] which elides 
nested copy. That is {{ArraySchema(ArraySchema(.))}} is same as 
{{ArraySchema(.)}}.

 

We can work around the problem by doing:
{code:java}
Schema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
which is equivalent to:
{code:java}
ArraySchema a0 = ArraySchema(static_cast<const 
Schema&>(ArraySchema(NullSchema())));{code}
In both cases, it produces the desired output.

Now, what can be done to avoid this accident? Nothing much can be done, 
because, AFAIK, there is no way to disable copy elision.

However, we can do some partial protection. The current code base fails even 
for the following:
{code:java}
ArraySchema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
This usage can be fixed by overriding copy constructor of {{ArraySchema}} (and 
similarly {{MapSchema}}). I'll send a pull request for this.

The basic design defect here is that we have modeled {{Schema}} as a base class 
for a class hierarchy. The derived classes do not add member variables or 
functions (except {{RecordSchema}}). All we needed were functions or objects to 
construct different {{Schema}} objects not new classes. For example, we should 
have had {{Schema arraySchema(const Schema& itemsSchema)}} instead of derived 
class {{ArraySchema}}. Similarly {{NullSchema}}, {{BooleanSchema}} etc. should 
have been objects not classes. Anyway, it is too late to fix the problem 
without breaking existing code. Thank you [~phhe] for pointing this out.


was (Author: thiru_mg):
Very very interesting!

Here is an even simpler use case:
{code:java}
ArraySchema a0 = ArraySchema(ArraySchema(NullSchema()));
avro::ValidSchema vs0(a0);
vs0.toJson(std::cout);{code}
It produces:
{code:java}
{
    "type": "array",
    "items": "null"
}{code}
as against reasonable expectation:
{code:java}
{
    "type": "array",
    "items": {
        "type": "array",
        "items": "null"
    }
}{code}
What is happening? For C++ {{ArraySchema(ArraySchema(...)}} is copy 
construction and the language has a feature called [copy 
elision|https://en.cppreference.com/w/cpp/language/copy_elision] which elides 
nested copy.

 

We can work around the problem by doing:
{code:java}
Schema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
which is equivalent to:
{code:java}
ArraySchema a0 = ArraySchema(static_cast<const 
Schema&>(ArraySchema(NullSchema())));{code}
In both cases, it produces the desired output.

Now, what can be done to avoid this accident? Nothing much can be done, 
because, AFAIK, there is no way to disable copy elision.

However, we can do some partial protection. The current code base fails even 
for the following:
{code:java}
ArraySchema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
This usage can be fixed by overriding copy constructor of {{ArraySchema}} (and 
similarly {{MapSchema}}). I'll send a pull request for this.

The basic design defect here is that we have modeled {{Schema}} as a base class 
for a class hierarchy. The derived classes do not add member variables or 
functions (except {{RecordSchema}}). All we needed were functions or objects to 
construct different {{Schema}} objects not new classes. For example, we should 
have had {{Schema arraySchema(const Schema& itemsSchema)}} instead of derived 
class {{ArraySchema}}. Similarly {{NullSchema}}, {{BooleanSchema}} etc. should 
have been objects not classes. Anyway, it is too late to fix the problem 
without breaking existing code. Thank you [~phhe] for pointing this out.

> multi dimensional array schema is not generated as expected
> -----------------------------------------------------------
>
>                 Key: AVRO-2010
>                 URL: https://issues.apache.org/jira/browse/AVRO-2010
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: c++
>         Environment: Ubuntu 14.04, boost 1.62
>            Reporter: Philip Henzler
>            Priority: Major
>
> I made the following change to the unittest.cc, adding a two dimensional 
> array to the buildSchema() test:
> {code}
> ArraySchema array = ArraySchema(DoubleSchema());
> const std::string s("myarray");
> record.addField(s, array);
> ArraySchema array2 = ArraySchema(ArraySchema(DoubleSchema()));
> const std::string s2("my2dimarray");
> record.addField(s2, array2);
> {code}
> It creates the following JSON schema output:
> {code}
> {
>     "name": "myarray",
>     "type": {
>         "type": "array",
>         "items": "double"
>     }
> },
> {
>     "name": "my2dimarray",
>     "type": {
>         "type": "array",
>         "items": "double"
>     }
> }
> {code}
> Even tough I would expect the following output for the two dimensional case:
> {code}
> {
>     "name": "my2dimarray",
>     "type": {
>         "type": "array",
>         "items": {
>             "type": "array",
>             "items": "double"
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to