[
https://issues.apache.org/jira/browse/AVRO-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684076#comment-16684076
]
Thiruvalluvan M. G. edited comment on AVRO-2010 at 11/12/18 5:09 PM:
---------------------------------------------------------------------
Very very interesting!
Here is an even simpler use case:
{code:java}
ArraySchema a0 = ArraySchema(ArraySchema(NullSchema()));
avro::ValidSchema vs0(a0);
vs0.toJson(std::cout);{code}
It produces:
{code:java}
{
"type": "array",
"items": "null"
}{code}
as against reasonable expectation:
{code:java}
{
"type": "array",
"items": {
"type": "array",
"items": "null"
}
}{code}
What is happening? For C++ {{ArraySchema(ArraySchema(...)}} is copy
construction and the language has a feature called [copy
elision|[https://en.cppreference.com/w/cpp/language/copy_elision]] which elides
nested copy.
We can work around the problem by doing:
{code:java}
Schema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
which is equivalent to:
{code:java}
ArraySchema a0 = ArraySchema(static_cast<const
Schema&>(ArraySchema(NullSchema())));{code}
In both cases, it produces the desired output.
Now, what can be done to avoid this accident? Nothing much can be done,
because, AFAIK, there is no way to disable copy elision.
However, we can do some partial protection. The current code base fails even
for the following:
{code:java}
ArraySchema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
This usage can be fixed by overriding copy constructor of {{ArraySchema}} (and
similarly {{MapSchema}}). I'll send a pull request for this.
The basic design defect here is that we have modeled {{Schema}} as a base class
for a class hierarchy. The derived classes do not add member variables or
functions (except {{RecordSchema}}). All we needed were functions or objects to
construct different {{Schema}} objects not new classes. For example, we should
have had {{Schema arraySchema(const Schema& itemsSchema)}} instead of derived
class {{ArraySchema}}. Similarly {{NullSchema}}, {{BooleanSchema}} etc. should
have been objects not classes. Anyway, it is too late to fix the problem
without breaking existing code. Thank you [~phhe] for pointing this out.
was (Author: thiru_mg):
Very very interesting!
Here is an even simpler use case:
{code:java}
ArraySchema a0 = ArraySchema(ArraySchema(NullSchema()));
avro::ValidSchema vs0(a0);
vs0.toJson(std::cout);{code}
It produces:
{code:java}
{
"type": "array",
"items": "null"
}{code}
as against reasonable expectation:
{code:java}
{
"type": "array",
"items": {
"type": "array",
"items": "null"
}
}{code}
What is happening? For C++ {{ArraySchema(ArraySchema(...)}} is copy
construction and the language has a feature called [copy
elision|[https://en.cppreference.com/w/cpp/language/copy_elision],] which
elides nested copy.
We can work around the problem by doing:
{code:java}
Schema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
which is equivalent to:
{code:java}
ArraySchema a0 = ArraySchema(static_cast<const
Schema&>(ArraySchema(NullSchema())));{code}
In both cases, it produces the desired output.
Now, what can be done to avoid this accident? Nothing much can be done,
because, AFAIK, there is no way to disable copy elision.
However, we can do some partial protection. The current code base fails even
for the following:
{code:java}
ArraySchema b0 = ArraySchema(NullSchema());
ArraySchema a0 = ArraySchema(b0);{code}
This usage can be fixed by overriding copy constructor of {{ArraySchema}} (and
similarly {{MapSchema}}). I'll send a pull request for this.
The basic design defect here is that we have modeled {{Schema}} as a base class
for a class hierarchy. The derived classes do not add member variables or
functions (except {{RecordSchema}}). All we needed were functions or objects to
construct different {{Schema}} objects not new classes. For example, we should
have had {{Schema arraySchema(const Schema& itemsSchema)}} instead of derived
class {{ArraySchema}}. Similarly {{NullSchema}}, {{BooleanSchema}} etc. should
have been objects not classes. Anyway, it is too late to fix the problem
without breaking existing code. Thank you [~phhe] for pointing this out.
> multi dimensional array schema is not generated as expected
> -----------------------------------------------------------
>
> Key: AVRO-2010
> URL: https://issues.apache.org/jira/browse/AVRO-2010
> Project: Apache Avro
> Issue Type: Bug
> Components: c++
> Environment: Ubuntu 14.04, boost 1.62
> Reporter: Philip Henzler
> Priority: Major
>
> I made the following change to the unittest.cc, adding a two dimensional
> array to the buildSchema() test:
> {code}
> ArraySchema array = ArraySchema(DoubleSchema());
> const std::string s("myarray");
> record.addField(s, array);
> ArraySchema array2 = ArraySchema(ArraySchema(DoubleSchema()));
> const std::string s2("my2dimarray");
> record.addField(s2, array2);
> {code}
> It creates the following JSON schema output:
> {code}
> {
> "name": "myarray",
> "type": {
> "type": "array",
> "items": "double"
> }
> },
> {
> "name": "my2dimarray",
> "type": {
> "type": "array",
> "items": "double"
> }
> }
> {code}
> Even tough I would expect the following output for the two dimensional case:
> {code}
> {
> "name": "my2dimarray",
> "type": {
> "type": "array",
> "items": {
> "type": "array",
> "items": "double"
> }
> }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)