KalleOlaviNiemitalo commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1335422416
##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // The Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is the JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema. Here are the observations of over-counting depth level in Newtonsoft's JsonReader:
+            //     Avro Schema Depth    JsonReader Depth Level Count
+            //             4                        11
+            //            16                        44
+            //            32                        92
+            //            64                       188
+            // So, roughly speaking, the depth level count is about 2.75 times the Avro schema depth.
+            // Below is the hard-coded value to compensate for over-counting of depth level in Newtonsoft,
+            // to support Avro schema depth level to 64 and slightly beyond.
+            reader.MaxDepth = 192;
Review Comment:
Not sure that this actually fixes
[AVRO-3856](https://issues.apache.org/jira/browse/AVRO-3856) as stated.
Although this change allows an Avro schema with 64 nested record schemas,
application developers still cannot customize the limit.
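
One possible direction, sketched here as a hypothetical (neither `MaxParseDepth` nor this helper exists in the Avro C# library), would be to expose the limit as a settable property instead of a hard-coded constant:

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class SchemaParserSketch
{
    // Hypothetical knob: applications could raise or lower this
    // before parsing a deeply nested schema.
    public static int MaxParseDepth { get; set; } = 192;

    public static JContainer LoadContainer(string json)
    {
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            reader.MaxDepth = MaxParseDepth;
            bool isArray = json.StartsWith("[", StringComparison.Ordinal)
                        && json.EndsWith("]", StringComparison.Ordinal);
            return isArray ? (JContainer)JArray.Load(reader)
                           : (JContainer)JObject.Load(reader);
        }
    }
}
```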
##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // The Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is the JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema. Here are the observations of over-counting depth level in Newtonsoft's JsonReader:
+            //     Avro Schema Depth    JsonReader Depth Level Count
+            //             4                        11
+            //            16                        44
+            //            32                        92
+            //            64                       188
+            // So, roughly speaking, the depth level count is about 2.75 times the Avro schema depth.
+            // Below is the hard-coded value to compensate for over-counting of depth level in Newtonsoft,
+            // to support Avro schema depth level to 64 and slightly beyond.
Review Comment:
Please reword this not to give the impression that the Newtonsoft.Json
library has a bug that makes it count the depth incorrectly. The difference in
depth counts is rather caused by how the Avro schemas are represented in JSON;
each nested Avro schema requires multiple nested JSON containers. The
Newtonsoft.Json library is not specific to Avro and is not designed to count
Avro schemas, so the behaviour seems correct to me.
The Avro schema in
<https://github.com/JamesNK/Newtonsoft.Json/pull/2904#issuecomment-1732764055>
has 4 levels of record schemas, but its JSON representation has 11 levels of
nested containers:
```JSON
{ /* depth 1: object */
"type": "record",
"name": "Level1",
"fields": [ /* depth 2: array */
{
"name": "field1",
"type": "string"
},
{
"name": "field2",
"type": "int"
},
{ /* depth 3: object */
"name": "level2",
"type": { /* depth 4: object */
"type": "record",
"name": "Level2",
"fields": [ /* depth 5: array */
{
"name": "field3",
"type": "boolean"
},
{
"name": "field4",
"type": "double"
},
{ /* depth 6: object */
"name": "level3",
"type": { /* depth 7: object */
"type": "record",
"name": "Level3",
"fields": [ /* depth 8: array */
{
"name": "field5",
"type": "string"
},
{
"name": "field6",
"type": "int"
},
{ /* depth 9: object */
"name": "level4",
"type": {
"type": "record",
"name": "Level4",
"fields": [ /* depth 10: array */
{ /* depth 11: object */
"name": "field7",
"type": "boolean"
},
{
"name": "field8",
"type": "double"
}
]
}
}
]
}
}
]
}
}
]
}
```
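
The container counts above can be reproduced by walking the document with a `JsonTextReader` and tracking the deepest container opened (a small sketch; `Depth` is Newtonsoft's 0-based token depth, so the exact figure may differ by one from the counting in the annotations):

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public static class DepthProbe
{
    public static int MaxContainerDepth(string json)
    {
        int max = 0;
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject ||
                    reader.TokenType == JsonToken.StartArray)
                {
                    // Depth is the depth of the current token; +1 counts
                    // the container being opened.
                    max = Math.Max(max, reader.Depth + 1);
                }
            }
        }
        return max;
    }
}
```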
##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // The Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is the JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema. Here are the observations of over-counting depth level in Newtonsoft's JsonReader:
+            //     Avro Schema Depth    JsonReader Depth Level Count
+            //             4                        11
+            //            16                        44
+            //            32                        92
+            //            64                       188
+            // So, roughly speaking, the depth level count is about 2.75 times the Avro schema depth.
+            // Below is the hard-coded value to compensate for over-counting of depth level in Newtonsoft,
+            // to support Avro schema depth level to 64 and slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal) && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);
Review Comment:
After JObject.Parse has called JObject.Load, it checks that the object in
the JSON input is not followed by anything else. Now when Avro calls
JObject.Load directly, that check no longer happens. Please add a test that
verifies Schema.Parse will throw an exception if given invalid JSON that
contains more than one JSON object, something like this:
```JSON
{
  "type": "int"
}
{
  "type": "string",
  "doc": "This is invalid because the schema must not be followed by other JSON objects."
}
```
And likewise for invalid JSON that contains more than one JSON array.
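
A test along these lines might look like the following (NUnit, matching the Avro C# test suite; whether `Schema.Parse` surfaces `SchemaParseException` for trailing content, rather than some other exception type, is an assumption):

```csharp
using Avro;
using NUnit.Framework;

[TestFixture]
public class SchemaTrailingContentTests
{
    // A valid schema followed by a second top-level object or array.
    [TestCase("{ \"type\": \"int\" } { \"type\": \"string\" }")]
    [TestCase("[\"int\", \"string\"] [\"null\"]")]
    public void ParseRejectsTrailingJson(string invalidJson)
    {
        Assert.Throws<SchemaParseException>(() => Schema.Parse(invalidJson));
    }
}
```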
##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // The Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
Review Comment:
Please add <code>using (<var>…</var>) { <var>…</var> }</code> to close the
reader. Although it does not have much effect now, it will become more
important if an IArrayPool\<char\> is added later.
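
Concretely, the reader creation and the subsequent `Load` calls from the diff could be wrapped like this (a sketch of the requested change, not the PR's code):

```csharp
using (JsonReader reader = new JsonTextReader(new StringReader(json)))
{
    reader.MaxDepth = 192;
    bool isArray = json.StartsWith("[", StringComparison.Ordinal)
                && json.EndsWith("]", StringComparison.Ordinal);
    JContainer j = isArray ? (JContainer)JArray.Load(reader)
                           : (JContainer)JObject.Load(reader);
    // ... the existing schema construction continues with j ...
}
```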
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]