KalleOlaviNiemitalo commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1335422416


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth       JsonReader Depth Level Count
+            // 4                       11
+            // 16                   44
+            // 32                      92
+            // 64                      188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   Not sure that this actually fixes 
[AVRO-3856](https://issues.apache.org/jira/browse/AVRO-3856) as stated.  
Although this change allows an Avro schema with 64 nested record schemas, 
application developers still cannot customize the limit.
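
   To illustrate what a customizable limit could look like, here is a minimal sketch. `DepthLimitedJson`, its `Load` method, and `DefaultMaxDepth` are hypothetical names that do not exist in Avro; only `JsonTextReader`, `MaxDepth`, and `JToken.ReadFrom` come from Newtonsoft.Json.

   ```csharp
   using System;
   using System.IO;
   using Newtonsoft.Json;
   using Newtonsoft.Json.Linq;

   // Hypothetical helper, not part of the Avro API.
   public static class DepthLimitedJson
   {
       // Assumed default that mirrors the hard-coded value in this PR.
       public const int DefaultMaxDepth = 192;

       // Parses JSON with a caller-supplied limit on nested containers.
       public static JToken Load(string json, int maxDepth = DefaultMaxDepth)
       {
           using (var reader = new JsonTextReader(new StringReader(json)))
           {
               // The reader throws JsonReaderException once nesting exceeds this.
               reader.MaxDepth = maxDepth;
               return JToken.ReadFrom(reader);
           }
       }
   }
   ```

   With something along these lines, an application that needs deeper schemas could pass a larger `maxDepth`, and one that handles untrusted input could pass a smaller one.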



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth       JsonReader Depth Level Count
+            // 4                       11
+            // 16                   44
+            // 32                      92
+            // 64                      188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.

Review Comment:
   Please reword this not to give the impression that the Newtonsoft.Json 
library has a bug that makes it count the depth incorrectly.  The difference in 
depth counts is rather caused by how the Avro schemas are represented in JSON; 
each nested Avro schema requires multiple nested JSON containers.  The 
Newtonsoft.Json library is not specific to Avro and is not designed to count 
Avro schemas, so the behaviour seems correct to me.
   
   The Avro schema in 
<https://github.com/JamesNK/Newtonsoft.Json/pull/2904#issuecomment-1732764055> 
has 4 levels of record schemas, but its JSON representation has 11 levels of 
nested containers: 
   
   ```JSON
   { /* depth 1: object */
     "type": "record",
     "name": "Level1",
     "fields": [ /* depth 2: array */
       {
         "name": "field1",
         "type": "string"
       },
       {
         "name": "field2",
         "type": "int"
       },
       { /* depth 3: object */
         "name": "level2",
         "type": { /* depth 4: object */
           "type": "record",
           "name": "Level2",
           "fields": [ /* depth 5: array */
             {
               "name": "field3",
               "type": "boolean"
             },
             {
               "name": "field4",
               "type": "double"
             },
             { /* depth 6: object */
               "name": "level3",
               "type": { /* depth 7: object */
                 "type": "record",
                 "name": "Level3",
                 "fields": [ /* depth 8: array */
                   {
                     "name": "field5",
                     "type": "string"
                   },
                   {
                     "name": "field6",
                     "type": "int"
                   },
                   { /* depth 9: object */
                     "name": "level4",
                     "type": {
                       "type": "record",
                       "name": "Level4",
                       "fields": [ /* depth 10: array */
                         { /* depth 11: object */
                           "name": "field7",
                           "type": "boolean"
                         },
                         {
                           "name": "field8",
                           "type": "double"
                         }
                       ]
                     }
                   }
                 ]
               }
             }
           ]
         }
       }
     ]
   }
   ```
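
   For anyone who wants to measure the nesting of their own schemas, here is a rough sketch that counts JSON containers by hand. `JsonNesting.MaxContainerDepth` is a hypothetical helper, and its count need not be identical to the number that the reader's own `MaxDepth` check works with.

   ```csharp
   using System;
   using System.IO;
   using Newtonsoft.Json;

   // Hypothetical helper for illustration only.
   public static class JsonNesting
   {
       // Returns the deepest level of nested JSON objects and arrays.
       public static int MaxContainerDepth(string json)
       {
           int depth = 0;
           int max = 0;
           using (var reader = new JsonTextReader(new StringReader(json)))
           {
               reader.MaxDepth = null; // no limit while measuring
               while (reader.Read())
               {
                   switch (reader.TokenType)
                   {
                       case JsonToken.StartObject:
                       case JsonToken.StartArray:
                           depth++;
                           max = Math.Max(max, depth);
                           break;
                       case JsonToken.EndObject:
                       case JsonToken.EndArray:
                           depth--;
                           break;
                   }
               }
           }
           return max;
       }
   }
   ```

   Even a single record schema with one primitive field nests three containers: the record object, the `fields` array, and the field object.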
   



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth       JsonReader Depth Level Count
+            // 4                       11
+            // 16                   44
+            // 32                      92
+            // 64                      188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
                     && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);

Review Comment:
   After JObject.Parse has called JObject.Load, it checks that the object in 
the JSON input is not followed by anything else.  Now when Avro calls 
JObject.Load directly, that check no longer happens.  Please add a test that 
verifies Schema.Parse will throw an exception if given invalid JSON that 
contains more than one JSON object, something like this:
   
   ```JSON
   {
       "type": "int"
   }
   {
       "type": "string",
       "doc": "This is invalid because the schema must not be followed by other 
JSON objects."
   }
   ```
   
   And likewise for invalid JSON that contains more than one JSON array.
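
   One way to restore the check is to keep reading after the load, which is what `JObject.Parse` does internally. A minimal sketch follows; `StrictJson.LoadSingle` is a hypothetical name, and the sketch relies on the reader's default `SupportMultipleContent = false`.

   ```csharp
   using System;
   using System.IO;
   using Newtonsoft.Json;
   using Newtonsoft.Json.Linq;

   // Hypothetical helper, not part of the Avro API.
   public static class StrictJson
   {
       // Loads exactly one JSON object or array and rejects trailing content.
       public static JContainer LoadSingle(string json, int maxDepth)
       {
           using (var reader = new JsonTextReader(new StringReader(json)))
           {
               reader.MaxDepth = maxDepth;
               bool isArray = json.StartsWith("[", StringComparison.Ordinal)
                   && json.EndsWith("]", StringComparison.Ordinal);
               JContainer root = isArray
                   ? (JContainer)JArray.Load(reader)
                   : (JContainer)JObject.Load(reader);

               // Drain the reader. Anything other than a comment after the
               // root value makes Read() throw JsonReaderException.
               while (reader.Read())
               {
               }
               return root;
           }
       }
   }
   ```

   A test for `Schema.Parse` itself could feed it the two-object input above (and a two-array input) and assert that it throws. Which exception type surfaces depends on how `Schema.Parse` wraps reader errors, so that is left open here.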



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));

Review Comment:
   Please add <code>using (<var>…</var>) { <var>…</var> }</code> to close the 
reader.  Although it does not have much effect now, it will become more 
important if an IArrayPool\<char\> is added later.
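
   As a rough illustration of why disposal would start to matter, here is a sketch with a pooled buffer. `SharedCharPool` and `PooledJson` are hypothetical names; the sketch assumes `System.Buffers.ArrayPool<char>` is available on the target framework.

   ```csharp
   using System;
   using System.Buffers;
   using System.IO;
   using Newtonsoft.Json;
   using Newtonsoft.Json.Linq;

   // Hypothetical adapter from the shared .NET pool to Newtonsoft's IArrayPool<char>.
   public sealed class SharedCharPool : IArrayPool<char>
   {
       public static readonly SharedCharPool Instance = new SharedCharPool();

       public char[] Rent(int minimumLength) => ArrayPool<char>.Shared.Rent(minimumLength);

       public void Return(char[] array)
       {
           if (array != null) ArrayPool<char>.Shared.Return(array);
       }
   }

   // Hypothetical helper, not part of the Avro API.
   public static class PooledJson
   {
       public static JObject LoadObject(string json)
       {
           // Disposing the reader closes it, which hands the rented
           // character buffer back to the pool.
           using (var reader = new JsonTextReader(new StringReader(json)))
           {
               reader.ArrayPool = SharedCharPool.Instance;
               return JObject.Load(reader);
           }
       }
   }
   ```

   Without the `using` block, a rented buffer would stay checked out until the reader is closed by some other means.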



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.