wgtmac commented on code in PR #3266:
URL: https://github.com/apache/avro/pull/3266#discussion_r1914203397


##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>

Review Comment:
   nit: is it possible to remove this line? It looks weird that a public header 
exposes iostream.



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };
+
+    // Creates a new CustomAttributes object where all values are strings.
+    // All values passed to addAttribute() and returned from getAttribute() or 
the
+    // attributes() map will not be enclosed in quotes. However, any internal 
quotes
+    // WILL be escaped and other special characters MAY be escaped.
+    //
+    // To support non-string values, use 
CustomAttributes(CustomAttributes::json) instead.
+    CustomAttributes() : CustomAttributes(string) {}
+
+    // Creates a new CustomAttributes object.
+    //
+    // If the given mode is string, all values must be strings. All values 
passed to

Review Comment:
   If the `ValueMode` enum has good documentation, we don't have to repeat them 
here. Otherwise, it is easy to be out of sync.



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };
+
+    // Creates a new CustomAttributes object where all values are strings.
+    // All values passed to addAttribute() and returned from getAttribute() or 
the
+    // attributes() map will not be enclosed in quotes. However, any internal 
quotes
+    // WILL be escaped and other special characters MAY be escaped.
+    //
+    // To support non-string values, use 
CustomAttributes(CustomAttributes::json) instead.
+    CustomAttributes() : CustomAttributes(string) {}

Review Comment:
   Should we mark it as `[[deprecated]]` because we want discourage users to 
use this one?



##########
lang/c++/impl/json/JsonDom.cc:
##########
@@ -92,7 +92,11 @@ Entity loadEntity(const char *text) {
 Entity loadEntity(InputStream &in) {
     JsonParser p;
     p.init(in);
-    return readEntity(p);
+    Entity e = readEntity(p);
+    if (p.hasMore()) {

Review Comment:
   Why not checking it in the `readEntity` function?



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };

Review Comment:
   ```suggestion
       enum class ValueMode : uint8_t {
           // When a CustomAttributes is created using this mode, all values 
are expected to be string.
           // The value should not be quoted, but any interior quotes and 
special
           // characters (such as newlines) must be escaped.
           STRING,
           // When a CustomAttributes is created using this mode, all values 
are standard JSON
           // values. String values must be quoted and escaped.
           JSON
       };
   ```
   
   I found that the styles of enum are not consistent. Personally I prefer 
upper-case.



##########
lang/c++/impl/json/JsonDom.hh:
##########
@@ -177,6 +177,12 @@ AVRO_DECL Entity loadEntity(const uint8_t *text, size_t 
len);
 
 void writeEntity(JsonGenerator<JsonNullFormatter> &g, const Entity &n);
 
+class AVRO_DECL TooManyValuesException : public virtual std::runtime_error {

Review Comment:
   I don't see any other exception is defined except the one in the 
`Exception.hh`. Should we simply stick to `Exception` to be consistent?



##########
lang/c++/test/unittest.cc:
##########
@@ -489,12 +491,26 @@ struct TestSchema {
                        expectedJsonWithoutCustomAttribute);
     }
 
-    void checkCustomAttributes_getAttribute() {
-        CustomAttributes cf;
-        cf.addAttribute("field1", std::string("1"));
+    void checkCustomAttributes_addAndGetAttributeJson() {

Review Comment:
   IMHO these cases are not sufficient. We need to cover the expected behavior 
of the following cases (even exceptions are expected in some of them):
   
   |Read \ Write | string | json |
   |-----|-----| -----| 
   | string | ? | ? |
   | json | ? | ? |
   
   
   



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };
+
+    // Creates a new CustomAttributes object where all values are strings.
+    // All values passed to addAttribute() and returned from getAttribute() or 
the
+    // attributes() map will not be enclosed in quotes. However, any internal 
quotes
+    // WILL be escaped and other special characters MAY be escaped.
+    //
+    // To support non-string values, use 
CustomAttributes(CustomAttributes::json) instead.
+    CustomAttributes() : CustomAttributes(string) {}

Review Comment:
   I know the default value is to keep backward-compatibility. However, we have 
to swap the default value in the future because the current behavior is wrong 
for non-string values. (I'm not well-versed in this codebase yet) Is it 
possible to figure out if the avro file is created by avro-cpp as well as its 
writer version? If yes, perhaps we can choose the correct default `ValueMode` 
when deserializing the CustomAttributes based on its writer version and use 
`ValueMode::string` by default for the writer. In this way, we won't create 
more problematic files after this release.



##########
lang/c++/impl/Compiler.cc:
##########
@@ -277,7 +277,7 @@ static void getCustomAttributes(const Object &m, 
CustomAttributes &customAttribu
     const std::unordered_set<std::string> &kKnownFields = getKnownFields();
     for (const auto &entry : m) {
         if (kKnownFields.find(entry.first) == kKnownFields.end()) {
-            customAttributes.addAttribute(entry.first, 
entry.second.stringValue());
+            customAttributes.addAttribute(entry.first, 
entry.second.toString());

Review Comment:
   Why do we change this? Is it a breaking change to downstream users who are 
calling `stringValue`?



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };

Review Comment:
   As the docstring is user-facing, is it better to separate the expected 
behavior of write and read path? In a more complicated case, what is the 
expected behavior if `ValueMode`s are different when reading and writing the 
same `CustomAttributes`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to