jhump commented on code in PR #3266:
URL: https://github.com/apache/avro/pull/3266#discussion_r1927154843


##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };

Review Comment:
   > what is the expected behavior if `ValueMode`s are different when reading 
and writing the same `CustomAttributes`?
   
   This isn't possible. The `ValueMode` is a property of the 
`CustomAttributes`, specified at construction time.



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };
+
+    // Creates a new CustomAttributes object where all values are strings.
+    // All values passed to addAttribute() and returned from getAttribute() or 
the
+    // attributes() map will not be enclosed in quotes. However, any internal 
quotes
+    // WILL be escaped and other special characters MAY be escaped.
+    //
+    // To support non-string values, use 
CustomAttributes(CustomAttributes::json) instead.
+    CustomAttributes() : CustomAttributes(string) {}

Review Comment:
   Yes, that's a very good idea. We might even want to deprecate the 
`ValueMode::string` option, too, because it's highly error-prone and is only 
present for backwards-compatibility with the old string-only behavior.



##########
lang/c++/include/avro/CustomAttributes.hh:
##########
@@ -19,27 +19,76 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
-#include "Config.hh"
 #include <iostream>
 #include <map>
 #include <optional>
 #include <string>
 
+#include "Config.hh"
+
 namespace avro {
 
 // CustomAttributes class stores avro custom attributes.
 // Each attribute is represented by a unique name and value.
 // User is supposed to create CustomAttributes object and then add it to 
Schema.
 class AVRO_DECL CustomAttributes {
+
 public:
-    // Retrieves the custom attribute json entity for that attributeName, 
returns an
-    // null if the attribute doesn't exist.
+    enum ValueMode {
+        // When a CustomAttributes is created using this mode, all values are 
strings.
+        // The value should not be quoted, but any interior quotes and special
+        // characters (such as newlines) must be escaped.
+        string,
+        // When a CustomAttributes is created using this mode, all values are 
JSON
+        // values. String values must be quoted and escaped.
+        json
+    };
+
+    // Creates a new CustomAttributes object where all values are strings.
+    // All values passed to addAttribute() and returned from getAttribute() or 
the
+    // attributes() map will not be enclosed in quotes. However, any internal 
quotes
+    // WILL be escaped and other special characters MAY be escaped.
+    //
+    // To support non-string values, use 
CustomAttributes(CustomAttributes::json) instead.
+    CustomAttributes() : CustomAttributes(string) {}

Review Comment:
   We always use `ValueMode::json` when we read an Avro file -- it doesn't 
matter how the file was created because the file always contains valid JSON in 
the schema. So if it was written with an older version, then it will only ever 
have string values in custom attributes. But that is fine to later load and 
read with `ValueMode::json` since that handles _any_ kind of attribute value.
   
   This is here for backwards compatibility of _users_ of this API since this 
is a public type defined in a public header. So even though internal usages of 
`CustomAttributes` always should use `ValueMode::json`, that's not necessarily 
the case for external usages.



##########
lang/c++/impl/json/JsonDom.cc:
##########
@@ -92,7 +92,11 @@ Entity loadEntity(const char *text) {
 Entity loadEntity(InputStream &in) {
     JsonParser p;
     p.init(in);
-    return readEntity(p);
+    Entity e = readEntity(p);
+    if (p.hasMore()) {

Review Comment:
   Just from the names and signatures, it looked like `readValue` could 
potentially be used to read multiple JSON entities from a stream. In that case, 
the caller has a reference to the `JsonParser` and might re-use it for 
subsequent data in the input.
   
   But for this method, the caller can't safely do anything with the input 
after it's called, so seemed more intuitive that it should enforce that the 
entire stream is consumed.
   
   None of these have doc comments, so it wasn't clear the intent everywhere, 
so I left things flexible and only changed this function, which appears safe to 
change without breaking any caller assumptions.



##########
lang/c++/impl/Compiler.cc:
##########
@@ -277,7 +277,7 @@ static void getCustomAttributes(const Object &m, 
CustomAttributes &customAttribu
     const std::unordered_set<std::string> &kKnownFields = getKnownFields();
     for (const auto &entry : m) {
         if (kKnownFields.find(entry.first) == kKnownFields.end()) {
-            customAttributes.addAttribute(entry.first, 
entry.second.stringValue());
+            customAttributes.addAttribute(entry.first, 
entry.second.toString());

Review Comment:
   No, this is a method on `json::Entity` which I did not touch. This was the 
source of the exception: if there was a non-string value for a custom 
attribute, the `stringValue` method would throw an exception. But now that we 
support all kinds of values, we use its `toString` method, which will return 
properly formatted JSON string to represent whatever is the `Entity`'s value.



##########
lang/c++/impl/json/JsonDom.hh:
##########
@@ -177,6 +177,12 @@ AVRO_DECL Entity loadEntity(const uint8_t *text, size_t 
len);
 
 void writeEntity(JsonGenerator<JsonNullFormatter> &g, const Entity &n);
 
+class AVRO_DECL TooManyValuesException : public virtual std::runtime_error {

Review Comment:
   Well, we catch this type of exception. If we just threw `Exception`, would 
the catch code be expected to parse the string message to assess the specific 
error condition? That seems like an anti-pattern.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to