jhump commented on code in PR #3266: URL: https://github.com/apache/avro/pull/3266#discussion_r1927154843
########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; Review Comment: > what is the expected behavior if `ValueMode`s are different when reading and writing the same `CustomAttributes`? This isn't possible. The `ValueMode` is a property of the `CustomAttributes`, specified at construction time. ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; + + // Creates a new CustomAttributes object where all values are strings. + // All values passed to addAttribute() and returned from getAttribute() or the + // attributes() map will not be enclosed in quotes. However, any internal quotes + // WILL be escaped and other special characters MAY be escaped. + // + // To support non-string values, use CustomAttributes(CustomAttributes::json) instead. + CustomAttributes() : CustomAttributes(string) {} Review Comment: Yes, that's a very good idea. We might even want to deprecate the `ValueMode::string` option, too, because it's highly error-prone and is only present for backwards-compatibility with the old string-only behavior. ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; + + // Creates a new CustomAttributes object where all values are strings. + // All values passed to addAttribute() and returned from getAttribute() or the + // attributes() map will not be enclosed in quotes. However, any internal quotes + // WILL be escaped and other special characters MAY be escaped. + // + // To support non-string values, use CustomAttributes(CustomAttributes::json) instead. + CustomAttributes() : CustomAttributes(string) {} Review Comment: We always use `ValueMode::json` when we read an Avro file -- it doesn't matter how the file was created because the file always contains valid JSON in the schema. So if it was written with an older version, then it will only ever have string values in custom attributes. But that is fine to later load and read with `ValueMode::json` since that handles _any_ kind of attribute value. This is here for backwards compatibility of _users_ of this API since this is a public type defined in a public header. So even though internal usages of `CustomAttributes` always should use `ValueMode::json`, that's not necessarily the case for external usages. ########## lang/c++/impl/json/JsonDom.cc: ########## @@ -92,7 +92,11 @@ Entity loadEntity(const char *text) { Entity loadEntity(InputStream &in) { JsonParser p; p.init(in); - return readEntity(p); + Entity e = readEntity(p); + if (p.hasMore()) { Review Comment: Just from the names and signatures, it looked like `readValue` could potentially be used to read multiple JSON entities from a stream. In that case, the caller has a reference to the `JsonParser` and might re-use it for subsequent data in the input. But for this method, the caller can't safely do anything with the input after it's called, so seemed more intuitive that it should enforce that the entire stream is consumed. None of these have doc comments, so it wasn't clear the intent everywhere, so I left things flexible and only changed this function, which appears safe to change without breaking any caller assumptions. ########## lang/c++/impl/Compiler.cc: ########## @@ -277,7 +277,7 @@ static void getCustomAttributes(const Object &m, CustomAttributes &customAttribu const std::unordered_set<std::string> &kKnownFields = getKnownFields(); for (const auto &entry : m) { if (kKnownFields.find(entry.first) == kKnownFields.end()) { - customAttributes.addAttribute(entry.first, entry.second.stringValue()); + customAttributes.addAttribute(entry.first, entry.second.toString()); Review Comment: No, this is a method on `json::Entity` which I did not touch. This was the source of the exception: if there was a non-string value for a custom attribute, the `stringValue` method would throw an exception. But now that we support all kinds of values, we use its `toString` method, which will return properly formatted JSON string to represent whatever is the `Entity`'s value. ########## lang/c++/impl/json/JsonDom.hh: ########## @@ -177,6 +177,12 @@ AVRO_DECL Entity loadEntity(const uint8_t *text, size_t len); void writeEntity(JsonGenerator<JsonNullFormatter> &g, const Entity &n); +class AVRO_DECL TooManyValuesException : public virtual std::runtime_error { Review Comment: Well, we catch this type of exception. If we just threw `Exception`, would the catch code be expected to parse the string message to assess the specific error condition? That seems like an anti-pattern. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org