wgtmac commented on code in PR #3266: URL: https://github.com/apache/avro/pull/3266#discussion_r1914203397
########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> Review Comment: nit: is it possible to remove this line? It looks weird that a public header exposes iostream. ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; + + // Creates a new CustomAttributes object where all values are strings. + // All values passed to addAttribute() and returned from getAttribute() or the + // attributes() map will not be enclosed in quotes. However, any internal quotes + // WILL be escaped and other special characters MAY be escaped. + // + // To support non-string values, use CustomAttributes(CustomAttributes::json) instead. + CustomAttributes() : CustomAttributes(string) {} + + // Creates a new CustomAttributes object. + // + // If the given mode is string, all values must be strings. All values passed to Review Comment: If the `ValueMode` enum has good documentation, we don't have to repeat them here. Otherwise, it is easy to be out of sync. ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; + + // Creates a new CustomAttributes object where all values are strings. + // All values passed to addAttribute() and returned from getAttribute() or the + // attributes() map will not be enclosed in quotes. However, any internal quotes + // WILL be escaped and other special characters MAY be escaped. + // + // To support non-string values, use CustomAttributes(CustomAttributes::json) instead. + CustomAttributes() : CustomAttributes(string) {} Review Comment: Should we mark it as `[[deprecated]]` because we want discourage users to use this one? ########## lang/c++/impl/json/JsonDom.cc: ########## @@ -92,7 +92,11 @@ Entity loadEntity(const char *text) { Entity loadEntity(InputStream &in) { JsonParser p; p.init(in); - return readEntity(p); + Entity e = readEntity(p); + if (p.hasMore()) { Review Comment: Why not checking it in the `readEntity` function? ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; Review Comment: ```suggestion enum class ValueMode : uint8_t { // When a CustomAttributes is created using this mode, all values are expected to be string. // The value should not be quoted, but any interior quotes and special // characters (such as newlines) must be escaped. STRING, // When a CustomAttributes is created using this mode, all values are standard JSON // values. String values must be quoted and escaped. JSON }; ``` I found that the styles of enum are not consistent. Personally I prefer upper-case. ########## lang/c++/impl/json/JsonDom.hh: ########## @@ -177,6 +177,12 @@ AVRO_DECL Entity loadEntity(const uint8_t *text, size_t len); void writeEntity(JsonGenerator<JsonNullFormatter> &g, const Entity &n); +class AVRO_DECL TooManyValuesException : public virtual std::runtime_error { Review Comment: I don't see any other exception is defined except the one in the `Exception.hh`. Should we simply stick to `Exception` to be consistent? ########## lang/c++/test/unittest.cc: ########## @@ -489,12 +491,26 @@ struct TestSchema { expectedJsonWithoutCustomAttribute); } - void checkCustomAttributes_getAttribute() { - CustomAttributes cf; - cf.addAttribute("field1", std::string("1")); + void checkCustomAttributes_addAndGetAttributeJson() { Review Comment: IMHO these cases are not sufficient. We need to cover the expected behavior of the following cases (even exceptions are expected in some of them): |Read \ Write | string | json | |-----|-----| -----| | string | ? | ? | | json | ? | ? | ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; + + // Creates a new CustomAttributes object where all values are strings. + // All values passed to addAttribute() and returned from getAttribute() or the + // attributes() map will not be enclosed in quotes. However, any internal quotes + // WILL be escaped and other special characters MAY be escaped. + // + // To support non-string values, use CustomAttributes(CustomAttributes::json) instead. + CustomAttributes() : CustomAttributes(string) {} Review Comment: I know the default value is to keep backward-compatibility. However, we have to swap the default value in the future because the current behavior is wrong for non-string values. (I'm not well-versed in this codebase yet) Is it possible to figure out if the avro file is created by avro-cpp as well as its writer version? If yes, perhaps we can choose the correct default `ValueMode` when deserializing the CustomAttributes based on its writer version and use `ValueMode::string` by default for the writer. In this way, we won't create more problematic files after this release. ########## lang/c++/impl/Compiler.cc: ########## @@ -277,7 +277,7 @@ static void getCustomAttributes(const Object &m, CustomAttributes &customAttribu const std::unordered_set<std::string> &kKnownFields = getKnownFields(); for (const auto &entry : m) { if (kKnownFields.find(entry.first) == kKnownFields.end()) { - customAttributes.addAttribute(entry.first, entry.second.stringValue()); + customAttributes.addAttribute(entry.first, entry.second.toString()); Review Comment: Why do we change this? Is it a breaking change to downstream users who are calling `stringValue`? ########## lang/c++/include/avro/CustomAttributes.hh: ########## @@ -19,27 +19,76 @@ #ifndef avro_CustomAttributes_hh__ #define avro_CustomAttributes_hh__ -#include "Config.hh" #include <iostream> #include <map> #include <optional> #include <string> +#include "Config.hh" + namespace avro { // CustomAttributes class stores avro custom attributes. // Each attribute is represented by a unique name and value. // User is supposed to create CustomAttributes object and then add it to Schema. class AVRO_DECL CustomAttributes { + public: - // Retrieves the custom attribute json entity for that attributeName, returns an - // null if the attribute doesn't exist. + enum ValueMode { + // When a CustomAttributes is created using this mode, all values are strings. + // The value should not be quoted, but any interior quotes and special + // characters (such as newlines) must be escaped. + string, + // When a CustomAttributes is created using this mode, all values are JSON + // values. String values must be quoted and escaped. + json + }; Review Comment: As the docstring is user-facing, is it better to separate the expected behavior of write and read path? In a more complicated case, what is the expected behavior if `ValueMode`s are different when reading and writing the same `CustomAttributes`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org