jjaakola-aiven commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r1003412161
##########
lang/py/avro/test/test_schema.py:
##########
@@ -713,6 +713,22 @@ def test_parse_invalid_symbol(self):
except avro.errors.InvalidName: # pragma: no coverage
self.fail("When enum symbol validation is disabled, an invalid
symbol should not raise InvalidName.")
+ def test_parse_duplicate_symbol(self):
+ duplicate_symbol_schema = json.dumps({"type": "enum", "name":
"AVRO3573", "symbols": ["A", "A", "B", "C", "D"]})
+ with self.assertRaisesRegex(
+ avro.errors.AvroException, r"Duplicate symbol: \[\'A\'\]",
msg="When enum symbol has a duplicate, AvroException should raise."
+ ):
+ avro.schema.parse(duplicate_symbol_schema)
+
+ def test_parse_duplicate_symbols(self):
+ duplicate_symbols_schema = json.dumps({"type": "enum", "name":
"AVRO3573", "symbols": ["A", "A", "B", "C", "C", "D"]})
+ with self.assertRaisesRegex(
+ avro.errors.AvroException,
+ r"Duplicate symbols: (\[\'A\', \'C\'\])|(\[\'C\', \'A\'\])",
Review Comment:
Is there a case where the resulting order in error message would be
backwards?
If not I'd prefer for readability that error message and assertion would be
exact. E.g.
```python
with self.assertRaises(avro.errors.AvroException, msg="Should raise
AvroException") as exc_info:
avro.schema.parse(duplicate_symbols_schema)
assert str(exc_info.exception) == "Duplicate symbols: (['A', 'C'])"
```
##########
lang/py/avro/schema.py:
##########
@@ -570,7 +570,12 @@ def __init__(
raise avro.errors.InvalidName("An enum symbol must be a
valid schema name.")
if len(set(symbols)) < len(symbols):
- raise avro.errors.AvroException(f"Duplicate symbol: {symbols}")
+ duplicate_symbols = {symbol for symbol in symbols if
symbols.count(symbol) > 1}
+
+ if len(duplicate_symbols) == 1:
+ raise avro.errors.AvroException(f"Duplicate symbol:
{list(duplicate_symbols)}")
+ else:
+ raise avro.errors.AvroException(f"Duplicate symbols:
{list(duplicate_symbols)}")
Review Comment:
Java gives an error message of `Duplicate enum symbol: A` and reports only
single duplicate. I am not sure if the messages should be aligned, just noting
the difference.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]