KitFieldhouse commented on issue #383:
URL: https://github.com/apache/avro-rs/issues/383#issuecomment-3743361391
Thanks for the discussion. I apologize for not responding earlier, I've been
out on vacation.
> @KitFieldhouse Would that make something possible that is currently
impossible/broken?
What I had in mind is exactly what Kriskras demonstrates in their above post.
As for storing name in the value, as Kriskras mentions:
> We can also store the name in the value, but that would cause a lot of
extra string clones.
I agree that we want to avoid making a bunch of string clones. For the fork
I have been working on, copying names is not too much of an issue, as names are
(usually) passed via `Arc` references, so for values created by reading a
binary stream with a schema, we can just store this reference. For user created
values, the user supplied string literal only needs to live long enough for us
to verify the value against a schema, at which point we can again use the
reference.
I do think optionally accepting names in value makes sense. For instance,
for the schema
```
[{"type": "record", "name": "A", "fields": []},{"type": "record", "name":
"B", "fields": []}]
```
With the current state of `Value`, to specify our record as being record
"B", we would need to construct our value like so
```
Value::Union(1,Box::new(Value::Record(vec![])));
```
which, in my opinion is a roundabout way of saying what we really want to
say which is that we are creating a record with name "B". I think this can lead
to a footgun where if a user changes the order of the union in the schema, the
"name" of the record may change or we may fail to validate all together.
> I think this is best solved by changing Value::resolve(self, schema:
&Schema) -> Value to Value::resolve(self, reader_schema: &Schema,
writer_schema: &Schema) -> Value
A few thoughts,
- Would we check `Value` against `writer_schema` first and then, if this
passes, perform our resolution against
`reader_schema`?
- What do we do when the specific type written in `Value` disagrees with
what is specified in `writer_schema`?
Should we reject this case, or coerce according to what makes sense? For
instance, lets say we have a `writer_schema` of `{"type": "long"}`, a
`reader_schema` of `{"type": "double"}` and a value of `Value::Int(123)`?
Should we reject this since `Value` does not agree with `writer_schema` even
though there is a clear coercion here? The only situation I can think of where
this disagreement between writer schema and value would occur is if the
provided value was constructed by the user and not read from a binary stream.
Having `TypedValue` represent a type-level promise that its wrapped value
and schema have been checked against each other and any necessary coercions
have been performed coupled with appropriately limiting the ways in which a
user can create a `TypedValue` provides a way of eliminating the situation
where we provide a `Value` and a `writer_schema` that are not mutually
compatible. Then, if we write functions that take `TypedValue` instead of
`Value` we can assume the fast path that no checking between `writer_schema`
and `Value` is needed.
> The fact that resolving is more lenient than allowed by the specification
is also a valid concern, but is not something I think needs to be fixed.
In so far as this lenience doesn't cause incorrect behavior (like the
example in your post) I agree with this argument to a degree. To play devil's
advocate here, I see an argument that since resolution is unambiguously defined
by the specification, it should be expected that implementations follow those
rules and only those rules to ensure consistent behavior between
implementations. I could see someone being confused on why this crate can
resolve between a reader and writer schema where the Java implementation can
not. Though, maybe this is fixable via a note in the documentation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]