Hi Phillip,

While not as fine-grained as your example, there are schema systems,
such as Avro's, that can evaluate whether a change to a schema is
compatible or incompatible from the perspective of the reader, the
writer, or both. This provides some degree of enforcement, and a means
to communicate a contract. Interestingly, I believe this approach has
been applied to both JSON Schema and protobuf as part of the Confluent
Schema Registry.
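
For illustration, a minimal sketch of that kind of check via Avro's
SchemaCompatibility API (the User schemas here are made up for the
example):

import org.apache.avro.{Schema, SchemaCompatibility}

object CompatSketch {
  def main(args: Array[String]): Unit = {
    // Writer schema: what the producer wrote the data with.
    val writer = new Schema.Parser().parse(
      """{"type": "record", "name": "User", "fields": [
        |  {"name": "id", "type": "int"},
        |  {"name": "name", "type": "string"}
        |]}""".stripMargin)

    // Reader schema: drops "name" and adds "email" with a default,
    // both of which Avro's schema-resolution rules allow.
    val reader = new Schema.Parser().parse(
      """{"type": "record", "name": "User", "fields": [
        |  {"name": "id", "type": "int"},
        |  {"name": "email", "type": "string", "default": ""}
        |]}""".stripMargin)

    val result = SchemaCompatibility.checkReaderWriterCompatibility(reader, writer)
    println(result.getType) // COMPATIBLE here; incompatibilities are reported per field
  }
}

A registry such as Confluent's layers compatibility modes (BACKWARD,
FORWARD, FULL and their transitive variants) over checks like this, and
rejects registration of a schema that would break the contract.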

Elliot.

On Mon, 12 Jun 2023 at 12:43, Phillip Henry <londonjava...@gmail.com> wrote:

> Hi, folks.
>
> There currently seems to be a buzz around "data contracts". From what I
> can tell, these mainly advocate a cultural solution. But instead, could big
> data tools be used to enforce these contracts?
>
> My questions really are: are there any plans to implement data constraints
> in Spark (eg, an integer must be between 0 and 100; the date in column X
> must be before that in column Y)? And if not, is there an appetite for them?
>
> Maybe we could associate constraints with schema metadata that are
> enforced in the implementation of a FileFormatDataWriter?
>
> Just throwing it out there and wondering what other people think. It's an
> area that interests me as it seems that over half my problems at the day
> job are because of dodgy data.
>
> Regards,
>
> Phillip
>
>
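
Coming back to your row-level examples: I don't believe anything in
Spark enforces these at write time today, but they can be approximated
in user code. A rough sketch, where validateOrFail is a hypothetical
helper rather than an existing Spark API:

import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date}

object ContractSketch {
  // Hypothetical helper: fail the job if any row violates the predicate.
  // Note this triggers a separate action, so the data is scanned twice.
  def validateOrFail(df: DataFrame, name: String, predicate: Column): DataFrame = {
    val violations = df.filter(!predicate).count()
    require(violations == 0L, s"contract '$name' violated by $violations row(s)")
    df
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("contract-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((42, "2023-01-01", "2023-06-12")).toDF("score", "x", "y")

    // The two examples from the mail: a bounded integer, and an
    // ordering constraint between two date columns.
    val checked = validateOrFail(
      validateOrFail(df, "score in [0, 100]", col("score").between(0, 100)),
      "x before y",
      to_date(col("x")) < to_date(col("y")))

    checked.write.mode("overwrite").parquet("/tmp/contract-sketch") // path is arbitrary
    spark.stop()
  }
}

Pushing something like this down into FileFormatDataWriter, driven by
schema metadata as you suggest, would be the enforced-by-the-engine
version of the same idea.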
