Hey Phillip,

You're right that we can improve tooling to help with data contracts, but I
think that a contract still needs to be an agreement between people.
Constraints help ensure that a data producer adheres to the contract, and
they give feedback as soon as possible when assumptions are violated. The
problem with treating constraints as the only contract is that they are too
easy to change. For example, if I change a required column to a nullable
column, that's a perfectly valid transition, but only if I've communicated
that change to downstream consumers.
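
To make the nullable-column example concrete, here is a rough sketch in plain
Python. It is not Spark code: the Column class, the producer_check and
consumer_total helpers, and the "id"/"amount" table are all made up for
illustration. It only shows that relaxing a constraint passes the
producer-side check while a consumer written against the old schema still
breaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical, minimal stand-in for a schema field.
    name: str
    nullable: bool


def producer_check(rows, schema):
    """Return the rows that violate the schema's nullability constraints."""
    required = [c.name for c in schema if not c.nullable]
    return [row for row in rows if any(row.get(name) is None for name in required)]


# Hypothetical table: the producer relaxes "amount" from required to nullable.
old_schema = [Column("id", nullable=False), Column("amount", nullable=False)]
new_schema = [Column("id", nullable=False), Column("amount", nullable=True)]

rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]

# Under the old schema the second row is rejected at write time.
violations_old = producer_check(rows, old_schema)
# Under the new schema the same data passes, so the tooling raises no alarm.
violations_new = producer_check(rows, new_schema)


def consumer_total(rows):
    # A downstream consumer written against the old schema still assumes
    # that "amount" is never null, so it fails on the new data.
    return sum(row["amount"] for row in rows)
```

The constraint change is valid as far as the check is concerned; whether it
is safe depends on whether the people who own consumer_total were told.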

Ryan

On Mon, Jun 12, 2023 at 4:43 AM Phillip Henry <londonjava...@gmail.com>
wrote:

> Hi, folks.
>
> There currently seems to be a buzz around "data contracts". From what I
> can tell, these mainly advocate a cultural solution. But instead, could big
> data tools be used to enforce these contracts?
>
> My questions really are: are there any plans to implement data constraints
> in Spark (eg, an integer must be between 0 and 100; the date in column X
> must be before that in column Y)? And if not, is there an appetite for them?
>
> Maybe we could associate constraints with schema metadata that are
> enforced in the implementation of a FileFormatDataWriter?
>
> Just throwing it out there and wondering what other people think. It's an
> area that interests me as it seems that over half my problems at the day
> job are because of dodgy data.
>
> Regards,
>
> Phillip
>
>

-- 
Ryan Blue
Tabular
