Hi, folks.

There currently seems to be a buzz around "data contracts". From what I can
tell, the discussion mainly advocates a cultural solution. Could big data
tools be used to enforce these contracts instead?

My questions really are: are there any plans to implement data constraints
in Spark (e.g., an integer must be between 0 and 100, or the date in column X
must be before the date in column Y)? And if not, is there an appetite for them?
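
To make the kind of constraint concrete, here is roughly how I end up
checking these by hand today with ordinary DataFrame expressions (the
column names and input path are made up for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("constraint-check").getOrCreate()
    val df = spark.read.parquet("/tmp/events")  // hypothetical input

    // Rows violating "score must be between 0 and 100"
    val badScores = df.filter(col("score") < 0 || col("score") > 100)

    // Rows violating "start_date must be before end_date"
    val badDates = df.filter(col("start_date") >= col("end_date"))

    if (badScores.count() > 0 || badDates.count() > 0)
      throw new IllegalStateException("Constraint violations found")

The point being that this check lives in user code rather than travelling
with the data itself.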

Maybe constraints could be associated with the schema metadata and enforced
in the implementation of a FileFormatDataWriter?
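
For instance, a constraint could travel with the schema like this (the
"constraint" metadata key is purely hypothetical; nothing in Spark reads or
enforces it today, this is only a sketch of the idea):

    import org.apache.spark.sql.types._

    // Attach a constraint expression to the column's metadata. A writer such
    // as FileFormatDataWriter could then evaluate it per row and reject
    // violations at write time.
    val scoreField = StructField(
      "score",
      IntegerType,
      nullable = false,
      metadata = new MetadataBuilder()
        .putString("constraint", "score >= 0 AND score <= 100")  // hypothetical key
        .build()
    )

    val schema = StructType(Seq(
      scoreField,
      StructField("start_date", DateType),
      StructField("end_date", DateType)
    ))

Expressing constraints as SQL strings would at least keep them serialisable
alongside the rest of the schema.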

Just throwing it out there and wondering what other people think. It's an
area that interests me, as over half my problems at the day job seem to come
from dodgy data.

Regards,

Phillip
