I agree with adding a table capability for this. This is something that we
support in our Spark branch so that users can evolve tables without
breaking existing ETL jobs -- when you add an optional column, that
shouldn't fail the existing pipelines writing data to the table. I can
contribute our validation changes if people are interested.
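
For reference, the coarse version of this already exists: a table can
declare the ACCEPT_ANY_SCHEMA capability, which (if I'm reading the
analyzer right) makes Spark skip resolveOutputColumns entirely and leaves
all validation to the source, e.g. when the write is built. A rough sketch
of that pattern -- the class, table, and column names are made up, and note
that current master passes the input schema through LogicalWriteInfo
rather than withInputDataSchema:

    import java.util

    import org.apache.spark.sql.connector.catalog.{SupportsWrite, TableCapability}
    import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
    import org.apache.spark.sql.types.StructType

    // Illustrative table that opts out of Spark's full-schema output check.
    class PartialWriteTable(
        tableName: String,
        tableSchema: StructType,
        requiredColumns: Set[String]) extends SupportsWrite {

      override def name(): String = tableName
      override def schema(): StructType = tableSchema

      // ACCEPT_ANY_SCHEMA tells the analyzer to skip the output-column
      // check for this table; validation becomes the source's job.
      override def capabilities(): util.Set[TableCapability] =
        util.EnumSet.of(TableCapability.BATCH_WRITE, TableCapability.ACCEPT_ANY_SCHEMA)

      override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder = {
        // Enforce only the mandatory columns (for C*, the primary key) and
        // let optional columns be omitted from the write.
        val missing = requiredColumns -- info.schema().fieldNames.toSet
        if (missing.nonEmpty) {
          throw new IllegalArgumentException(
            s"Missing required columns for $tableName: ${missing.mkString(", ")}")
        }
        new WriteBuilder {} // build the real batch write here
      }
    }

The problem with ACCEPT_ANY_SCHEMA is that it throws away all of the
analyzer's validation. What I'd like to contribute is the finer-grained
version: keep the name and type checks, but allow columns that are
optional in the table schema to be missing from the write.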

On Wed, May 13, 2020 at 2:57 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> In DSv1 this was pretty easy to do because the burden of verifying
> writes fell on the datasource; the new setup makes partial writes
> difficult.
>
> resolveOutputColumns checks the table schema against the write plan's
> output and will fail any request that doesn't contain every column
> specified in the table schema.
> I would like it if instead we either made this check optional for a
> datasource, perhaps via an "allow partial writes" trait for the table, or
> moved the failure into "withInputDataSchema", where an implementer could
> throw exceptions on underspecified writes.
>
>
> The use case here is that C* (and many other sinks) has columns that are
> mandatory during an insert as well as columns that are optional.
>
> Please let me know if I've misread this,
>
> Thanks for your time again,
> Russ
>

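To make the behavior Russell describes concrete, this is the kind of
write that fails analysis today (catalog, table, and column names are
made up; the error message is paraphrased):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Target table has columns (id, name, address) where address is
    // optional in C*, but Spark rejects the write for omitting it.
    spark.range(10)
      .selectExpr("id", "cast(id as string) as name")
      .writeTo("cass_catalog.ks.users")
      .append() // AnalysisException: not enough data columns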

-- 
Ryan Blue
Software Engineer
Netflix
