Hi all, I recently had an experience where a streaming pipeline became "clogged": invalid data reached the final step of the pipeline and caused a non-transient error when writing to my Sink. Since the job is a streaming job, the element (bundle) was retried continuously.
What options are there for getting out of this state when it occurs? I attempted to add validation and update the streaming job so it would drop the bad entity; the update succeeded, but I believe the bad entity had already been checkpointed (?) further downstream in the pipeline. What then? And for something like a database schema that evolves over time, what is the typical solution?
- Should pipelines mirror the DB schema and validate all data types in the pipeline?
- Should all sinks implement a way to divert non-transient failures from retrying and output them via a PCollectionTuple (such as BigQuery failed inserts), roughly like the sketch below?
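For concreteness, this is the kind of validate-then-dead-letter pattern I have in mind (a rough sketch, not what I currently run; ValidateBeforeSink, matchesSinkSchema, and the use of String elements are placeholders):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    public class ValidateBeforeSink {

      // Two outputs: elements that match the sink's schema, and a dead-letter output.
      public static final TupleTag<String> VALID = new TupleTag<String>() {};
      public static final TupleTag<String> INVALID = new TupleTag<String>() {};

      /** Splits the input so bad rows never reach the sink and never get retried there. */
      public static PCollectionTuple validate(PCollection<String> input) {
        return input.apply("ValidateAgainstSchema",
            ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void processElement(@Element String row, MultiOutputReceiver out) {
                if (matchesSinkSchema(row)) {
                  out.get(VALID).output(row);
                } else {
                  // Route to the dead-letter output instead of letting the write fail.
                  out.get(INVALID).output(row);
                }
              }
            }).withOutputTags(VALID, TupleTagList.of(INVALID)));
      }

      // Placeholder validation; in practice this would mirror the sink's schema/constraints.
      private static boolean matchesSinkSchema(String row) {
        return row != null && !row.isEmpty();
      }
    }

Downstream, the VALID output would go to the real sink and the INVALID output to a dead-letter table or files for later inspection. Is this the expected approach for every sink, or is there a more standard way to handle it?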
