Hi David,

Flink currently does not have a compelling story when it comes to error handling, and I'd like to change that.
I'd advocate for an approach that naturally translates into dead letter queues as known from other stream processors such as Kafka Streams [1]. With your idea of metadata columns, you are forced to forgo all NOT NULL constraints in the schema, because if you hit issues while fetching or deserializing a record, you cannot have meaningful values in those fields. Having side-outputs instead would allow us to retain all the constraints.

Indeed, the side-outputs of the DataStream API are a good building block for error handling. However, we don't have them directly in sources and sinks. My first action would be to amend that. Then, we should make side-outputs accessible to connectors, so that they can signal errors on individual records. For example, we could extend the RecordEmitter of the SourceReaderBase framework to allow outputting errors. Together with a user option on what to do on these errors (fail, side-output, ignore), this would already allow DataStream users to reroute errors to dead letter queues.

Finally, we need to find a good SQL abstraction. Paimon has this nice concept of system tables [2]. We could use that to define error tables that can be used for error handling:

EXECUTE STATEMENT SET
BEGIN
  INSERT INTO output SELECT * FROM input;
  INSERT INTO input_errors SELECT * FROM input$errors;
  INSERT INTO output_errors SELECT * FROM output$errors;
END;

During plan translation we would map input$errors to the side-output of input. This abstraction allows you to filter for specific errors and reroute them to different error logs (e.g. urgent and non-urgent errors).

So to summarize, I'd aim for 3 FLIPs:

1. Add side-outputs to sources/sinks.
2. Add an abstraction for connectors to output errors, and provide an option for users to choose the behavior on error. DataStream users could then use DLQs.
3. Add support for system tables for source/sink errors in SQL. SQL/Table API users could then use DLQs.

WDYT?
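To make the second FLIP more concrete, here is a minimal, self-contained Java sketch of the fail / side-output / ignore option. None of these names (OnError, Result, emit) exist in Flink today; in a real implementation the "dlq" list would be a side-output inside the source, but the routing logic would look roughly like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed per-record error handling option.
// The names here are illustrative only, not current Flink API.
public class ErrorHandlingSketch {

    enum OnError { FAIL, SIDE_OUTPUT, IGNORE }

    // A record that either deserialized cleanly or carries an error message.
    record Result(String value, String error) {
        boolean isError() { return error != null; }
    }

    static void emit(Result r, OnError mode, List<String> main, List<String> dlq) {
        if (!r.isError()) {
            main.add(r.value);
            return;
        }
        switch (mode) {
            // Today's behavior: the job fails on the first bad record.
            case FAIL -> throw new RuntimeException("Deserialization failed: " + r.error);
            // Proposed: reroute the error to a side-output / dead letter queue.
            case SIDE_OUTPUT -> dlq.add(r.error);
            // Drop the record silently.
            case IGNORE -> { }
        }
    }

    public static void main(String[] args) {
        List<String> main = new ArrayList<>();
        List<String> dlq = new ArrayList<>();
        emit(new Result("ok-record", null), OnError.SIDE_OUTPUT, main, dlq);
        emit(new Result(null, "bad payload"), OnError.SIDE_OUTPUT, main, dlq);
        System.out.println(main);
        System.out.println(dlq);
    }
}
```

The point of the user option is exactly this switch: the default stays FAIL, so nothing changes for existing jobs, while SIDE_OUTPUT unlocks the DLQ pattern.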
Arvid

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1034%3A+Dead+letter+queue+in+Kafka+Streams
[2] https://paimon.apache.org/docs/1.1/concepts/system-tables/

On Fri, Jun 13, 2025 at 5:07 PM David Radley <david_rad...@uk.ibm.com> wrote:

> Hi,
> We are looking at enhancing the getindata connector to add support for a
> metadata column that could surface errors from the connector to the SQL
> level, so the SQL could then act on this error information. See the
> Getindata Flink HTTP connector issue [1].
>
> For the HTTP connector we are thinking of:
>
> create table api_lookup (
>   `customer_id` STRING NOT NULL,
>   `customer_records_from_backend` STRING,
>   `errorMessage` STRING METADATA FROM 'error_string',
>   `responseCode` INT METADATA FROM 'error_code')
> WITH (
>   'connector' = 'rest_lookup',
>   ...
> )
>
> The subsequent lookup join would get nulls for the values in this error
> case, but the metadata columns would contain error information that would
> allow the stream to handle the errors.
>
> This sort of functionality allows an enterprise to handle errors more
> autonomously/naturally in the flow, rather than jobs failing and logs
> needing to be analysed.
>
> I was thinking that this approach could be useful for JDBC, surfacing the
> JDBC errors.
>
> Also, for the Kafka connector we have side outputs [2] for DataStream. I
> wonder if the Kafka connector could surface badly formed events via
> metadata columns, allowing the flow itself to manage errors in SQL.
>
> I do wonder whether the errorMessage and responseCode could be baked into
> Flink as the way to pass this information into the stream. For now, we
> will implement this in the HTTP connector.
>
> What do people think of this idea to enhance SQL to be "error aware"?
> Metadata columns seem a good way to do this; Flink SQL support for try
> catch could be another approach. Has SQL try catch in Flink ever been
> looked into?
>
> Kind regards, David.
> [1] https://github.com/getindata/flink-http-connector/issues/154
> [2] https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/dev/datastream/side_output/