tusiCHN opened a new issue, #8321:
URL: https://github.com/apache/seatunnel/issues/8321

   ### Search before asking
   
   - [X] I had searched in the 
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   
   Add a dirty data filtering feature. For example, if 100 records are processed in total (or 100 records arrive within one hour) and only one of them is dirty, the dirty data ratio is 1%. In that case the valid records should still be written, while the dirty record raises an exception or triggers an alert.
   
   The tolerated dirty data ratio, or the tolerated total number of dirty records, should be configurable along two dimensions: total record count or time range.
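   
   The tolerance described above could be sketched roughly as follows. This is only an illustration of the requested behavior, not existing SeaTunnel code: the class `DirtyDataTracker`, the function `process`, and every option name (`max_dirty_ratio`, `max_dirty_count`, `window_records`, `window_seconds`, `min_records`) are hypothetical. `min_records` is an extra assumption so that a single early dirty record does not count as a 100% ratio.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Iterable, Optional, Tuple


class DirtyDataLimitExceeded(Exception):
    """Raised when dirty records exceed the configured tolerance."""


@dataclass
class DirtyDataTracker:
    # Hypothetical option names for illustration; not SeaTunnel settings.
    max_dirty_ratio: Optional[float] = None  # e.g. 0.01 tolerates 1%
    max_dirty_count: Optional[int] = None    # absolute cap on dirty records
    window_records: Optional[int] = None     # judge only the last N records
    window_seconds: Optional[float] = None   # judge only the last T seconds
    min_records: int = 1                     # ratio is ignored below this sample size
    # Without any window, this deque grows with the input; a real
    # implementation would keep plain counters in that case.
    _events: Deque[Tuple[float, bool]] = field(default_factory=deque, init=False)
    _dirty: int = field(default=0, init=False)

    def record(self, is_dirty: bool, now: Optional[float] = None) -> None:
        """Register one record and raise if the tolerance is exceeded."""
        now = time.monotonic() if now is None else now
        self._events.append((now, is_dirty))
        self._dirty += int(is_dirty)
        self._evict(now)
        total = len(self._events)
        if self.max_dirty_count is not None and self._dirty > self.max_dirty_count:
            raise DirtyDataLimitExceeded(f"{self._dirty} dirty records in window")
        if (self.max_dirty_ratio is not None and total >= self.min_records
                and self._dirty / total > self.max_dirty_ratio):
            raise DirtyDataLimitExceeded(f"dirty ratio {self._dirty}/{total}")

    def _evict(self, now: float) -> None:
        """Drop events that fell out of the count or time window."""
        while self._events and (
            (self.window_records is not None
             and len(self._events) > self.window_records)
            or (self.window_seconds is not None
                and now - self._events[0][0] > self.window_seconds)
        ):
            _, was_dirty = self._events.popleft()
            self._dirty -= int(was_dirty)


def process(records: Iterable, is_valid: Callable, sink: Callable,
            tracker: DirtyDataTracker, on_dirty: Callable) -> None:
    """Write valid records; report dirty ones and enforce the tolerance."""
    for rec in records:
        if is_valid(rec):
            tracker.record(False)
            sink(rec)
        else:
            on_dirty(rec)         # e.g. log the record or send an alert
            tracker.record(True)  # raises once the tolerance is exceeded
```

   With `DirtyDataTracker(max_dirty_ratio=0.01, min_records=100)`, one dirty record out of 100 is reported through `on_dirty` and the other 99 still reach the sink; a second dirty record within the same 100 raises `DirtyDataLimitExceeded`. Setting `window_seconds=3600` together with `max_dirty_count` gives the per-hour variant.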
   
   ### Usage Scenario
   
   While providing data support for a real-time service, a single dirty record caused the job to fail, and nobody noticed for a long time that the job had stopped.
   
   
   ### Related issues
   
   Yes
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
