ocean-zhc commented on issue #9289: URL: https://github.com/apache/seatunnel/issues/9289#issuecomment-2861298020
Problem analysis:
1. Confirm whether the receiving end (the HTTP service) supports batch data in JSON array format.
2. From reading the code, the current HTTP sink implementation (the `HttpSinkWriter` class) really has no batching capability: its `write` method sends a separate HTTP request for every record, with none of the batching logic and buffering found in other sinks (e.g. `PrometheusWriter`, `DruidWriter`).
3. Although a `batch_size` parameter can be set in the configuration file, the HTTP sink implementation does not currently handle it, so it has no effect.

Solution:
1. Use the sink implementations that already batch (e.g. `PrometheusWriter`, `InfluxDBSinkWriter`) as references.
2. Add a data buffer (e.g. a `List<SeaTunnelRow>`) to `HttpSinkWriter`.
3. Modify the `write` method to store records in the buffer and send only once the batch size is reached.
4. Implement a `flush` method that sends the buffered data in a single batch.
5. Add a `batch_size` option to `HttpSinkOptions`.
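The buffer-then-flush pattern in steps 2–4 can be sketched as below. This is a minimal, self-contained illustration, not the actual SeaTunnel `HttpSinkWriter` API: the class name `BufferedHttpWriter`, the use of JSON strings instead of `SeaTunnelRow`, and the `Consumer`-based stand-in for the HTTP POST are all assumptions for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed buffering logic. In the real sink the
// sender would POST the buffered records as one JSON array, and batchSize
// would come from the batch_size option.
public class BufferedHttpWriter {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> sender; // stand-in for the HTTP request

    public BufferedHttpWriter(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // write(): append the record to the buffer; flush once the batch size is reached
    public void write(String rowJson) {
        buffer.add(rowJson);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // flush(): send everything in the buffer as one batch, then clear it.
    // The real writer would also call this on checkpoint/close so no records are lost.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        BufferedHttpWriter writer = new BufferedHttpWriter(3, batches::add);
        for (int i = 1; i <= 7; i++) {
            writer.write("{\"id\":" + i + "}");
        }
        writer.flush(); // drain the remainder
        System.out.println(batches.size());        // 3 batches: 3 + 3 + 1
        System.out.println(batches.get(2).size()); // last batch holds 1 record
    }
}
```

The key design point is that `flush` is callable from two places: from `write` when the buffer fills up, and from the writer's checkpoint/close path so a partial batch is never dropped.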
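For step 5, once `HttpSinkOptions` honors the option, a job config might look like the following. The `url` value and the exact option placement are illustrative; only `batch_size` is the option under discussion.

```hocon
sink {
  Http {
    url = "http://localhost:8080/ingest"
    # proposed option; currently ignored by the HTTP sink implementation
    batch_size = 100
  }
}
```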
