Why does the BurstFilter not address your concern?

Ralph

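The following is a minimal sketch of the concern behind Ralph's question, as Yuepeng frames it later in the thread: one busy logger uses up a limit that every logger shares. It compares a single fixed-window budget shared by all loggers with an independent budget of the same size for each logger. This is a simplified model and not BurstFilter's actual algorithm. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedBudgetDemo {

    /** Decides whether one more record from the given logger may be written. */
    interface Limiter {
        boolean tryAcquire(String logger);
    }

    /** Hypothetical model: a single budget shared by all loggers within one window. */
    static final class SharedBudget implements Limiter {
        private final int limit;
        private int used;

        SharedBudget(int limit) { this.limit = limit; }

        @Override
        public boolean tryAcquire(String logger) {
            if (used >= limit) {
                return false;
            }
            used++;
            return true;
        }
    }

    /** Hypothetical model: an independent budget of the same size for each logger. */
    static final class PerLoggerBudget implements Limiter {
        private final int limit;
        private final Map<String, Integer> used = new HashMap<>();

        PerLoggerBudget(int limit) { this.limit = limit; }

        @Override
        public boolean tryAcquire(String logger) {
            int n = used.getOrDefault(logger, 0);
            if (n >= limit) {
                return false;
            }
            used.put(logger, n + 1);
            return true;
        }
    }

    /** Replays one window: logger1 bursts 200 records first, then logger2 emits 10. */
    static Map<String, Integer> replay(Limiter limiter) {
        Map<String, Integer> emitted = new LinkedHashMap<>();
        emitted.put("logger1", 200);
        emitted.put("logger2", 10);
        Map<String, Integer> accepted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : emitted.entrySet()) {
            int ok = 0;
            for (int i = 0; i < e.getValue(); i++) {
                if (limiter.tryAcquire(e.getKey())) {
                    ok++;
                }
            }
            accepted.put(e.getKey(), ok);
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("shared:     " + replay(new SharedBudget(100)));
        // prints "shared:     {logger1=100, logger2=0}"
        System.out.println("per-logger: " + replay(new PerLoggerBudget(100)));
        // prints "per-logger: {logger1=100, logger2=10}"
    }
}
```

The burst ordering is deliberately adversarial. If logger2's records were interleaved earlier in the window, the shared budget would admit some of them.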
> On Jan 27, 2025, at 5:59 PM, Yuepeng Pan <panyuep...@apache.org> wrote:
>
> Thanks, Volkan, for the quick response.
>
>> Could you share an example of how your
>> filter is used in a configuration file, please?
>
> Yes, glad to do it. Here is a specific example.
>
> log4j.properties file:
>
> # Placeholder for other config lines.
> .....
> # Config for the ProcessLoggerCrossFilter. The limiter window size is a constant 1 min.
> appender.main.filter.yourFilterGroup.type=ProcessLoggerCrossFilter
> # 3000 records/min at process level.
> appender.main.filter.yourFilterGroup.procCountRateLimit=3000
> # 204800 bytes/min at process level.
> appender.main.filter.yourFilterGroup.procSizeRateLimit=204800
> # 32 records/min per logger (the class that prints the log line) at logger level.
> appender.main.filter.yourFilterGroup.loggerCountRateLimit=32
> # 10240 bytes/min per logger (the class that prints the log line) at logger level.
> appender.main.filter.yourFilterGroup.loggerSizeRateLimit=10240
>
> Thanks~
>
> Best,
> Yuepeng.
>
> At 2025-01-28 03:48:12, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>> It is great to hear that you have already done the biggest part of the
>> work: implementing such a filter! Could you share an example of how your
>> filter is used in a configuration file, please?
>>
>> On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:
>>
>>> Thanks, Volkan, for the code and comments.
>>>
>>>> You can either implement this in a Java/Kotlin/Scala/etc. class
>>>> <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>>>> or a Script
>>>> Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>>>> Would you mind explaining to us why these are not an option for you but
>>>> instead this logic must be provided as an official Log4j component,
>>>> please?
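The properties example in Yuepeng's reply sets four limits under one filter prefix. The sketch below reads those exact keys with `java.util.Properties` and checks whether one level still has room for another record. The `Budget`, `Limits` and `hasRoom` names are hypothetical and are not part of Log4j. The thread does not say how the count and byte limits combine, so requiring both to hold is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class CrossFilterLimits {

    // Property prefix copied from the example configuration in the thread.
    static final String PREFIX = "appender.main.filter.yourFilterGroup.";

    /** Hypothetical: per-minute record and byte limits for one level. */
    record Budget(long maxRecords, long maxBytes) {}

    /** Hypothetical: process-level and per-logger budgets read from the four properties. */
    record Limits(Budget process, Budget perLogger) {
        static Limits from(Properties p) {
            return new Limits(
                    new Budget(longProp(p, "procCountRateLimit"), longProp(p, "procSizeRateLimit")),
                    new Budget(longProp(p, "loggerCountRateLimit"), longProp(p, "loggerSizeRateLimit")));
        }

        private static long longProp(Properties p, String name) {
            String value = p.getProperty(PREFIX + name);
            if (value == null) {
                throw new IllegalArgumentException("missing property: " + PREFIX + name);
            }
            return Long.parseLong(value.trim());
        }
    }

    /**
     * Assumption (not specified in the thread): a level has room for a record only if
     * accepting it keeps both the record count and the byte total within the limits.
     * The UTF-8 length of the message stands in for the record size.
     */
    static boolean hasRoom(Budget budget, long usedRecords, long usedBytes, String message) {
        long size = message.getBytes(StandardCharsets.UTF_8).length;
        return usedRecords + 1 <= budget.maxRecords() && usedBytes + size <= budget.maxBytes();
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(String.join("\n",
                PREFIX + "procCountRateLimit=3000",
                PREFIX + "procSizeRateLimit=204800",
                PREFIX + "loggerCountRateLimit=32",
                PREFIX + "loggerSizeRateLimit=10240")));
        Limits limits = Limits.from(p);
        // A logger that already wrote 31 records / 10,000 bytes in the current minute:
        System.out.println(hasRoom(limits.perLogger(), 31, 10_000, "hello"));         // true
        System.out.println(hasRoom(limits.perLogger(), 31, 10_000, "x".repeat(300))); // false: 10,300 > 10,240 bytes
    }
}
```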
>>>
>>> The functionality can easily be implemented on top of the provided filter
>>> interface. The design of the logging interface is excellent.
>>>
>>> I have already implemented a filter that achieves similar functionality.
>>> It is primarily used in large distributed systems such as Flink and Spark.
>>> These systems have the following characteristics when generating
>>> production logs:
>>>
>>> - There are many classes, which means there are many logger names;
>>> - The log rate is usually high;
>>> - User logs and framework logs are often mixed together.
>>>
>>> Please allow me to explain why I would like to contribute this to
>>> the official repository. From my limited reading, the reasons are:
>>>
>>> - It is quite valuable in the aforementioned frameworks and use cases.
>>> - Existing filters only have logger-level rate limiting, whereas this
>>>   filter is not limited to that level.
>>>
>>> Please feel free to correct me if I'm wrong.
>>>
>>> Thank you very much.
>>>
>>> Best,
>>> Yuepeng
>>>
>>> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>>>> Hello Yuepeng,
>>>>
>>>> Thanks so much for reaching out to us. Your use case is indeed an
>>>> interesting one and it is good to learn about such Log4j deployments in
>>>> the wild.
>>>>
>>>> Consider the following Log4j filter pseudo code:
>>>>
>>>> WeakHashMap<Key, RateLimiter> rateLimiterByKey =
>>>>         (WeakHashMap<Key, RateLimiter>) activeLoggerContext.getObject("rateLimiters");
>>>> Key key = Key.fromDimensions(logEvent.getLoggerName(), ...);
>>>> // computeIfAbsent (not putIfAbsent): the second argument is a factory
>>>> RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(key, ignored ->
>>>>         RateLimiter.ofMaxRate(key.maxRate()));
>>>> return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
>>>>
>>>> You can either implement this in a Java/Kotlin/Scala/etc.
>>>> class <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>>>> or a Script
>>>> Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>>>> Would you mind explaining to us why these are not an option for you but
>>>> instead this logic must be provided as an official Log4j component,
>>>> please?
>>>>
>>>> Kind regards.
>>>>
>>>> On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>>>>
>>>>> Sorry, I'm not sure why the formatting of the email appears somewhat
>>>>> disorganized, so I have reorganized part of the content and added it to
>>>>> the doc [1].
>>>>>
>>>>> Thank you.
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
>>>>>
>>>>> Best,
>>>>> Yuepeng
>>>>>
>>>>> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
>>>>>> Thanks, Jay Kataria, for the comments.
>>>>>>
>>>>>>> 1. Can you give an example of the scenarios where this can be useful?
>>>>>>> Adding rate limiters to logs seems like an interesting idea, but I'm
>>>>>>> just wondering what the business motivation is.
>>>>>>
>>>>>>> 3. I am interested in what you talked about: dimensions, and allowing
>>>>>>> thresholds to be shared across these dimensions or metrics. Could you
>>>>>>> give an example of this in particular? I just want to know about the
>>>>>>> real-world applications of this.
>>>>>>
>>>>>> Let me try to clarify.
>>>>>>
>>>>>> Generally speaking, the logging rate of each logger varies.
>>>>>> In some scenarios, or under the influence of existing filters,
>>>>>> if a particular logger generates logs at an especially high rate,
>>>>>> the log output of other loggers might be affected.
>>>>>> In short, all loggers compete for the same rate-limited resources,
>>>>>> without any proactive intervention logic.
>>>>>>
>>>>>> For example, suppose there are logger1 and logger2,
>>>>>> and the user is interested in the log output of logger2.
>>>>>> A filter is configured to limit the log rate to 100 records/min.
>>>>>> If logger1 produces logs at a rate of 200 records/min,
>>>>>> it is highly likely that logger2 will be unable to output any logs,
>>>>>> because logger1 has already reached the rate-limiting threshold.
>>>>>> The user expects that, while logs are rate limited,
>>>>>> the target logger can still output the necessary logs.
>>>>>> At the very least, no logger should be completely blocked from
>>>>>> outputting logs due to rate limiting.
>>>>>>
>>>>>> The best solution in this case is to apply a common rate-limiting
>>>>>> threshold to each logger separately.
>>>>>> For example, allow each logger to output 100 records/min.
>>>>>> This way, every logger is guaranteed a certain log output rate under
>>>>>> rate limiting.
>>>>>> When the number of loggers is small, or when the log generation rate of
>>>>>> the process is relatively low, some output can still be allowed even
>>>>>> after a logger has reached its own threshold.
>>>>>> This refers to the rate-limited resources or thresholds shared among all
>>>>>> loggers. In this rate limiter, this corresponds to a process-level
>>>>>> rate-limiting threshold.
>>>>>>
>>>>>> I drafted an example to illustrate how loggers can isolate rate-limited
>>>>>> resources and compete for shared rate-limited resources.
>>>>>>
>>>>>> - The filter limiter statistics window is 1 min.
>>>>>> - Filter configs:
>>>>>>   - process level: 1000 records/min
>>>>>>   - logger level: 500 records/min
>>>>>> - All loggers in the system: logger1, logger2
>>>>>> - Statistics of already generated log records:
>>>>>>   - Process stats: 998 records
>>>>>>   - logger1 stats: 499 records
>>>>>>   - logger2 stats: 499 records
>>>>>> - The new log record sequence:
>>>>>>
>>>>>>   NO_n: record n of logger1
>>>>>>   Result: logger1 has 1 record remaining in its own threshold (499 to 500),
>>>>>>   so record n is allowed to print.
>>>>>>   Stats change:
>>>>>>   - Process stats: 999 records
>>>>>>   - logger1 stats: 500 records
>>>>>>
>>>>>>   NO_n+1: record n+1 of logger1
>>>>>>   Result: logger1 has 0 records remaining in its own threshold (500 to 500),
>>>>>>   but 1 record remains in the process-level threshold (999 to 1000),
>>>>>>   so record n+1 is allowed to print.
>>>>>>   Stats change:
>>>>>>   - Process stats: 1000 records
>>>>>>   - logger1 stats: 501 records
>>>>>>
>>>>>>   NO_n+2: record n+2 of logger1
>>>>>>   Result: no records remain in either the logger1-level or the
>>>>>>   process-level threshold, so record n+2 is not allowed to print.
>>>>>>   Stats change: N/A
>>>>>>
>>>>>>   NO_n+3: record n+3 of logger2
>>>>>>   Result: 0 records remain in the process-level threshold (1000 to 1000),
>>>>>>   but logger2 has 1 record remaining in its own threshold (499 to 500),
>>>>>>   so record n+3 is allowed to print.
>>>>>>   Stats change:
>>>>>>   - Process stats: 1001 records
>>>>>>   - logger2 stats: 500 records
>>>>>>
>>>>>>   NO_...: Subsequent logs will no longer be output, as both the dedicated
>>>>>>   rate-limited resources and the shared rate-limited resources have been
>>>>>>   exhausted.
>>>>>>
>>>>>>> 2. Number of customers requesting this feature? Maintenance, as @Piotr
>>>>>>> Karwasz <pkarw...@apache.org> mentioned, is going to span a 5-10 year
>>>>>>> period; if we do not have enough customers requesting this, then
>>>>>>> maintaining this feature might not be worth the effort.
>>>>>>
>>>>>> Thanks for the response. Sorry, I was not aware of this rule before.
>>>>>>
>>>>>> I'm not aware of the actual size of the user group with such needs.
>>>>>> If necessary, perhaps we could conduct a survey on the user mailing list.
>>>>>> This email is merely a discussion. If the proposal is ruled out under
>>>>>> this rule before the discussion even begins, that would not necessarily
>>>>>> be a bad thing, as it would save everyone unnecessary discussion.
>>>>>>
>>>>>> Best,
>>>>>> Yuepeng
>>>>>>
>>>>>> At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com> wrote:
>>>>>>> Hi Yuepeng,
>>>>>>>
>>>>>>> This seems interesting. There are a few comments that I have based on
>>>>>>> the doc and the feature request:
>>>>>>>
>>>>>>> 1. Can you give an example of the scenarios where this can be useful?
>>>>>>> Adding rate limiters to logs seems like an interesting idea, but I'm
>>>>>>> just wondering what the business motivation is.
>>>>>>> 2. Number of customers requesting this feature?
>>>>>>> Maintenance, as @Piotr Karwasz <pkarw...@apache.org> mentioned, is going
>>>>>>> to span a 5-10 year period; if we do not have enough customers
>>>>>>> requesting this, then maintaining this feature might not be worth the
>>>>>>> effort.
>>>>>>> 3. I am interested in what you talked about: dimensions, and allowing
>>>>>>> thresholds to be shared across these dimensions or metrics. Could you
>>>>>>> give an example of this in particular? I just want to know about the
>>>>>>> real-world applications of this.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jay Katariya
>>>>>>>
>>>>>>> On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi, community,
>>>>>>>>
>>>>>>>> In some business scenarios, users expect log rate-limit thresholds to
>>>>>>>> be influenced by different dimensions, and thresholds to be shareable
>>>>>>>> across these dimensions or metrics.
>>>>>>>> This enables the system to flexibly output as many logs as possible
>>>>>>>> within the safe constraints of the thresholds.
>>>>>>>> Therefore, it is meaningful to introduce rate limiters at process
>>>>>>>> granularity and logger granularity,
>>>>>>>> targeting both the number of log entries and the size of the logs.
>>>>>>>>
>>>>>>>> So, I'd like to start a discussion about 'Support a cross-rate Filter
>>>>>>>> based on process and logger granularity' [1].
>>>>>>>>
>>>>>>>> Looking forward to your attention and comments.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Yuepeng Pan
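The logger1/logger2 walk-through earlier in the thread defines the decision rule for the count limits. A record is accepted if its logger still has dedicated quota OR the shared process-level quota still has room. When a record is accepted, both counters are incremented. Below is a minimal, hypothetical Java sketch of that rule. It is not the author's ProcessLoggerCrossFilter code. It ignores byte limits and window rollover, and `CrossRateLimiter`, `seed` and `tryAcquire` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the two-tier rule in the worked example: accept if the logger
 * still has dedicated quota OR the shared process-level quota has room; on acceptance
 * increment both counters. Counts only, single window, no window reset.
 */
public class CrossRateLimiter {

    private final long processLimit;
    private final long loggerLimit;
    private long processCount;
    private final Map<String, Long> loggerCounts = new HashMap<>();

    CrossRateLimiter(long processLimit, long loggerLimit) {
        this.processLimit = processLimit;
        this.loggerLimit = loggerLimit;
    }

    /** Seeds the counters with statistics already collected in the current window. */
    synchronized void seed(long processCount, Map<String, Long> loggerCounts) {
        this.processCount = processCount;
        this.loggerCounts.putAll(loggerCounts);
    }

    synchronized boolean tryAcquire(String logger) {
        long used = loggerCounts.getOrDefault(logger, 0L);
        boolean dedicatedLeft = used < loggerLimit;       // logger-level quota
        boolean sharedLeft = processCount < processLimit; // process-level quota
        if (!dedicatedLeft && !sharedLeft) {
            return false; // both exhausted: deny and leave counters unchanged
        }
        loggerCounts.put(logger, used + 1);
        processCount++;
        return true;
    }

    synchronized long processCount() { return processCount; }

    synchronized long loggerCount(String logger) { return loggerCounts.getOrDefault(logger, 0L); }

    public static void main(String[] args) {
        // Limits and starting statistics taken from the worked example.
        CrossRateLimiter limiter = new CrossRateLimiter(1000, 500);
        limiter.seed(998, Map.of("logger1", 499L, "logger2", 499L));
        String[] records = {"logger1", "logger1", "logger1", "logger2"}; // records n .. n+3
        for (int i = 0; i < records.length; i++) {
            System.out.println("record n+" + i + " (" + records[i] + "): "
                    + (limiter.tryAcquire(records[i]) ? "allowed" : "denied"));
        }
        System.out.println("process=" + limiter.processCount()
                + " logger1=" + limiter.loggerCount("logger1")
                + " logger2=" + limiter.loggerCount("logger2"));
    }
}
```

Running `main` reports records n, n+1 and n+3 as allowed and n+2 as denied. The final counts are process=1001, logger1=501 and logger2=500, matching the walk-through. The process counter can pass its limit because a logger's dedicated quota still admits records after the shared quota is spent.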