Thanks Volkan for the quick response.

> Could you share an example of how your
> filter is used in a configuration file, please?

Yes, glad to do it. A specific example follows.

log4j.properties file:

# config lines placeholders.
.....
# Config for the ProcessLoggerCrossFilter. The limiter window size is fixed at 1 min.
appender.main.filter.yourFilterGroup.type=ProcessLoggerCrossFilter
# 3000 records/min at process level.
appender.main.filter.yourFilterGroup.procCountRateLimit=3000
# 204800 bytes/min at process level.
appender.main.filter.yourFilterGroup.procSizeRateLimit=204800
# 32 records/min per logger (the class that prints the log line) at logger level.
appender.main.filter.yourFilterGroup.loggerCountRateLimit=32
# 10240 bytes/min per logger (the class that prints the log line) at logger level.
appender.main.filter.yourFilterGroup.loggerSizeRateLimit=10240

Thanks~

Best,
Yuepeng.

At 2025-01-28 03:48:12, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>It is great to hear that you have already done the biggest part of the
>work: implementing such a filter! Could you share an example of how your
>filter is used in a configuration file, please?
>
>On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:
>
>> Thanks Volkan for the codes and comments.
>>
>> > You can either implement this in a Java/Kotlin/Scala/etc. class
>> > <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>> > or a Script
>> > Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>> > Would you mind explaining to us why these are not an option for you but
>> > instead this logic must be provided as an official Log4j component, please?
>>
>> The functionality can be easily implemented based on the reserved filter interface.
>> The design of the logging interface is excellent.
>>
>> I have already implemented a filter that can achieve similar functionality.
>> It is primarily used in large distributed systems like FLINK and Spark.
>> These systems have the following characteristics when generating production logs:
>>
>> - There are many classes, which means there are many logger names;
>> - The log rate is usually high;
>> - User logs and framework logs are often mixed together.
>>
>> Please allow me to explain why I would like to contribute this to
>> the official repository. From my limited reading, the reasons are:
>>
>> - It is quite valuable in the aforementioned frameworks and use cases.
>> - Existing filters only have logger-level rate limiting, whereas this filter does not.
>>
>> Please feel free to correct me if I’m wrong.
>>
>> Thank you very much.
>>
>> Best,
>> Yuepeng
>>
>> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>> >Hello Yuepeng,
>> >
>> >Thanks so much for reaching out to us. Your use case is indeed an
>> >interesting one and it is good to learn such Log4j deployments in the wild.
>> >
>> >Consider the following Log4j filter pseudo code:
>> >
>> >WeakHashMap<Key, RateLimiter> rateLimiterByKey =
>> >        activeLoggerContext.getObject("rateLimiters");
>> >Key key = Key.fromDimensions(logEvent.getLogger(), ...);
>> >// computeIfAbsent (not putIfAbsent) is the Map method that takes a factory function
>> >RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(key, ignored ->
>> >        RateLimiter.ofMaxRate(key.maxRate()));
>> >return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
>> >
>> >You can either implement this in a Java/Kotlin/Scala/etc. class
>> ><https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>> >or a Script
>> >Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>> >Would you mind explaining to us why these are not an option for you but
>> >instead this logic must be provided as an official Log4j component, please?
>> >
>> >Kind regards.
>> >
>> >On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>> >
>> >> Sorry, I’m not sure why the formatting of the email appears to be somewhat
>> >> disorganized. Therefore, I have reorganized part of the disordered content
>> >> and added it to doc[1].
>> >>
>> >> Thank you.
>> >>
>> >> [1]
>> >> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
>> >>
>> >> Best,
>> >> Yuepeng
>> >>
>> >> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
>> >> > Thanks Jay Kataria for the comments.
>> >> >
>> >> > > 1. Can you give an example of the scenarios where this can be useful.
>> >> > > Adding rate limiters to logs seems like an interesting idea, but just
>> >> > > wondering what is the business motivation.
>> >> >
>> >> > > 3. I am interested in what you talked about - dimensions and allow
>> >> > > thresholds to be shared across these dimensions or metrics. Could you give
>> >> > > an example of this particularly, I just want to know about the real world
>> >> > > applications of this.
>> >> >
>> >> > Please let me try to clarify it.
>> >> >
>> >> > Generally speaking, the logging rate of each logger varies.
>> >> > In some scenarios or under the influence of existing filters,
>> >> > if a particular logger generates logs at an especially high rate,
>> >> > the log output of other loggers might be affected.
>> >> > In short, all loggers compete for the same type of rate-limited
>> >> > resources without any proactive intervention logic.
>> >> >
>> >> > For example, suppose there are logger1 and logger2,
>> >> > and the user is interested in the log output of logger2.
>> >> > A filter is configured to limit the log rate to 100 records/min.
>> >> > If logger1 produces logs at a rate of 200 records/min,
>> >> > it is highly likely that logger2 will be unable to output any logs
>> >> > because logger1 has already reached the rate-limiting threshold.
>> >> > The user expects that while ensuring the rate-limiting of logs,
>> >> > the target logger should still be able to output the necessary logs.
>> >> > At the least, no logger should be completely blocked from outputting
>> >> > logs due to rate-limiting.
>> >> >
>> >> > The best solution in this case is to set a shared rate-limiting
>> >> > condition for each logger.
>> >> > For example, allow each logger to output 100 records/min.
>> >> > This way, every logger is guaranteed a certain log output rate under
>> >> > rate-limiting.
>> >> > When the number of loggers is small, or when the log generation rate of
>> >> > the process is relatively low,
>> >> > even if each logger has reached the rate-limiting threshold, some output
>> >> > can still be allowed.
>> >> > This refers to the shared rate-limited resources or thresholds among all
>> >> > loggers.
>> >> > In this rate limiter, this corresponds to a process-level rate-limiting
>> >> > threshold.
>> >> >
>> >> > I drafted an example to illustrate how loggers can isolate rate-limited
>> >> > resources and compete for shared rate-limited resources.
>> >> >
>> >> > - The filter limiter statistics window is 1min.
>> >> > - Filter configs:
>> >> >   - process level: 1000 records/min
>> >> >   - logger level: 500 records/min
>> >> > - All loggers in the system: logger1, logger2
>> >> > - Statistics of already generated log records
>> >> >   - Process Stats: 998 records
>> >> >   - logger1 Stats: 499 records
>> >> >   - logger2 Stats: 499 records
>> >> > - The new log records sequences
>> >> >   NO_n. record n of logger1
>> >> >     Result: 1 record remains in the current threshold of logger1 (499 to 500),
>> >> >     so record n is allowed to print.
>> >> >     Stats change:
>> >> >     - Process Stats: 999 records
>> >> >     - logger1 Stats: 500 records
>> >> >   NO_n+1. record n+1 of logger1
>> >> >     Result: 0 records remain in the current threshold of logger1 (500 to 500),
>> >> >     but 1 record remains in the process-level threshold (999 to 1000),
>> >> >     so record n+1 is allowed to print.
>> >> >     Stats change:
>> >> >     - Process Stats: 1000 records
>> >> >     - logger1 Stats: 501 records
>> >> >   NO_n+2. record n+2 of logger1
>> >> >     Result: No records remain in either the logger1-level or the process-level
>> >> >     threshold, so record n+2 is not allowed to print.
>> >> >     Stats change: N.A
>> >> >   NO_n+3. record n+3 of logger2
>> >> >     Result: 0 records remain in the current process-level threshold (1000 to 1000),
>> >> >     but 1 record remains in the logger2-level threshold (499 to 500),
>> >> >     so record n+3 is allowed to print.
>> >> >     Stats change:
>> >> >     - Process Stats: 1001 records
>> >> >     - logger2 Stats: 500 records
>> >> >
>> >> >   NO_...: Subsequent logs will no longer be output as both dedicated
>> >> >   rate-limited resources and shared rate-limited resources have been exhausted.
>> >> >
>> >> > > 2. Number of customers requesting this feature? Maintenance as @Piotr
>> >> > > Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
>> >> > > period, if we do not have enough customers requesting this, then
>> >> > > maintenance of this feature + efforts might not be worth it.
>> >> >
>> >> > Thanks for the response. Sorry, I was not aware of this rule before.
>> >> >
>> >> > I'm not aware of the actual size of the user group with such needs.
>> >> > If necessary, perhaps we could conduct a survey in the user mailing list.
>> >> > This email is merely a discussion. If it is prohibited based on this
>> >> > rule before the discussion even begins, it might not be a bad thing,
>> >> > as it could help everyone avoid unnecessary discussions.
>> >> >
>> >> > Best,
>> >> > Yuepeng
>> >> >
>> >> > At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com> wrote:
>> >> > >Hi Yuepeng,
>> >> > >
>> >> > >This seems interesting there are a few comments that I have based on the
>> >> > >doc and the feature request:
>> >> > >
>> >> > >1. Can you give an example of the scenarios where this can be useful.
>> >> > >Adding rate limiters to logs seems like an interesting idea, but just
>> >> > >wondering what is the business motivation.
>> >> > >2. Number of customers requesting this feature?
>> >> > >Maintenance as @Piotr Karwasz <pkarw...@apache.org> , mentioned is going to be
>> >> > >a 5 - 10 year period, if we do not have enough customers requesting this, then
>> >> > >maintenance of this feature + efforts might not be worth it.
>> >> > >3. I am interested in what you talked about - dimensions and allow
>> >> > >thresholds to be shared across these dimensions or metrics. Could you give
>> >> > >an example of this particularly, I just want to know about the real world
>> >> > >applications of this.
>> >> > >
>> >> > >Regards,
>> >> > >Jay Katariya
>> >> > >
>> >> > >On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>> >> > >
>> >> > >> Hi, community,
>> >> > >>
>> >> > >> In some business scenarios, users expect the log rate limit thresholds to
>> >> > >> be influenced by different dimensions and allow thresholds to be shared
>> >> > >> across these dimensions or metrics.
>> >> > >> This enables the system to flexibly output as many logs as possible within
>> >> > >> the safe constraints of the thresholds.
>> >> > >> Therefore, it is meaningful to introduce rate limiters based on process
>> >> > >> granularity and logger granularity,
>> >> > >> targeting both the number of log entries and the size of the logs.
>> >> > >>
>> >> > >> So, I'd like to start a discussion about 'Support a cross-rate Filter
>> >> > >> based on process and logger granularity'.[1]
>> >> > >>
>> >> > >> Looking forward to your attention and comments.
>> >> > >>
>> >> > >> Thank you.
>> >> > >>
>> >> > >> [1]
>> >> > >> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
>> >> > >>
>> >> > >> Best,
>> >> > >> Yuepeng Pan
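
For readers following the worked example quoted earlier in this thread (process level: 1000 records/min, logger level: 500 records/min), the accept/deny rule it implies can be sketched in plain Java. This is only a sketch under stated assumptions, not the actual ProcessLoggerCrossFilter: the class name CrossRateLimiter, its methods, and the fixed-window reset are hypothetical, it covers only the record-count limits (not the byte-size limits), and it deliberately uses no Log4j API. A record is accepted while either the logger's own quota or the process-level quota has room, and every accepted record is counted at both levels, which is what the "Stats change" entries in the example suggest.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the accept/deny rule from the worked example.
 * NOT the ProcessLoggerCrossFilter implementation; no Log4j API is used.
 */
final class CrossRateLimiter {
    // Assumption: a fixed 1-minute window whose counters reset all at once.
    private static final long WINDOW_MILLIS = 60_000L;

    private final long procCountLimit;   // shared, process-level quota per window
    private final long loggerCountLimit; // dedicated, per-logger quota per window
    private final Map<String, Long> loggerCounts = new HashMap<>();
    private long procCount;
    private long windowStartMillis;

    CrossRateLimiter(long procCountLimit, long loggerCountLimit, long startMillis) {
        this.procCountLimit = procCountLimit;
        this.loggerCountLimit = loggerCountLimit;
        this.windowStartMillis = startMillis;
    }

    /** Returns true if the record may be printed; accepted records count at both levels. */
    synchronized boolean tryAcquire(String loggerName, long nowMillis) {
        if (nowMillis - windowStartMillis >= WINDOW_MILLIS) {
            // A new window starts: reset every counter.
            windowStartMillis = nowMillis;
            procCount = 0;
            loggerCounts.clear();
        }
        long loggerCount = loggerCounts.getOrDefault(loggerName, 0L);
        // Accept while either the logger's own quota or the process quota has room.
        boolean accepted = loggerCount < loggerCountLimit || procCount < procCountLimit;
        if (accepted) {
            loggerCounts.put(loggerName, loggerCount + 1);
            procCount++;
        }
        return accepted;
    }

    synchronized long processCount() {
        return procCount;
    }

    synchronized long loggerCount(String loggerName) {
        return loggerCounts.getOrDefault(loggerName, 0L);
    }

    public static void main(String[] args) {
        CrossRateLimiter limiter = new CrossRateLimiter(1000, 500, 0L);
        // Reproduce the starting statistics: process 998, logger1 499, logger2 499.
        for (int i = 0; i < 499; i++) {
            limiter.tryAcquire("logger1", 1L);
            limiter.tryAcquire("logger2", 1L);
        }
        // Records n, n+1, n+2 (logger1) and n+3 (logger2) from the example.
        String[] next = {"logger1", "logger1", "logger1", "logger2"};
        for (String logger : next) {
            boolean ok = limiter.tryAcquire(logger, 2L);
            System.out.println(logger + " -> " + (ok ? "ACCEPT" : "DENY"));
        }
    }
}
```

Running main replays records n through n+3 from the example and yields accept, accept, deny, accept, ending with process stats of 1001 and logger1 stats of 501. In a real filter the boolean would presumably map to a Log4j filter result, and the size limits (procSizeRateLimit, loggerSizeRateLimit) would need a second pair of counters; both are left out here.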