I see. Thanks, Volkan, for the clarification and the suggestion, and thanks to everyone who took part in the discussion.
The discussion was cancelled.

Best,
Yuepeng

---- Replied Message ----
| From | Volkan Yazıcı<vol...@yazi.ci> |
| Date | 01/28/2025 17:19 |
| To | dev<dev@logging.apache.org> |
| Subject | Re: Re: Re: Re:Re: [DISCUSS] Support a cross-rate Filter based on process and logger granularity. |

Thanks so much for the elaborate example Yuepeng.

We receive hardly any feedback for the burst filter, which makes me think it is not much used. Your custom filter is a further specialization of the burst filter. I was curious if one can provide the rate limiter dimensions in the configuration of your custom filter (this would make your filter a superset of the burst filter), but I see that they are hardcoded.

All in all, I think your filter certainly holds merit, though it addresses the concern of a very small fraction of our users. If I am not mistaken, you participate as a committer in the Apache StreamPark project involving both Flink and Spark. Would it be possible to publish your filter in an `apache/streampark-log4j` GitHub project? I think this will not only make your filter easily accessible as a single Maven dependency, but also give you more freedom on how you maintain it. If you happen to create such a project, I would be more than happy to review it and refer to it in the Burst Filter manual.

*Nit:* You should consider staying away from the Java properties format for these reasons <https://logging.apache.org/log4j/2.x/manual/configuration.html#java-properties-features>.

On Tue, Jan 28, 2025 at 1:59 AM Yuepeng Pan <panyuep...@apache.org> wrote:

> Thanks Volkan for the quick response.
>
> > Could you share an example of how your
> > filter is used in a configuration file, please?
>
> Yes, glad to do it.
> The specific examples are as follows.
>
> log4j.properties file.
>
> # config lines placeholders.
> .....
> # Config for the ProcessLoggerCrossFilter. The limiter's window size is a constant 1 min.
> appender.main.filter.yourFilterGroup.type=ProcessLoggerCrossFilter
> # 3000 records/min at process level.
> appender.main.filter.yourFilterGroup.procCountRateLimit=3000
> # 204800 bytes/min at process level.
> appender.main.filter.yourFilterGroup.procSizeRateLimit=204800
> # 32 records/min per logger(class to print log line) at logger level.
> appender.main.filter.yourFilterGroup.loggerCountRateLimit=32
> # 10240 bytes/min per logger(class to print log line) at logger level.
> appender.main.filter.yourFilterGroup.loggerSizeRateLimit=10240
>
> Thanks~
>
> Best,
> Yuepeng.
>
> At 2025-01-28 03:48:12, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
> >It is great to hear that you have already done the biggest part of the
> >work: implementing such a filter! Could you share an example of how your
> >filter is used in a configuration file, please?
> >
> >On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> >
> >> Thanks Volkan for the codes and comments.
> >>
> >> > You can either implement this in a Java/Kotlin/Scala/etc. class
> >> > <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
> >> > or a Script Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
> >> > Would you mind explaining to us why these are not an option for you but
> >> > instead this logic must be provided as an official Log4j component, please?
> >>
> >> The functionality can be easily implemented based on the reserved filter interface.
> >> The design of the logging interface is excellent.
> >>
> >> I have already implemented a filter that can achieve similar functionality.
> >> It is primarily used in large distributed systems like FLINK and Spark.
> >>
> >> These systems have the following characteristics when generating production logs:
> >>
> >> - There are many classes, which means there are many logger names;
> >> - The log rate is usually high;
> >> - User logs and framework logs are often mixed together.
> >>
> >> Please allow me to explain why I would like to contribute this to
> >> the official repository. From my limited reading, the reasons are:
> >>
> >> - It is quite valuable in the aforementioned frameworks and use cases.
> >> - Existing filters only have logger-level rate limiting, whereas this filter does not.
> >>
> >> Please feel free to correct me if I’m wrong.
> >>
> >> Thank you very much.
> >>
> >> Best,
> >> Yuepeng
> >>
> >> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
> >> >Hello Yuepeng,
> >> >
> >> >Thanks so much for reaching out to us. Your use case is indeed an
> >> >interesting one and it is good to learn of such Log4j deployments in the wild.
> >> >
> >> >Consider the following Log4j filter pseudo code:
> >> >
> >> >WeakHashMap<Key, RateLimiter> rateLimiterByKey =
> >> >        activeLoggerContext.getObject("rateLimiters");
> >> >Key key = Key.fromDimensions(logEvent.getLogger(), ...);
> >> >// computeIfAbsent (not putIfAbsent) accepts a factory and returns the limiter in use
> >> >RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(key, ignored ->
> >> >        RateLimiter.ofMaxRate(key.maxRate()));
> >> >return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
> >> >
> >> >You can either implement this in a Java/Kotlin/Scala/etc. class
> >> ><https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
> >> >or a Script Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
> >> >Would you mind explaining to us why these are not an option for you but
> >> >instead this logic must be provided as an official Log4j component, please?
> >> >
> >> >Kind regards.
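
As a rough illustration of the filter pseudo code quoted above, here is a minimal, self-contained Java sketch of a per-key rate limiter. It is not Log4j code and makes several assumptions: `KeyedRateLimitSketch`, `FixedWindowLimiter`, and the local `Result` enum are hypothetical names that only mirror the ACCEPT/DENY outcome, the key is simply the logger name, and a `ConcurrentHashMap` stands in for the `WeakHashMap<Key, RateLimiter>` of the pseudo code. The clock is injected so the window can be exercised without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a per-key rate-limiting filter; not a Log4j class. */
final class KeyedRateLimitSketch {

    /** Local mirror of the two outcomes used in the pseudo code. */
    enum Result { ACCEPT, DENY }

    /** Fixed-window counter: at most maxPerWindow permits per windowMillis. */
    static final class FixedWindowLimiter {
        private final int maxPerWindow;
        private final long windowMillis;
        private final LongSupplier clock;
        private long windowStart;
        private int used;

        FixedWindowLimiter(int maxPerWindow, long windowMillis, LongSupplier clock) {
            this.maxPerWindow = maxPerWindow;
            this.windowMillis = windowMillis;
            this.clock = clock;
            this.windowStart = clock.getAsLong();
        }

        synchronized boolean tryAcquire() {
            long now = clock.getAsLong();
            if (now - windowStart >= windowMillis) {
                // The window has elapsed: start a fresh one.
                windowStart = now;
                used = 0;
            }
            if (used < maxPerWindow) {
                used++;
                return true;
            }
            return false;
        }
    }

    private final Map<String, FixedWindowLimiter> limiterByKey = new ConcurrentHashMap<>();
    private final int maxPerWindow;
    private final long windowMillis;
    private final LongSupplier clock;

    KeyedRateLimitSketch(int maxPerWindow, long windowMillis, LongSupplier clock) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Looks up (or lazily creates) the limiter for this logger name and asks it for a permit. */
    Result filter(String loggerName) {
        FixedWindowLimiter limiter = limiterByKey.computeIfAbsent(
                loggerName, ignored -> new FixedWindowLimiter(maxPerWindow, windowMillis, clock));
        return limiter.tryAcquire() ? Result.ACCEPT : Result.DENY;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in milliseconds
        KeyedRateLimitSketch filter = new KeyedRateLimitSketch(2, 60_000L, () -> now[0]);
        System.out.println(filter.filter("logger1")); // ACCEPT
        System.out.println(filter.filter("logger1")); // ACCEPT
        System.out.println(filter.filter("logger1")); // DENY
        System.out.println(filter.filter("logger2")); // ACCEPT (separate key)
        now[0] = 60_000L;                             // one window later
        System.out.println(filter.filter("logger1")); // ACCEPT
    }
}
```

Each logger name gets its own quota here, so one noisy logger cannot consume another logger's permits. A real filter would plug the same lookup into Log4j's filter extension point rather than this stand-alone class.
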
> >> >
> >> >On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> >
> >> >> Sorry, I’m not sure why the formatting of the email appears to be somewhat
> >> >> disorganized. Therefore, I have reorganized part of the disordered content
> >> >> and added it to doc[1].
> >> >>
> >> >> Thank you.
> >> >>
> >> >> [1] https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
> >> >>
> >> >> Best,
> >> >> Yuepeng
> >> >>
> >> >> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
> >> >> > Thanks Jay Kataria for the comments.
> >> >> >
> >> >> > > 1. Can you give an example of the scenarios where this can be useful.
> >> >> > > Adding rate limiters to logs seems like an interesting idea, but just
> >> >> > > wondering what is the business motivation.
> >> >> >
> >> >> > > 3. I am interested in what you talked about - dimensions and allow
> >> >> > > thresholds to be shared across these dimensions or metrics. Could you give
> >> >> > > an example of this particularly, I just want to know about the real world
> >> >> > > applications of this.
> >> >> >
> >> >> > Please let me try to clarify.
> >> >> >
> >> >> > Generally speaking, the logging rate of each logger varies.
> >> >> > In some scenarios or under the influence of existing filters,
> >> >> > if a particular logger generates logs at an especially high rate,
> >> >> > the log output of other loggers might be affected.
> >> >> > In short, all loggers compete for the same type of rate-limited
> >> >> > resources without any proactive intervention logic.
> >> >> >
> >> >> > For example, suppose there are logger1 and logger2,
> >> >> > and the user is interested in the log output of logger2.
> >> >> > A filter is configured to limit the log rate to 100 records/min.
> >> >> > If logger1 produces logs at a rate of 200 records/min,
> >> >> > it is highly likely that logger2 will be unable to output any logs
> >> >> > because logger1 has already reached the rate-limiting threshold.
> >> >> > The user expects that while ensuring the rate-limiting of logs,
> >> >> > the target logger should still be able to output the necessary logs.
> >> >> > At the least, no logger should be completely blocked from outputting
> >> >> > logs due to rate-limiting.
> >> >> >
> >> >> > The best solution in this case is to set a shared rate-limiting condition for each logger.
> >> >> > For example, allow each logger to output 100 records/min.
> >> >> > This way, every logger is guaranteed a certain log output rate under rate-limiting.
> >> >> > When the number of loggers is small, or when the log generation rate of
> >> >> > the process is relatively low,
> >> >> > even if each logger has reached the rate-limiting threshold, some output
> >> >> > can still be allowed.
> >> >> > This refers to the shared rate-limited resources or thresholds among all loggers.
> >> >> > In this rate limiter, this corresponds to a process-level rate-limiting threshold.
> >> >> >
> >> >> > I drafted an example to illustrate how loggers can isolate rate-limited
> >> >> > resources and compete for shared rate-limited resources.
> >> >> >
> >> >> > - The filter limiter statistics window is 1min.
> >> >> > - Filter configs:
> >> >> >   - process level: 1000 records/min
> >> >> >   - logger level: 500 records/min
> >> >> > - All loggers in the system: logger1, logger2
> >> >> > - Statistics of already generated log records
> >> >> >   - Process Stats: 998 records
> >> >> >   - logger1 Stats: 499 records
> >> >> >   - logger2 Stats: 499 records
> >> >> > - The new log records sequences
> >> >> >
> >> >> >   NO_n: record n of logger1
> >> >> >     Result: 1 record remains under the current logger1 threshold (499 to 500),
> >> >> >     so record n is allowed to print.
> >> >> >     Stats change:
> >> >> >       - Process Stats: 999 records
> >> >> >       - logger1 Stats: 500 records
> >> >> >
> >> >> >   NO_n+1: record n+1 of logger1
> >> >> >     Result: 0 records remain under the current logger1 threshold (500 to 500),
> >> >> >     but 1 record remains under the process-level threshold (999 to 1000),
> >> >> >     so record n+1 is allowed to print.
> >> >> >     Stats change:
> >> >> >       - Process Stats: 1000 records
> >> >> >       - logger1 Stats: 501 records
> >> >> >
> >> >> >   NO_n+2: record n+2 of logger1
> >> >> >     Result: no records remain under either the logger1 threshold or the
> >> >> >     process-level threshold, so record n+2 is not allowed to print.
> >> >> >     Stats change: N.A
> >> >> >
> >> >> >   NO_n+3: record n+3 of logger2
> >> >> >     Result: 0 records remain under the process-level threshold (1000 to 1000),
> >> >> >     but 1 record remains under the logger2 threshold (499 to 500),
> >> >> >     so record n+3 is allowed to print.
> >> >> >     Stats change:
> >> >> >       - Process Stats: 1001 records
> >> >> >       - logger2 Stats: 500 records
> >> >> >
> >> >> >   NO_...: Subsequent logs will no longer be output as both dedicated
> >> >> >   rate-limited resources and shared rate-limited resources have been exhausted.
> >> >> >
> >> >> > > 2. Number of customers requesting this feature? Maintenance as @Piotr
> >> >> > > Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
> >> >> > > period, if we do not have enough customers requesting this, then
> >> >> > > maintenance of this feature + efforts might not be worth it.
> >> >> >
> >> >> > Thanks for the response. Sorry, I was not aware of this rule before.
> >> >> >
> >> >> > I'm not aware of the actual size of the user group with such needs.
> >> >> > If necessary, perhaps we could conduct a survey in the user mailing list.
> >> >> > This email is merely a discussion. If it is prohibited based on this
> >> >> > rule before the discussion even begins, it might not be a bad thing,
> >> >> > as it could help everyone avoid unnecessary discussions.
> >> >> >
> >> >> > Best,
> >> >> > Yuepeng
> >> >> >
> >> >> > At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com> wrote:
> >> >> > >Hi Yuepeng,
> >> >> > >
> >> >> > >This seems interesting there are a few comments that I have based on the
> >> >> > >doc and the feature request:
> >> >> > >
> >> >> > >1. Can you give an example of the scenarios where this can be useful.
> >> >> > >Adding rate limiters to logs seems like an interesting idea, but just
> >> >> > >wondering what is the business motivation.
> >> >> > >2. Number of customers requesting this feature? Maintenance as @Piotr
> >> >> > >Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
> >> >> > >period, if we do not have enough customers requesting this, then
> >> >> > >maintenance of this feature + efforts might not be worth it.
> >> >> > >3. I am interested in what you talked about - dimensions and allow
> >> >> > >thresholds to be shared across these dimensions or metrics. Could you give
> >> >> > >an example of this particularly, I just want to know about the real world
> >> >> > >applications of this.
> >> >> > >
> >> >> > >Regards,
> >> >> > >Jay Katariya
> >> >> > >
> >> >> > >On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> >> > >
> >> >> > >> Hi, community,
> >> >> > >>
> >> >> > >> In some business scenarios, users expect the log rate limit thresholds to
> >> >> > >> be influenced by different dimensions and allow thresholds to be shared
> >> >> > >> across these dimensions or metrics.
> >> >> > >> This enables the system to flexibly output as many logs as possible within
> >> >> > >> the safe constraints of the thresholds.
> >> >> > >> Therefore, it is meaningful to introduce rate limiters based on process
> >> >> > >> granularity and logger granularity,
> >> >> > >> targeting both the number of log entries and the size of the logs.
> >> >> > >>
> >> >> > >> So, I'd like to start a discussion about 'Support a cross-rate Filter
> >> >> > >> based on process and logger granularity'.[1]
> >> >> > >>
> >> >> > >> Looking forward to your attention and comments.
> >> >> > >>
> >> >> > >> Thank you.
> >> >> > >>
> >> >> > >> [1] https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
> >> >> > >>
> >> >> > >> Best,
> >> >> > >> Yuepeng Pan
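
For reference, the process-level/logger-level walkthrough quoted earlier in this thread (1000 records/min per process, 500 records/min per logger) can be sketched in a few lines of Java. This is only an assumption about the acceptance rule, inferred from that walkthrough: a record passes if either its logger's own quota or the shared process quota still has room, and every accepted record is counted against both. `CrossRateSketch` is a hypothetical name, not the actual `ProcessLoggerCrossFilter`; the byte-size limits are left out and the 1-minute window is reduced to a manual `resetWindow()` call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the process/logger cross-rate rule; not the real filter. */
final class CrossRateSketch {
    private final int processLimit; // shared quota per window
    private final int loggerLimit;  // dedicated quota per logger per window
    private final Map<String, Integer> loggerCounts = new HashMap<>();
    private int processCount;

    CrossRateSketch(int processLimit, int loggerLimit) {
        this.processLimit = processLimit;
        this.loggerLimit = loggerLimit;
    }

    /** Accepts a record if the logger's own quota or the shared process quota has room. */
    synchronized boolean tryAcquire(String loggerName) {
        int loggerCount = loggerCounts.getOrDefault(loggerName, 0);
        boolean allowed = loggerCount < loggerLimit || processCount < processLimit;
        if (allowed) {
            // An accepted record is charged to both the logger and the process.
            loggerCounts.put(loggerName, loggerCount + 1);
            processCount++;
        }
        return allowed;
    }

    /** Starts a new statistics window by clearing all counters. */
    synchronized void resetWindow() {
        loggerCounts.clear();
        processCount = 0;
    }

    synchronized int processCount() {
        return processCount;
    }

    synchronized int loggerCount(String loggerName) {
        return loggerCounts.getOrDefault(loggerName, 0);
    }

    public static void main(String[] args) {
        CrossRateSketch limiter = new CrossRateSketch(1000, 500);
        // Reach the starting state of the walkthrough: process 998, logger1 499, logger2 499.
        for (int i = 0; i < 499; i++) {
            limiter.tryAcquire("logger1");
            limiter.tryAcquire("logger2");
        }
        System.out.println(limiter.tryAcquire("logger1")); // record n:   true  (logger1 quota)
        System.out.println(limiter.tryAcquire("logger1")); // record n+1: true  (process quota)
        System.out.println(limiter.tryAcquire("logger1")); // record n+2: false (both exhausted)
        System.out.println(limiter.tryAcquire("logger2")); // record n+3: true  (logger2 quota)
        System.out.println(limiter.processCount());        // 1001
    }
}
```

Under this rule the process counter can exceed its limit (1001 in the walkthrough), because a logger that still has dedicated quota is never blocked by other loggers having used up the shared quota.
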