Thanks Volkan for the quick response.

> Could you share an example of how your
> filter is used in a configuration file, please?
Yes, glad to do it.

Here is an example from a log4j.properties file:

# Placeholder for other config lines.

.....

# Config for the ProcessLoggerCrossFilter. The limiter window size is a
# constant 1 min.

appender.main.filter.yourFilterGroup.type=ProcessLoggerCrossFilter

# 3000 records/min at process level.

appender.main.filter.yourFilterGroup.procCountRateLimit=3000

# 204800 bytes/min at process level.

appender.main.filter.yourFilterGroup.procSizeRateLimit=204800

# 32 records/min per logger (i.e. per class that prints log lines).

appender.main.filter.yourFilterGroup.loggerCountRateLimit=32

# 10240 bytes/min per logger (i.e. per class that prints log lines).

appender.main.filter.yourFilterGroup.loggerSizeRateLimit=10240
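
In case it helps the discussion, here is a minimal sketch in plain Java of how the two count limits above could interact. It follows the accept rule from the example in my earlier mail (quoted below): a record passes if its logger still has budget in the current window, or if the process-level budget does. The class and method names are hypothetical and this is not the actual filter code. The size limits and the Log4j plugin wiring are left out, and a fixed (non-sliding) window is assumed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the actual ProcessLoggerCrossFilter code.
// It covers only the record-count limits and assumes a fixed window.
class CrossRateLimitSketch {
    private final long procCountLimit;    // cf. procCountRateLimit
    private final long loggerCountLimit;  // cf. loggerCountRateLimit
    private final long windowMillis;      // 1 min in the config above
    private final Map<String, Long> loggerCounts = new HashMap<>();
    private long windowStartMillis;
    private long procCount;

    CrossRateLimitSketch(long procCountLimit, long loggerCountLimit,
                         long windowMillis, long nowMillis) {
        this.procCountLimit = procCountLimit;
        this.loggerCountLimit = loggerCountLimit;
        this.windowMillis = windowMillis;
        this.windowStartMillis = nowMillis;
    }

    // Returns true if a record from the given logger may be printed.
    synchronized boolean tryAcquire(String loggerName, long nowMillis) {
        // Start a new window and reset all counters once the old one expires.
        if (nowMillis - windowStartMillis >= windowMillis) {
            windowStartMillis = nowMillis;
            procCount = 0;
            loggerCounts.clear();
        }
        long used = loggerCounts.getOrDefault(loggerName, 0L);
        // Accept if the logger's own budget OR the shared process budget
        // still has room.
        boolean allowed = used < loggerCountLimit || procCount < procCountLimit;
        if (allowed) {
            // Accepted records are counted at both levels.
            loggerCounts.put(loggerName, used + 1);
            procCount++;
        }
        return allowed;
    }

    // Number of records accepted in the current window, across all loggers.
    synchronized long processCount() {
        return procCount;
    }
}
```

In a real filter, I would expect tryAcquire to be called with the event's logger name and the current time, and its result mapped to a filter result such as NEUTRAL or DENY; that mapping is an assumption and not shown here.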


Thanks~



Best,
Yuepeng.




At 2025-01-28 03:48:12, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>It is great to hear that you have already done the biggest part of the
>work: implementing such a filter! Could you share an example of how your
>filter is used in a configuration file, please?
>
>On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:
>
>> Thanks Volkan for the codes and comments.
>>
>>
>>
>>
>> > You can either implement this in a Java/Kotlin/Scala/etc. class
>> > <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>> > or a Script
>> > Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>> > Would you mind explaining to us why these are not an option for you but
>> > instead this logic must be provided as an official Log4j component,
>> > please?
>>
>>
>>
>>
>> The functionality can be easily implemented on top of the existing
>> filter interface.
>>
>> The design of the logging interface is excellent.
>>
>>
>>
>>
>> I have already implemented a filter that can achieve similar
>> functionality.
>>
>> It is primarily used in large distributed systems like Flink and Spark.
>>
>> These systems have the following characteristics when generating
>> production logs:
>>
>>
>>
>>
>> - There are many classes, which means there are many logger names;
>>
>> - The log rate is usually high;
>>
>> - User logs and framework logs are often mixed together.
>>
>>
>>
>>
>> Please allow me to explain why I would like to contribute this to
>>
>> the official repository. From my limited reading, the reasons are:
>>
>>
>>
>>
>> - It is quite valuable in the aforementioned frameworks and use cases.
>>
>> - Existing filters apply rate limiting at a single level only, whereas
>> this filter combines process-level and logger-level limits.
>>
>> Please feel free to correct me if I’m wrong.
>>
>>
>>
>>
>> Thank you very much.
>>
>>
>>
>>
>> Best,
>> Yuepeng
>>
>>
>>
>>
>>
>> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>> >Hello Yuepeng,
>> >
>> >Thanks so much for reaching out to us. Your use case is indeed an
>> >interesting one and it is good to learn about such Log4j deployments
>> >in the wild.
>> >
>> >Consider the following Log4j filter pseudo code:
>> >
>> >WeakHashMap<Key, RateLimiter> rateLimiterByKey =
>> >activeLoggerContext.getObject("rateLimiters");
>> >Key key = Key.fromDimensions(logEvent.getLogger(), ...);
>> >RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(key, ignored ->
>> >RateLimiter.ofMaxRate(key.maxRate()));
>> >return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
>> >
>> >
>> >You can either implement this in a Java/Kotlin/Scala/etc. class
>> ><https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>> >or a Script
>> >Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>> >Would you mind explaining to us why these are not an option for you but
>> >instead this logic must be provided as an official Log4j component, please?
>> >
>> >Kind regards.
>> >
>> >On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>> >
>> >> Sorry, I’m not sure why the formatting of the email appears to be
>> >> somewhat disorganized. Therefore, I have reorganized part of the
>> >> disordered content and added it to doc[1].
>> >>
>> >> Thank you.
>> >>
>> >> [1]
>> >> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
>> >>
>> >> Best,
>> >> Yuepeng
>> >>
>> >> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
>> >> > Thanks Jay Kataria for the comments.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > > 1. Can you give an example of the scenarios where this can be useful.
>> >> > > Adding rate limiters to logs seems like an interesting idea, but just
>> >> > > wondering what is the business motivation.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > > 3. I am interested in what you talked about - dimensions and allow
>> >> > > thresholds to be shared across these dimensions or metrics. Could you
>> >> > > give an example of this particularly, I just want to know about the
>> >> > > real world applications of this.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Please let me try to clarify.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Generally speaking, the logging rate of each logger varies.
>> >> > In some scenarios, or under the influence of existing filters,
>> >> > if a particular logger generates logs at an especially high rate,
>> >> > the log output of other loggers might be affected.
>> >> > In short, all loggers compete for the same type of rate-limited
>> >> > resources without any proactive intervention logic.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > For example, suppose there are two loggers, logger1 and logger2,
>> >> > and the user is interested in the log output of logger2.
>> >> > A filter is configured to limit the log rate to 100 records/min.
>> >> > If logger1 produces logs at a rate of 200 records/min,
>> >> > it is highly likely that logger2 will be unable to output any logs
>> >> > because logger1 has already used up the rate limit.
>> >> > The user expects that, while logs are being rate-limited,
>> >> > the target logger can still output the necessary logs.
>> >> > At the very least, no logger should be completely blocked from
>> >> > outputting logs due to rate limiting.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > The best solution in this case is to give each logger its own
>> >> > rate limit.
>> >> > For example, allow each logger to output 100 records/min.
>> >> > This way, every logger is guaranteed a certain log output rate under
>> >> > rate limiting.
>> >> > When the number of loggers is small, or when the log generation rate
>> >> > of the process is relatively low,
>> >> > some output can still be allowed even if a logger has reached its
>> >> > own rate-limiting threshold.
>> >> > This is what the rate-limited resources or thresholds shared among
>> >> > all loggers refer to.
>> >> > In this rate limiter, they correspond to a process-level
>> >> > rate-limiting threshold.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > I drafted an example to illustrate how loggers can have isolated
>> >> > rate-limited resources while competing for shared ones.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > - The filter's statistics window is 1 min.
>> >> > - Filter configs:
>> >> >   - process level: 1000 records/min
>> >> >   - logger level: 500 records/min
>> >> > - All loggers in the system: logger1, logger2
>> >> > - Statistics of already generated log records:
>> >> >   - Process stats: 998 records
>> >> >   - logger1 stats: 499 records
>> >> >   - logger2 stats: 499 records
>> >> > - The sequence of new log records:
>> >> >
>> >> >   NO_n: record n of logger1
>> >> >   Result: 1 record remains in the logger1 threshold (499 of 500),
>> >> >   so record n is allowed to print.
>> >> >   Stats change:
>> >> >   - Process stats: 999 records
>> >> >   - logger1 stats: 500 records
>> >> >
>> >> >   NO_n+1: record n+1 of logger1
>> >> >   Result: 0 records remain in the logger1 threshold (500 of 500),
>> >> >   but 1 record remains in the process-level threshold (999 of 1000),
>> >> >   so record n+1 is allowed to print.
>> >> >   Stats change:
>> >> >   - Process stats: 1000 records
>> >> >   - logger1 stats: 501 records
>> >> >
>> >> >   NO_n+2: record n+2 of logger1
>> >> >   Result: no records remain in either the logger1 threshold or the
>> >> >   process-level threshold, so record n+2 is not allowed to print.
>> >> >   Stats change: N/A
>> >> >
>> >> >   NO_n+3: record n+3 of logger2
>> >> >   Result: 0 records remain in the process-level threshold (1000 of
>> >> >   1000), but 1 record remains in the logger2 threshold (499 of 500),
>> >> >   so record n+3 is allowed to print.
>> >> >   Stats change:
>> >> >   - Process stats: 1001 records
>> >> >   - logger2 stats: 500 records
>> >> >
>> >> >   NO_...: subsequent logs are no longer output, as both the dedicated
>> >> >   and the shared rate-limited resources have been exhausted.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > > 2. Number of customers requesting this feature? Maintenance as @Piotr
>> >> > > Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
>> >> > > period, if we do not have enough customers requesting this, then
>> >> > > maintenance of this feature + efforts might not be worth it.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Thanks for the response. Sorry, I was not aware of this rule before.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > I'm not aware of the actual size of the user group with such needs.
>> >> >
>> >> > If necessary, perhaps we could conduct a survey on the user mailing
>> >> > list.
>> >> >
>> >> > This email is merely a discussion. If it is prohibited based on this
>> >> >
>> >> > rule before the discussion even begins, it might not be a bad thing,
>> >> >
>> >> > as it could help everyone avoid unnecessary discussions.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Best,
>> >> > Yuepeng
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com> wrote:
>> >> > >Hi Yuepeng,
>> >> > >
>> >> > >This seems interesting there are a few comments that I have based on
>> >> > >the doc and the feature request:
>> >> > >
>> >> > >1. Can you give an example of the scenarios where this can be useful.
>> >> > >Adding rate limiters to logs seems like an interesting idea, but just
>> >> > >wondering what is the business motivation.
>> >> > >2. Number of customers requesting this feature? Maintenance as @Piotr
>> >> > >Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
>> >> > >period, if we do not have enough customers requesting this, then
>> >> > >maintenance of this feature + efforts might not be worth it.
>> >> > >3. I am interested in what you talked about - dimensions and allow
>> >> > >thresholds to be shared across these dimensions or metrics. Could you
>> >> > >give an example of this particularly, I just want to know about the
>> >> > >real world applications of this.
>> >> > >
>> >> > >
>> >> > >Regards,
>> >> > >Jay Katariya
>> >> > >
>> >> > >
>> >> > >
>> >> > >On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>> >> > >
>> >> > >> Hi, community,
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> In some business scenarios, users expect the log rate limit
>> >> > >> thresholds to be influenced by different dimensions and allow
>> >> > >> thresholds to be shared across these dimensions or metrics.
>> >> > >> This enables the system to flexibly output as many logs as possible
>> >> > >> within the safe constraints of the thresholds.
>> >> > >> Therefore, it is meaningful to introduce rate limiters based on
>> >> > >> process granularity and logger granularity,
>> >> > >> targeting both the number of log entries and the size of the logs.
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> So, I'd like to start a discussion about 'Support a cross-rate
>> >> > >> Filter based on process and logger granularity'. [1]
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> Looking forward to your attention and comments.
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> Thank you.
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> [1]
>> >> > >> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> Best,
>> >> > >>
>> >> > >> Yuepeng Pan
>> >> >
>> >>
>>
