Why does the BurstFilter not address your concern?

Ralph

> On Jan 27, 2025, at 5:59 PM, Yuepeng Pan <panyuep...@apache.org> wrote:
> 
> Thanks Volkan for the quick response.
> 
>> Could you share an example of how your
>> filter is used in a configuration file, please?
> Yes, glad to do it.
> 
> The specific examples are as follows.
> 
> 
> 
> log4j.properties file.
> 
> # config lines placeholders.
> 
> .....
> 
> # Config for the ProcessLoggerCrossFilter. The limiter window is fixed at 1 min.
> 
> appender.main.filter.yourFilterGroup.type=ProcessLoggerCrossFilter
> 
> # 3000 records/min at process level.
> 
> appender.main.filter.yourFilterGroup.procCountRateLimit=3000
> 
> # 204800 bytes/min at process level.
> 
> appender.main.filter.yourFilterGroup.procSizeRateLimit=204800
> 
> # 32 records/min per logger (the class that prints the log line) at logger level.
> 
> appender.main.filter.yourFilterGroup.loggerCountRateLimit=32
> 
> # 10240 bytes/min per logger (the class that prints the log line) at logger level.
> 
> appender.main.filter.yourFilterGroup.loggerSizeRateLimit=10240
> 
> 
> Thanks~
> 
> 
> 
> Best,
> Yuepeng.
> 
> 
> 
> 
> At 2025-01-28 03:48:12, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>> It is great to hear that you have already done the biggest part of the
>> work: implementing such a filter! Could you share an example of how your
>> filter is used in a configuration file, please?
>> 
>> On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:
>> 
>>> Thanks Volkan for the code and comments.
>>> 
>>> 
>>> 
>>> 
>>>> You can either implement this in a Java/Kotlin/Scala/etc. class
>>> 
>>>> <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>>> 
>>>> or a Script
>>> 
>>>> Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>>> 
>>>> Would you mind explaining to us why these are not an option for you but
>>> 
>>>> instead this logic must be provided as an official Log4j component,
>>> please?
>>> 
>>> 
>>> 
>>> 
>>> The functionality can be easily implemented based on the reserved filter
>>> interface.
>>> 
>>> The design of the logging interface is excellent.
>>> 
>>> 
>>> 
>>> 
>>> I have already implemented a filter that can achieve similar
>>> functionality.
>>> 
>>> It is primarily used in large distributed systems like FLINK and Spark.
>>> 
>>> These systems have the following characteristics when generating
>>> production logs:
>>> 
>>> 
>>> 
>>> 
>>> - There are many classes, which means there are many logger names;
>>> 
>>> - The log rate is usually high;
>>> 
>>> - User logs and framework logs are often mixed together.
>>> 
>>> 
>>> 
>>> 
>>> Please allow me to explain why I would like to contribute this to
>>> 
>>> the official repository. From my limited reading, the reasons are:
>>> 
>>> 
>>> 
>>> 
>>> - It is quite valuable in the aforementioned frameworks and use cases.
>>> 
>>> - Existing filters only have logger-level rate limiting, whereas this
>>> filter does not.
>>> 
>>> Please feel free to correct me if I’m wrong.
>>> 
>>> 
>>> 
>>> 
>>> Thank you very much.
>>> 
>>> 
>>> 
>>> 
>>> Best,
>>> Yuepeng
>>> 
>>> 
>>> 
>>> 
>>> 
>>> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
>>>> Hello Yuepeng,
>>>> 
>>>> Thanks so much for reaching out to us. Your use case is indeed an
>>>> interesting one and it is good to learn about such Log4j deployments in
>>>> the wild.
>>>> 
>>>> Consider the following Log4j filter pseudo code:
>>>> 
>>>> WeakHashMap<Key, RateLimiter> rateLimiterByKey =
>>>>     activeLoggerContext.getObject("rateLimiters");
>>>> Key key = Key.fromDimensions(logEvent.getLoggerName(), ...);
>>>> RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(key, ignored ->
>>>>     RateLimiter.ofMaxRate(key.maxRate()));
>>>> return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
>>>> 
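
The pseudo code above can be approximated with the JDK alone. The following is a minimal sketch, not the Log4j API: `PerKeyRateLimiter`, its fixed-window counting, and the injected millisecond clock are all assumptions made for illustration, and the key is simply taken to be the logger name. A real filter would map `true` to `Result.ACCEPT` and `false` to `Result.DENY`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-key limiter: at most maxPerWindow permits per key in each fixed window. */
final class PerKeyRateLimiter {

    /** Mutable counter for one key; it resets when the clock enters a new window. */
    private static final class Window {
        long windowIndex;
        long used;
    }

    private final Map<String, Window> windowByKey = new ConcurrentHashMap<>();
    private final long maxPerWindow;
    private final long windowMillis;
    private final LongSupplier clockMillis; // injected so the behaviour can be tested without sleeping

    PerKeyRateLimiter(long maxPerWindow, long windowMillis, LongSupplier clockMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns true if an event for this key may pass, false if the key's budget is used up. */
    boolean tryAcquire(String key) {
        // computeIfAbsent creates the per-key state lazily, as in the pseudo code.
        Window window = windowByKey.computeIfAbsent(key, ignored -> new Window());
        long currentIndex = clockMillis.getAsLong() / windowMillis;
        synchronized (window) {
            if (window.windowIndex != currentIndex) {
                // A new window has started: forget the previous count.
                window.windowIndex = currentIndex;
                window.used = 0;
            }
            if (window.used >= maxPerWindow) {
                return false;
            }
            window.used++;
            return true;
        }
    }
}
```

For example, `new PerKeyRateLimiter(100, 60_000L, System::currentTimeMillis)` would allow each logger name 100 events per minute, independently of every other logger name.
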
>>>> 
>>>> You can either implement this in a Java/Kotlin/Scala/etc. class
>>>> <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
>>>> or a Script
>>>> Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
>>>> Would you mind explaining to us why these are not an option for you but
>>>> instead this logic must be provided as an official Log4j component,
>>> please?
>>>> 
>>>> Kind regards.
>>>> 
>>>> On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org>
>>> wrote:
>>>> 
>>>>> Sorry, I’m not sure why the formatting of the email appears to be
>>> somewhat
>>>>> disorganized. Therefore, I have reorganized part of the disordered
>>> content
>>>>> and added it to doc[1].
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> [1]
>>>>> 
>>> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
>>>>> 
>>>>> Best,
>>>>> Yuepeng
>>>>> 
>>>>> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
>>>>>> Thanks Jay Kataria for the comments.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 1. Can you give an example of the scenarios where this can be
>>> useful.
>>>>>> 
>>>>>>> Adding rate limiters to logs seems like an interesting idea, but
>>> just
>>>>>> 
>>>>>>> wondering what is the business motivation.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 3. I am interested in what you talked about - dimensions and allow
>>>>>> 
>>>>>>> thresholds to be shared across these dimensions or metrics. Could
>>> you
>>>>> give
>>>>>> 
>>>>>>> an example of this particularly, I just want to know about the real
>>>>> world
>>>>>> 
>>>>>>> applications of this.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Please let me try to clarify it.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Generally speaking, the logging rate of each logger varies.
>>>>>> 
>>>>>> In some scenarios or under the influence of existing filters,
>>>>>> 
>>>>>> if a particular logger generates logs at an especially high rate,
>>>>>> 
>>>>>> the log output of other loggers might be affected.
>>>>>> 
>>>>>> In short, all loggers compete for the same type of rate-limited
>>>>> resources without any proactive intervention logic.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> For example, suppose there are logger1 and logger2,
>>>>>> 
>>>>>> and the user is interested in the log output of logger2.
>>>>>> 
>>>>>> A filter is configured to limit the log rate to 100 records/min.
>>>>>> 
>>>>>> If logger1 produces logs at a rate of 200 records/min,
>>>>>> 
>>>>>> it is highly likely that logger2 will be unable to output any logs
>>>>>> 
>>>>>> because logger1 has already reached the rate-limiting threshold.
>>>>>> 
>>>>>> The user expects that while ensuring the rate-limiting of logs,
>>>>>> 
>>>>>> the target logger should still be able to output the necessary logs.
>>>>>> 
>>>>>> At the least, no logger should be completely blocked from outputting
>>>>> logs due to rate-limiting.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The best solution in this case is to set a shared rate-limiting
>>>>> condition for each logger.
>>>>>> 
>>>>>> For example, allow each logger to output 100 records/min.
>>>>>> 
>>>>>> This way, every logger is guaranteed a certain log output rate under
>>>>> rate-limiting.
>>>>>> 
>>>>>> When the number of loggers is small, or when the log generation rate
>>> of
>>>>> the process is relatively low,
>>>>>> 
>>>>>> even if each logger has reached the rate-limiting threshold, some
>>> output
>>>>> can still be allowed.
>>>>>> 
>>>>>> This refers to the shared rate-limited resources or thresholds among
>>> all
>>>>> loggers.
>>>>>> 
>>>>>> In this rate limiter, this corresponds to a process-level
>>> rate-limiting
>>>>> threshold.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I drafted an example to illustrate how loggers can isolate
>>> rate-limited
>>>>> resources and compete for shared rate-limited resources.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> - The filter limiter statistics window is 1min.
>>>>>> 
>>>>>> - Filter configs:
>>>>>> 
>>>>>> - process level: 1000 records/min
>>>>>> 
>>>>>> - logger level: 500 records/min
>>>>>> 
>>>>>> - All loggers in the system: logger1, logger2
>>>>>> 
>>>>>> - Statistics of already generated log records
>>>>>> 
>>>>>> - Process Stats: 998 records
>>>>>> 
>>>>>> - logger1 Stats: 499 records
>>>>>> 
>>>>>> - logger2 Stats: 499 records
>>>>>> 
>>>>>> - The new log records sequences
>>>>>> 
>>>>>> NO_n: record n of logger1
>>>>>> 
>>>>>> Result: 1 record remains under the logger1 threshold (499 of 500),
>>>>>> 
>>>>>> so record n is allowed to print.
>>>>>> 
>>>>>> Stats change:
>>>>>> 
>>>>>> - Process Stats: 999 records
>>>>>> 
>>>>>> - logger1 Stats: 500 records
>>>>>> 
>>>>>> NO_n+1. record n+1 of logger1
>>>>>> 
>>>>>> Result: No records remain under the logger1 threshold (500 of 500),
>>>>>> 
>>>>>> but 1 record remains under the process-level threshold (999 of 1000).
>>>>>> 
>>>>>> So record n+1 is allowed to print.
>>>>>> 
>>>>>> Stats change:
>>>>>> 
>>>>>> - Process Stats: 1000 records
>>>>>> 
>>>>>> - logger1 Stats: 501 records
>>>>>> 
>>>>>> NO_n+2. record n+2 of logger1
>>>>>> 
>>>>>> Result: No records remain under either the logger1 threshold or the
>>>>>> process-level threshold.
>>>>>> 
>>>>>> So record n+2 is not allowed to print.
>>>>>> 
>>>>>> Stats change: N.A
>>>>>> 
>>>>>> NO_n+3. record n+3 of logger2
>>>>>> 
>>>>>> Result: No records remain under the process-level threshold (1000 of 1000),
>>>>>> 
>>>>>> but 1 record remains under the logger2 threshold (499 of 500).
>>>>>> 
>>>>>> So record n+3 is allowed to print.
>>>>>> 
>>>>>> Stats change:
>>>>>> 
>>>>>> - Process Stats: 1001 records
>>>>>> 
>>>>>> - logger2 Stats: 500 records
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> NO_...: Subsequent logs will no longer be output as both dedicated
>>>>>> 
>>>>>> rate-limited resources and shared rate-limited resources have been
>>>>> exhausted.
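
The accounting in the example above can be sketched in plain Java. This is only an illustration of the rule implied by the example (a record passes if either its logger's own quota or the process-level quota still has room, and every accepted record is counted against both), not the proposed filter's actual code. `CrossRateLimiter` and its method names are hypothetical, only record counts are modelled (byte-size limits are left out), and window rollover is reduced to an explicit `startNewWindow()` call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical count-based cross limiter covering a single statistics window. */
final class CrossRateLimiter {
    private final long processLimit;
    private final long loggerLimit;
    private long processCount;
    private final Map<String, Long> countByLogger = new HashMap<>();

    CrossRateLimiter(long processLimit, long loggerLimit) {
        this.processLimit = processLimit;
        this.loggerLimit = loggerLimit;
    }

    /** Returns true if the record may be printed; the stats change only when it is. */
    synchronized boolean tryAcquire(String loggerName) {
        long loggerCount = countByLogger.getOrDefault(loggerName, 0L);
        boolean withinLoggerQuota = loggerCount < loggerLimit;     // dedicated budget
        boolean withinProcessQuota = processCount < processLimit;  // shared budget
        if (!withinLoggerQuota && !withinProcessQuota) {
            return false; // both budgets exhausted: deny and leave the stats untouched
        }
        // An accepted record is counted at both levels, which is why the
        // process count can end up above the process-level threshold.
        countByLogger.put(loggerName, loggerCount + 1);
        processCount++;
        return true;
    }

    /** Clears all statistics; a real filter would do this when the 1-minute window rolls over. */
    synchronized void startNewWindow() {
        processCount = 0;
        countByLogger.clear();
    }

    synchronized long processCount() {
        return processCount;
    }

    synchronized long loggerCount(String loggerName) {
        return countByLogger.getOrDefault(loggerName, 0L);
    }
}
```

With `new CrossRateLimiter(1000, 500)` and 499 records already accepted for each of logger1 and logger2, the next calls reproduce the sequence above: two more logger1 records are accepted, the third is denied, and one logger2 record is accepted, leaving the process count at 1001.
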
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 2. Number of customers requesting this feature? Maintenance as
>>> @Piotr
>>>>>> 
>>>>>>> Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10
>>> year
>>>>>> 
>>>>>>> period, if we do not have enough customers requesting this, then
>>>>>> 
>>>>>>> maintenance of this feature + efforts might not be worth it.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thanks for the response. Sorry, I was not aware of this rule before.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I'm not aware of the actual size of the user group with such needs.
>>>>>> 
>>>>>> If necessary, perhaps we could conduct a survey in the user mailing
>>> list.
>>>>>> 
>>>>>> This email is merely a discussion. If it is prohibited based on this
>>>>>> 
>>>>>> rule before the discussion even begins, it might not be a bad thing,
>>>>>> 
>>>>>> as it could help everyone avoid unnecessary discussions.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Yuepeng
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com>
>>> wrote:
>>>>>>> Hi Yuepeng,
>>>>>>> 
>>>>>>> This seems interesting. I have a few comments based on the doc and
>>>>>>> the feature request:
>>>>>>> 
>>>>>>> 1. Can you give an example of the scenarios where this can be useful.
>>>>>>> Adding rate limiters to logs seems like an interesting idea, but just
>>>>>>> wondering what is the business motivation.
>>>>>>> 2. Number of customers requesting this feature? Maintenance as @Piotr
>>>>>>> Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10
>>> year
>>>>>>> period, if we do not have enough customers requesting this, then
>>>>>>> maintenance of this feature + efforts might not be worth it.
>>>>>>> 3. I am interested in what you talked about - dimensions and allow
>>>>>>> thresholds to be shared across these dimensions or metrics. Could you
>>>>> give
>>>>>>> an example of this particularly, I just want to know about the real
>>>>> world
>>>>>>> applications of this.
>>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Jay Katariya
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org>
>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi, community,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In some business scenarios, users expect the log rate limit
>>>>> thresholds to
>>>>>>>> be influenced
>>>>>>>> 
>>>>>>>> by different dimensions and allow thresholds to be shared across
>>> these
>>>>>>>> dimensions or metrics.
>>>>>>>> 
>>>>>>>> This enables the system to flexibly output as many logs as possible
>>>>> within
>>>>>>>> the safe constraints of the thresholds.
>>>>>>>> 
>>>>>>>> Therefore, it is meaningful to introduce rate limiters based on
>>>>> process
>>>>>>>> granularity and logger granularity,
>>>>>>>> 
>>>>>>>> targeting both the number of log entries and the size of the logs.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So, I'd like to start a discussion about 'Support a cross-rate
>>> Filter
>>>>>>>> based on process and logger granularity'.[1]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Looking forward to your attention and comments.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>> 
>>> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Yuepeng Pan
>>>>>> 
>>>>> 
>>> 
