Thanks so much for the elaborate example, Yuepeng.

We receive hardly any feedback for the burst filter, which makes me think
it is not much used. Your custom filter is a further specialization of the
burst filter. I was curious if one can provide the rate limiter dimensions
in the configuration of your custom filter (this would make your filter a
superset of the burst filter), but I see that they are hardcoded. All in
all, I think your filter certainly holds merit, though it addresses the
concern of a very small fraction of our users.

If I am not mistaken, you participate as a committer in the Apache
StreamPark project involving both Flink and Spark. Would it be possible to
publish your filter in an `apache/streampark-log4j` GitHub project? I think
this will not only make your filter easily accessible as a single Maven
dependency, but also give you more freedom on how you maintain it. If you
happen to create such a project, I would be more than happy to review it
and refer to it in the Burst Filter manual.

*Nit:* You should consider staying away from the Java properties format, for
the reasons described in
<https://logging.apache.org/log4j/2.x/manual/configuration.html#java-properties-features>.

On Tue, Jan 28, 2025 at 1:59 AM Yuepeng Pan <panyuep...@apache.org> wrote:

> Thanks, Volkan, for the quick response.
>
> > Could you share an example of how your
> > filter is used in a configuration file, please?
> Yes, glad to do it.
>
> Here is a concrete example, taken from a log4j.properties file:
>
> # config lines placeholders.
> .....
>
> # Config for the ProcessLoggerCrossFilter. The window size of the limiter is
> # a constant 1 min.
> appender.main.filter.yourFilterGroup.type=ProcessLoggerCrossFilter
> # 3000 records/min at process level.
> appender.main.filter.yourFilterGroup.procCountRateLimit=3000
> # 204800 bytes/min at process level.
> appender.main.filter.yourFilterGroup.procSizeRateLimit=204800
> # 32 records/min per logger (the class printing the log line) at logger level.
> appender.main.filter.yourFilterGroup.loggerCountRateLimit=32
> # 10240 bytes/min per logger (the class printing the log line) at logger level.
> appender.main.filter.yourFilterGroup.loggerSizeRateLimit=10240
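The four properties above pair a record-count limit with a byte-size limit at each level. The thread does not say how the filter combines the two, so the following Java sketch assumes that a level still has budget only while both its record count and its byte total stay within their limits, and that an event's size is the UTF-8 length of its formatted message. `LevelBudget` and `LevelBudgetDemo` are hypothetical names, not part of Log4j or of the ProcessLoggerCrossFilter, and the 1-minute window reset is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical per-window budget for one level (the process, or a single logger).
// Assumption: an event fits only if BOTH the record count and the byte total stay within limits.
final class LevelBudget {
    private final long maxRecords;
    private final long maxBytes;
    private long records;
    private long bytes;

    LevelBudget(long maxRecords, long maxBytes) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
    }

    // Charges the message if both limits allow it; otherwise leaves the budget untouched.
    boolean tryCharge(String message) {
        // Assumption: event size = UTF-8 byte length of the formatted message.
        long size = message.getBytes(StandardCharsets.UTF_8).length;
        if (records >= maxRecords || bytes + size > maxBytes) {
            return false;
        }
        records++;
        bytes += size;
        return true;
    }
}

public class LevelBudgetDemo {
    public static void main(String[] args) {
        // Logger-level values from the config: loggerCountRateLimit=32, loggerSizeRateLimit=10240.
        LevelBudget logger = new LevelBudget(32, 10_240);
        String line = "x".repeat(100); // 100 bytes in UTF-8
        int accepted = 0;
        for (int i = 0; i < 40; i++) {
            if (logger.tryCharge(line)) {
                accepted++;
            }
        }
        System.out.println(accepted); // 32: the record limit is hit first (only 3200 bytes used)

        LevelBudget fresh = new LevelBudget(32, 10_240);
        System.out.println(fresh.tryCharge("x".repeat(20_000))); // false: too large on its own
    }
}
```

With small records the count limit dominates; with large ones the byte limit does, which is presumably why the configuration exposes both.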
>
> Thanks~
>
> Best,
> Yuepeng.
>
> At 2025-01-28 03:48:12, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
> >It is great to hear that you have already done the biggest part of the
> >work: implementing such a filter! Could you share an example of how your
> >filter is used in a configuration file, please?
> >
> >On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> >
> >> Thanks, Volkan, for the code and comments.
> >>
> >> > You can either implement this in a Java/Kotlin/Scala/etc. class
> >> > <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
> >> > or a Script
> >> > Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
> >> > Would you mind explaining to us why these are not an option for you but
> >> > instead this logic must be provided as an official Log4j component,
> >> > please?
> >>
> >> The functionality can easily be implemented on top of the existing filter
> >> interface; the design of the logging interface is excellent.
> >>
> >> I have already implemented a filter that achieves similar functionality.
> >> It is primarily used in large distributed systems like Flink and Spark.
> >> These systems have the following characteristics when generating
> >> production logs:
> >>
> >> - There are many classes, which means there are many logger names;
> >> - The log rate is usually high;
> >> - User logs and framework logs are often mixed together.
> >>
> >> Please allow me to explain why I would like to contribute this to
> >> the official repository. From my limited reading, the reasons are:
> >>
> >> - It is quite valuable in the aforementioned frameworks and use cases.
> >> - Existing filters only have logger-level rate limiting, whereas this
> >> filter does not.
> >>
> >> Please feel free to correct me if I’m wrong.
> >>
> >> Thank you very much.
> >>
> >> Best,
> >> Yuepeng
> >>
> >> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
> >> >Hello Yuepeng,
> >> >
> >> >Thanks so much for reaching out to us. Your use case is indeed an
> >> >interesting one, and it is good to learn about such Log4j deployments in
> >> >the wild.
> >> >
> >> >Consider the following Log4j filter pseudo code:
> >> >
> >> >WeakHashMap<Key, RateLimiter> rateLimiterByKey =
> >> >        activeLoggerContext.getObject("rateLimiters");
> >> >Key key = Key.fromDimensions(logEvent.getLoggerName(), ...);
> >> >RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(key, ignored ->
> >> >        RateLimiter.ofMaxRate(key.maxRate()));
> >> >return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
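A runnable, JDK-only rendering of the pseudo code above might look like the following sketch. It is an illustration under stated assumptions, not a Log4j API: it swaps the `WeakHashMap` for a `ConcurrentHashMap` so that `computeIfAbsent` is atomic across threads (at the cost of never evicting entries), keys limiters by logger name only, and replaces `RateLimiter` with a hypothetical `FixedWindowLimiter` that uses a fixed window and an injectable clock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter standing in for the pseudo code's RateLimiter.
final class FixedWindowLimiter {
    private final long maxPerWindow;
    private final long windowMillis;
    private final LongSupplier clockMillis;
    private long windowStart;
    private long used;

    FixedWindowLimiter(long maxPerWindow, long windowMillis, LongSupplier clockMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    // Returns true and consumes one slot if the current window still has capacity.
    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // start a new window and forget the old count
            used = 0;
        }
        if (used < maxPerWindow) {
            used++;
            return true;
        }
        return false;
    }
}

public class KeyedRateLimiters {
    private final Map<String, FixedWindowLimiter> limiterByLogger = new ConcurrentHashMap<>();
    private final long maxPerWindow;
    private final long windowMillis;
    private final LongSupplier clockMillis;

    public KeyedRateLimiters(long maxPerWindow, long windowMillis, LongSupplier clockMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
    }

    // true means "let the event through"; false corresponds to Result.DENY in the pseudo code.
    public boolean accept(String loggerName) {
        // computeIfAbsent creates the limiter for a new logger name exactly once.
        FixedWindowLimiter limiter = limiterByLogger.computeIfAbsent(
                loggerName, ignored -> new FixedWindowLimiter(maxPerWindow, windowMillis, clockMillis));
        return limiter.tryAcquire();
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in milliseconds
        KeyedRateLimiters limiters = new KeyedRateLimiters(2, 60_000, () -> now[0]);
        System.out.println(limiters.accept("logger1")); // true
        System.out.println(limiters.accept("logger1")); // true
        System.out.println(limiters.accept("logger1")); // false: logger1 used both slots
        System.out.println(limiters.accept("logger2")); // true: logger2 has its own limiter
        now[0] = 60_000;                                // move into the next window
        System.out.println(limiters.accept("logger1")); // true
    }
}
```

Wrapping this in a real filter would mean extending Log4j's filter base class as described in the manual linked below; that part is not shown here.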
> >> >
> >> >
> >> >You can either implement this in a Java/Kotlin/Scala/etc. class
> >> ><https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
> >> >or a Script
> >> >Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
> >> >Would you mind explaining to us why these are not an option for you but
> >> >instead this logic must be provided as an official Log4j component,
> >> >please?
> >> >
> >> >Kind regards.
> >> >
> >> >On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> >
> >> >> Sorry, I’m not sure why the formatting of the email appears somewhat
> >> >> disorganized, so I have reorganized part of the disordered content and
> >> >> added it to doc [1].
> >> >>
> >> >> Thank you.
> >> >>
> >> >> [1]
> >> >> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
> >> >>
> >> >> Best,
> >> >> Yuepeng
> >> >>
> >> >> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
> >> >> > Thanks, Jay Kataria, for the comments.
> >> >> >
> >> >> > > 1. Can you give an example of the scenarios where this can be useful.
> >> >> > > Adding rate limiters to logs seems like an interesting idea, but just
> >> >> > > wondering what is the business motivation.
> >> >> >
> >> >> > > 3. I am interested in what you talked about - dimensions and allow
> >> >> > > thresholds to be shared across these dimensions or metrics. Could you give
> >> >> > > an example of this particularly, I just want to know about the real world
> >> >> > > applications of this.
> >> >> >
> >> >> > Let me try to clarify it.
> >> >> >
> >> >> > Generally speaking, the logging rate of each logger varies.
> >> >> > In some scenarios, or under the influence of existing filters,
> >> >> > if a particular logger generates logs at an especially high rate,
> >> >> > the log output of other loggers might be affected.
> >> >> > In short, all loggers compete for the same rate-limited resources
> >> >> > without any proactive intervention logic.
> >> >> >
> >> >> > For example, suppose there are logger1 and logger2,
> >> >> > and the user is interested in the log output of logger2.
> >> >> > A filter is configured to limit the log rate to 100 records/min.
> >> >> > If logger1 produces logs at a rate of 200 records/min,
> >> >> > it is highly likely that logger2 will be unable to output any logs,
> >> >> > because logger1 has already reached the rate-limiting threshold.
> >> >> > While logs are being rate-limited, the user expects the target logger
> >> >> > to still be able to output the necessary logs. At the very least,
> >> >> > no logger should be completely blocked from outputting logs due to
> >> >> > rate-limiting.
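The starvation described above can be reproduced with a few lines of Java. The sketch below is a hypothetical simulation of a single 1-minute window, not code from the filter: logger1 emits a burst of 200 records before logger2 emits 5, and a limit of 100 records is applied either to one shared counter or to one counter per logger.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of one 1-minute window: a shared limit vs. a per-logger limit.
public class StarvationDemo {
    // Returns how many of logger2's records pass after logger1's burst.
    static int logger2Accepted(boolean perLogger, int limit, int logger1Burst, int logger2Records) {
        Map<String, Integer> used = new HashMap<>();
        int accepted = 0;
        for (int i = 0; i < logger1Burst + logger2Records; i++) {
            String logger = i < logger1Burst ? "logger1" : "logger2";
            // A shared limit charges every logger to the same counter.
            String counter = perLogger ? logger : "shared";
            int count = used.getOrDefault(counter, 0);
            if (count < limit) {
                used.put(counter, count + 1);
                if (logger.equals("logger2")) {
                    accepted++;
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(logger2Accepted(false, 100, 200, 5)); // 0: logger1 used up the shared 100
        System.out.println(logger2Accepted(true, 100, 200, 5));  // 5: logger2 has its own 100
    }
}
```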
> >> >> >
> >> >> > The best solution in this case is to apply the same rate-limiting
> >> >> > threshold to each logger individually, e.g., allow each logger to
> >> >> > output 100 records/min. This way, every logger is guaranteed a certain
> >> >> > log output rate under rate-limiting.
> >> >> > When the number of loggers is small, or when the log generation rate of
> >> >> > the process is relatively low, some output can still be allowed even if
> >> >> > a logger has reached its own threshold. This is what I mean by
> >> >> > rate-limited resources or thresholds shared among all loggers; in this
> >> >> > rate limiter, it corresponds to a process-level rate-limiting threshold.
> >> >> >
> >> >> > I drafted an example to illustrate how each logger has its own
> >> >> > dedicated rate-limited resources while competing for shared ones.
> >> >> >
> >> >> > - The filter limiter statistics window is 1 min.
> >> >> > - Filter configs:
> >> >> >   - process level: 1000 records/min
> >> >> >   - logger level: 500 records/min
> >> >> > - All loggers in the system: logger1, logger2
> >> >> > - Statistics of already generated log records:
> >> >> >   - Process stats: 998 records
> >> >> >   - logger1 stats: 499 records
> >> >> >   - logger2 stats: 499 records
> >> >> > - The new log record sequence:
> >> >> >
> >> >> > NO_n: record n of logger1
> >> >> > Result: 1 record remains in the logger1 threshold (499/500 used),
> >> >> > so record n is allowed to print.
> >> >> > Stats change:
> >> >> >   - Process stats: 999 records
> >> >> >   - logger1 stats: 500 records
> >> >> >
> >> >> > NO_n+1: record n+1 of logger1
> >> >> > Result: 0 records remain in the logger1 threshold (500/500 used),
> >> >> > but 1 record remains in the process-level threshold (999/1000 used),
> >> >> > so record n+1 is allowed to print.
> >> >> > Stats change:
> >> >> >   - Process stats: 1000 records
> >> >> >   - logger1 stats: 501 records
> >> >> >
> >> >> > NO_n+2: record n+2 of logger1
> >> >> > Result: No records remain in either the logger1-level or the
> >> >> > process-level threshold, so record n+2 is not allowed to print.
> >> >> > Stats change: N/A
> >> >> >
> >> >> > NO_n+3: record n+3 of logger2
> >> >> > Result: 0 records remain in the process-level threshold (1000/1000 used),
> >> >> > but 1 record remains in the logger2 threshold (499/500 used),
> >> >> > so record n+3 is allowed to print.
> >> >> > Stats change:
> >> >> >   - Process stats: 1001 records
> >> >> >   - logger2 stats: 500 records
> >> >> >
> >> >> > NO_...: Subsequent logs will no longer be output, as both the dedicated
> >> >> > and the shared rate-limited resources have been exhausted.
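The decision rule in the walkthrough above can be sketched in Java as follows. This is a hypothetical reconstruction from the example, not the actual ProcessLoggerCrossFilter source: a record is accepted if its logger still has dedicated quota or the process still has shared quota, and every accepted record is counted against both, which is why the process total can reach 1001. Only record counts are modeled, and window handling is reduced to a `resetWindow()` method that a caller is assumed to invoke every minute.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the process/logger "cross" rule; counts only, no byte limits.
public class CrossRateLimiter {
    private final long processLimit;
    private final long loggerLimit;
    private final Map<String, Long> countByLogger = new HashMap<>();
    private long processCount;

    public CrossRateLimiter(long processLimit, long loggerLimit) {
        this.processLimit = processLimit;
        this.loggerLimit = loggerLimit;
    }

    // Accept if the logger's dedicated quota OR the shared process quota has room left.
    public synchronized boolean tryAcquire(String loggerName) {
        long loggerCount = countByLogger.getOrDefault(loggerName, 0L);
        boolean dedicatedLeft = loggerCount < loggerLimit;
        boolean sharedLeft = processCount < processLimit;
        if (!dedicatedLeft && !sharedLeft) {
            return false; // both exhausted: stats stay unchanged
        }
        countByLogger.put(loggerName, loggerCount + 1); // an accepted record counts against both levels
        processCount++;
        return true;
    }

    // Assumed to be called at the start of every 1-minute window.
    public synchronized void resetWindow() {
        countByLogger.clear();
        processCount = 0;
    }

    public synchronized long processCount() {
        return processCount;
    }

    public synchronized long loggerCount(String loggerName) {
        return countByLogger.getOrDefault(loggerName, 0L);
    }

    public static void main(String[] args) {
        CrossRateLimiter limiter = new CrossRateLimiter(1000, 500);
        // Reach the starting stats of the example: 499 records per logger, 998 in total.
        for (int i = 0; i < 499; i++) {
            limiter.tryAcquire("logger1");
            limiter.tryAcquire("logger2");
        }
        System.out.println(limiter.tryAcquire("logger1")); // n:   true, dedicated quota (499/500)
        System.out.println(limiter.tryAcquire("logger1")); // n+1: true, shared quota (999/1000)
        System.out.println(limiter.tryAcquire("logger1")); // n+2: false, both exhausted
        System.out.println(limiter.tryAcquire("logger2")); // n+3: true, dedicated quota (499/500)
        System.out.println(limiter.processCount());        // 1001
    }
}
```

Replaying the sequence prints `true`, `true`, `false`, `true` and a final process count of 1001, matching the stats in the example.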
> >> >> >
> >> >> > > 2. Number of customers requesting this feature? Maintenance as @Piotr
> >> >> > > Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
> >> >> > > period, if we do not have enough customers requesting this, then
> >> >> > > maintenance of this feature + efforts might not be worth it.
> >> >> >
> >> >> > Thanks for the response. Sorry, I was not aware of this rule before.
> >> >> >
> >> >> > I'm not aware of the actual size of the user group with such needs.
> >> >> > If necessary, perhaps we could conduct a survey on the user mailing list.
> >> >> > This email is merely a discussion. If the proposal is ruled out by this
> >> >> > rule before the discussion even begins, that might not be a bad thing,
> >> >> > as it could help everyone avoid unnecessary discussions.
> >> >> >
> >> >> > Best,
> >> >> > Yuepeng
> >> >> >
> >> >> > At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com> wrote:
> >> >> > >Hi Yuepeng,
> >> >> > >
> >> >> > >This seems interesting. There are a few comments that I have based on
> >> >> > >the doc and the feature request:
> >> >> > >
> >> >> > >1. Can you give an example of the scenarios where this can be useful.
> >> >> > >Adding rate limiters to logs seems like an interesting idea, but just
> >> >> > >wondering what is the business motivation.
> >> >> > >2. Number of customers requesting this feature? Maintenance as @Piotr
> >> >> > >Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
> >> >> > >period, if we do not have enough customers requesting this, then
> >> >> > >maintenance of this feature + efforts might not be worth it.
> >> >> > >3. I am interested in what you talked about - dimensions and allow
> >> >> > >thresholds to be shared across these dimensions or metrics. Could you give
> >> >> > >an example of this particularly, I just want to know about the real world
> >> >> > >applications of this.
> >> >> > >
> >> >> > >Regards,
> >> >> > >Jay Katariya
> >> >> > >
> >> >> > >On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> >> > >
> >> >> > >> Hi, community,
> >> >> > >>
> >> >> > >> In some business scenarios, users expect log rate-limit thresholds to be
> >> >> > >> influenced by different dimensions, and want to allow thresholds to be
> >> >> > >> shared across these dimensions or metrics. This enables the system to
> >> >> > >> flexibly output as many logs as possible within the safe constraints of
> >> >> > >> the thresholds.
> >> >> > >> Therefore, it is meaningful to introduce rate limiters based on process
> >> >> > >> granularity and logger granularity, targeting both the number of log
> >> >> > >> entries and the size of the logs.
> >> >> > >>
> >> >> > >> So, I'd like to start a discussion about 'Support a cross-rate Filter
> >> >> > >> based on process and logger granularity' [1].
> >> >> > >>
> >> >> > >> Looking forward to your attention and comments.
> >> >> > >>
> >> >> > >> Thank you.
> >> >> > >>
> >> >> > >> [1]
> >> >> > >> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
> >> >> > >>
> >> >> > >> Best,
> >> >> > >>
> >> >> > >> Yuepeng Pan
> >> >> >
> >> >>
> >>
>
