It is great to hear that you have already done the biggest part of the
work: implementing such a filter! Could you share an example of how your
filter is used in a configuration file, please?
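
For anyone following along, the accept/deny rule in the worked example quoted further down (process level 1000 records/min, logger level 500 records/min) can be sketched in plain Java. This is only an illustrative sketch, not Yuepeng's actual filter: the class name `TwoLevelRateLimiter`, its constructor parameters, and the fixed-window reset are assumptions made for the example, and nothing here is an existing Log4j API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of per-logger budgets plus a shared process budget. */
final class TwoLevelRateLimiter {
    private final long processLimit;   // shared budget per window
    private final long loggerLimit;    // dedicated budget per logger per window
    private final long windowMillis;   // length of the statistics window
    private final LongSupplier clock;  // injected so the window can be tested
    private final Map<String, Long> loggerCounts = new HashMap<>();
    private long processCount;
    private long windowStart;

    TwoLevelRateLimiter(long processLimit, long loggerLimit,
                        long windowMillis, LongSupplier clock) {
        this.processLimit = processLimit;
        this.loggerLimit = loggerLimit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if a record from the given logger may be printed. */
    synchronized boolean tryAcquire(String loggerName) {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // Assumption: a simple fixed window that resets all counters.
            windowStart = now;
            processCount = 0;
            loggerCounts.clear();
        }
        long loggerCount = loggerCounts.getOrDefault(loggerName, 0L);
        // Accept if either the dedicated or the shared budget has room left.
        boolean accepted = loggerCount < loggerLimit || processCount < processLimit;
        if (accepted) {
            // Both counters advance on every accepted record, as in the example.
            loggerCounts.put(loggerName, loggerCount + 1);
            processCount++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        TwoLevelRateLimiter limiter =
                new TwoLevelRateLimiter(1000, 500, 60_000L, System::currentTimeMillis);
        // Reach the starting state of the example: 499 + 499 = 998 records.
        for (int i = 0; i < 499; i++) {
            limiter.tryAcquire("logger1");
            limiter.tryAcquire("logger2");
        }
        System.out.println(limiter.tryAcquire("logger1")); // record n
        System.out.println(limiter.tryAcquire("logger1")); // record n+1
        System.out.println(limiter.tryAcquire("logger1")); // record n+2
        System.out.println(limiter.tryAcquire("logger2")); // record n+3
    }
}
```

Replaying the quoted sequence against this sketch accepts records n and n+1, denies n+2, and accepts n+3. Note that the process counter can end above its limit (1001 in the example), because a logger's dedicated budget is honored even after the shared one is exhausted. Inside a real filter, `tryAcquire` would presumably be called with the event's logger name and its result mapped to a filter result.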

On Mon, Jan 27, 2025 at 2:07 PM Yuepeng Pan <panyuep...@apache.org> wrote:

> Thanks, Volkan, for the code and comments.
>
>
>
>
> > You can either implement this in a Java/Kotlin/Scala/etc. class
> > <https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
> > or a Script Filter
> > <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
> > Would you mind explaining to us why these are not an option for you but
> > instead this logic must be provided as an official Log4j component,
> > please?
>
>
>
>
> The functionality can be easily implemented based on the reserved filter
> interface.
>
> The design of the logging interface is excellent.
>
>
>
>
> I have already implemented a filter that can achieve similar
> functionality.
>
> It is primarily used in large distributed systems such as Flink and Spark.
>
> These systems have the following characteristics when generating
> production logs:
>
>
>
>
> - There are many classes, which means there are many logger names;
>
> - The log rate is usually high;
>
> - User logs and framework logs are often mixed together.
>
>
>
>
> Please allow me to explain why I would like to contribute this to
>
> the official repository. From my limited reading, the reasons are:
>
>
>
>
> - It is quite valuable in the aforementioned frameworks and use cases.
>
> - Existing filters do not offer logger-level rate limiting, whereas this
> filter does.
>
> Please feel free to correct me if I’m wrong.
>
>
>
>
> Thank you very much.
>
>
>
>
> Best,
> Yuepeng
>
>
>
>
>
> At 2025-01-27 17:44:28, "Volkan Yazıcı" <vol...@yazi.ci> wrote:
> >Hello Yuepeng,
> >
> >Thanks so much for reaching out to us. Your use case is indeed an
> >interesting one and it is good to learn about such Log4j deployments in
> >the wild.
> >
> >Consider the following Log4j filter pseudo code:
> >
> >WeakHashMap<Key, RateLimiter> rateLimiterByKey =
> >        activeLoggerContext.getObject("rateLimiters");
> >Key key = Key.fromDimensions(logEvent.getLogger(), ...);
> >RateLimiter rateLimiter = rateLimiterByKey.computeIfAbsent(
> >        key, ignored -> RateLimiter.ofMaxRate(key.maxRate()));
> >return rateLimiter.acquire() ? Result.ACCEPT : Result.DENY;
> >
> >
> >You can either implement this in a Java/Kotlin/Scala/etc. class
> ><https://logging.apache.org/log4j/2.x/manual/filters.html#extending>
> >or a Script
> >Filter <https://logging.apache.org/log4j/2.x/manual/filters.html#Script>.
> >Would you mind explaining to us why these are not an option for you but
> >instead this logic must be provided as an official Log4j component,
> >please?
> >
> >Kind regards.
> >
> >On Mon, Jan 27, 2025 at 3:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >
> >> Sorry, I’m not sure why the formatting of the email appears to be
> >> somewhat disorganized. Therefore, I have reorganized part of the
> >> disordered content and added it to doc[1].
> >>
> >> Thank you.
> >>
> >> [1]
> >>
> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.x6o7d75qh2vl
> >>
> >> Best,
> >> Yuepeng
> >>
> >> On 2025/01/27 02:46:28 Yuepeng Pan wrote:
> >> > Thanks Jay Kataria for the comments.
> >> >
> >> >
> >> >
> >> >
> >> > > 1. Can you give an example of the scenarios where this can be useful.
> >> > > Adding rate limiters to logs seems like an interesting idea, but just
> >> > > wondering what is the business motivation.
> >> >
> >> > > 3. I am interested in what you talked about - dimensions and allow
> >> > > thresholds to be shared across these dimensions or metrics. Could you
> >> > > give an example of this particularly, I just want to know about the
> >> > > real world applications of this.
> >> >
> >> >
> >> >
> >> >
> >> > Please let me try to clarify.
> >> >
> >> >
> >> >
> >> >
> >> > Generally speaking, the logging rate varies from logger to logger.
> >> > In some scenarios, or under the influence of existing filters, if a
> >> > particular logger generates logs at an especially high rate, the log
> >> > output of other loggers might be affected. In short, all loggers
> >> > compete for the same type of rate-limited resources without any
> >> > proactive intervention logic.
> >> >
> >> >
> >> >
> >> >
> >> > For example, suppose there are two loggers, logger1 and logger2, and
> >> > the user is interested in the log output of logger2. A filter is
> >> > configured to limit the log rate to 100 records/min. If logger1
> >> > produces logs at a rate of 200 records/min, it is highly likely that
> >> > logger2 will be unable to output any logs because logger1 has already
> >> > exhausted the rate-limiting threshold. The user expects that, while
> >> > rate limiting remains in force, the target logger can still output
> >> > the necessary logs. At the very least, no logger should be completely
> >> > blocked from outputting logs due to rate limiting.
> >> >
> >> >
> >> >
> >> >
> >> > The best solution in this case is to apply the same rate-limiting
> >> > threshold to each logger individually, for example by allowing each
> >> > logger to output 100 records/min. This way, every logger is guaranteed
> >> > a certain log output rate under rate limiting.
> >> >
> >> > When the number of loggers is small, or when the log generation rate
> >> > of the process is relatively low, some output can still be allowed
> >> > even if a logger has reached its own rate-limiting threshold. This is
> >> > what the shared rate-limited resources, or thresholds, among all
> >> > loggers refer to. In this rate limiter, they correspond to a
> >> > process-level rate-limiting threshold.
> >> >
> >> >
> >> >
> >> >
> >> > I drafted an example to illustrate how loggers can isolate
> >> > rate-limited resources and compete for shared rate-limited resources.
> >> >
> >> >
> >> >
> >> >
> >> > - The filter's rate-limiting statistics window is 1 min.
> >> > - Filter configuration:
> >> >   - Process level: 1000 records/min
> >> >   - Logger level: 500 records/min
> >> > - All loggers in the system: logger1, logger2
> >> > - Statistics of the log records generated so far:
> >> >   - Process stats: 998 records
> >> >   - logger1 stats: 499 records
> >> >   - logger2 stats: 499 records
> >> > - The sequence of new log records:
> >> >
> >> >   NO_n: record n of logger1
> >> >   Result: 1 record remains in the logger1 threshold (499 of 500),
> >> >   so record n is allowed to print.
> >> >   Stats change:
> >> >   - Process stats: 999 records
> >> >   - logger1 stats: 500 records
> >> >
> >> >   NO_n+1: record n+1 of logger1
> >> >   Result: 0 records remain in the logger1 threshold (500 of 500),
> >> >   but 1 record remains in the process-level threshold (999 of 1000),
> >> >   so record n+1 is allowed to print.
> >> >   Stats change:
> >> >   - Process stats: 1000 records
> >> >   - logger1 stats: 501 records
> >> >
> >> >   NO_n+2: record n+2 of logger1
> >> >   Result: no records remain in either the logger1 threshold or the
> >> >   process-level threshold, so record n+2 is not allowed to print.
> >> >   Stats change: N/A
> >> >
> >> >   NO_n+3: record n+3 of logger2
> >> >   Result: 0 records remain in the process-level threshold (1000 of 1000),
> >> >   but 1 record remains in the logger2 threshold (499 of 500),
> >> >   so record n+3 is allowed to print.
> >> >   Stats change:
> >> >   - Process stats: 1001 records
> >> >   - logger2 stats: 500 records
> >> >
> >> >   NO_...: Subsequent logs will no longer be output, as both the dedicated
> >> >   and the shared rate-limited resources have been exhausted.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > > 2. Number of customers requesting this feature? Maintenance as @Piotr
> >> > > Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
> >> > > period, if we do not have enough customers requesting this, then
> >> > > maintenance of this feature + efforts might not be worth it.
> >> >
> >> >
> >> >
> >> >
> >> > Thanks for the response. Sorry, I was not aware of this rule before.
> >> >
> >> >
> >> >
> >> >
> >> > I'm not aware of the actual size of the user group with such needs.
> >> >
> >> > If necessary, perhaps we could conduct a survey on the user mailing list.
> >> >
> >> > This email is merely a discussion. If it is prohibited based on this
> >> >
> >> > rule before the discussion even begins, it might not be a bad thing,
> >> >
> >> > as it could help everyone avoid unnecessary discussions.
> >> >
> >> >
> >> >
> >> >
> >> > Best,
> >> > Yuepeng
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > At 2025-01-27 03:32:28, "Jay Kataria" <jaykataria1...@gmail.com> wrote:
> >> > >Hi Yuepeng,
> >> > >
> >> > >This seems interesting. I have a few comments based on the doc and
> >> > >the feature request:
> >> > >
> >> > >1. Can you give an example of the scenarios where this can be useful.
> >> > >Adding rate limiters to logs seems like an interesting idea, but just
> >> > >wondering what is the business motivation.
> >> > >2. Number of customers requesting this feature? Maintenance as @Piotr
> >> > >Karwasz <pkarw...@apache.org> , mentioned is going to be a 5 - 10 year
> >> > >period, if we do not have enough customers requesting this, then
> >> > >maintenance of this feature + efforts might not be worth it.
> >> > >3. I am interested in what you talked about - dimensions and allow
> >> > >thresholds to be shared across these dimensions or metrics. Could you
> >> > >give an example of this particularly, I just want to know about the
> >> > >real world applications of this.
> >> > >
> >> > >
> >> > >Regards,
> >> > >Jay Katariya
> >> > >
> >> > >
> >> > >
> >> > >On Sun, Jan 26, 2025 at 2:57 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> > >
> >> > >> Hi, community,
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> In some business scenarios, users expect the log rate limit
> >> > >> thresholds to be influenced by different dimensions, and expect
> >> > >> thresholds to be shared across these dimensions or metrics. This
> >> > >> enables the system to flexibly output as many logs as possible
> >> > >> within the safe constraints of the thresholds. Therefore, it is
> >> > >> meaningful to introduce rate limiters based on process granularity
> >> > >> and logger granularity, targeting both the number of log entries
> >> > >> and the size of the logs.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> So, I'd like to start a discussion about 'Support a cross-rate
> >> > >> Filter based on process and logger granularity'.[1]
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> Looking forward to your attention and comments.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> Thank you.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> [1]
> >> > >>
> >>
> https://docs.google.com/document/d/1kVa0V_RrPpT5aa5rfxEaH-QxyXTplQr65xUMZMmDoFA/edit?tab=t.0#heading=h.jfuayzme0ome
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Yuepeng Pan
> >> >
> >>
>
