[ 
https://issues.apache.org/jira/browse/SAMZA-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rayman updated SAMZA-2577:
--------------------------
    Description: 
Problem: 
 In both StreamAppender for log4j1 and log4j2 a blocking queue is used to 
coordinate between the append()-ing threads and a single thread send()-ing to 
Kafka.
 This is a bounded, blocking, lock-synchronized queue.
 To avoid deadlock scenarios (see SAMZA-1537), the append()-ing threads have a 
timeout of 2 seconds, after which the log message is discarded and the queue is 
drained. 
 This means that during message bursts, threads calling append() may block for 
up to 2 seconds, and may continually be stuck in this pattern, leading to 
processing stalls and lowered throughput. 
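
 The mechanism described above can be sketched roughly as follows. This is a 
minimal illustration, not the actual StreamAppender code: the class name 
TimedAppendQueue, its methods, and the use of ArrayBlockingQueue are 
assumptions, and "drained" is assumed to mean that queued messages are 
discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the appender's internal queue; not Samza's actual code.
final class TimedAppendQueue {
    private final BlockingQueue<String> queue;
    private final long timeoutMs;

    TimedAppendQueue(int capacity, long timeoutMs) {
        this.queue = new ArrayBlockingQueue<>(capacity); // bounded, lock-based
        this.timeoutMs = timeoutMs;                      // 2000 ms in the issue
    }

    /** Returns true if the message was enqueued, false if it was dropped. */
    boolean append(String message) {
        try {
            // Blocks the calling thread for up to timeoutMs while the queue is full.
            if (queue.offer(message, timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;                       // message dropped, queue left as-is
        }
        // Timed out: discard this message and drain the backlog (assumed behavior).
        List<String> dropped = new ArrayList<>();
        queue.drainTo(dropped);
        return false;
    }

    /** Called by the single sender thread; returns null when nothing is queued. */
    String pollForSend() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

 With a full queue and no progress on the sending side, every append() call 
pays the full timeout before returning, which is the stall pattern described 
above.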

*Solutions for Log4j2* 
 Solution 1. Enable async loggers, which log4j2 supports and provides 
[https://logging.apache.org/log4j/2.x/manual/async.html].
 With async loggers, the blocking queue in StreamAppender is not required 
because the logger itself is asynchronous, so append() threads can call 
systemProducer.send() directly. 
 However, if async loggers are not used, this queue-based mechanism is still 
required to give the append()-ing threads an "async" illusion.
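
 One way to make all loggers asynchronous, per the log4j2 manual linked above, 
is to set the context selector system property before log4j2 initializes. The 
sketch below assumes log4j2 2.10 or later (older releases use the property 
name Log4jContextSelector) and the LMAX Disruptor library on the classpath. 
The helper class AsyncLoggingBootstrap is hypothetical and not part of Samza.

```java
// Hypothetical bootstrap helper; the class and method names are illustrative only.
final class AsyncLoggingBootstrap {
    // Property name and selector class as documented in the log4j2 async manual.
    static final String SELECTOR_PROPERTY = "log4j2.contextSelector";
    static final String ASYNC_SELECTOR =
        "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector";

    /** Must run before the first Logger is created; later calls have no effect. */
    static void enableAsyncLoggers() {
        System.setProperty(SELECTOR_PROPERTY, ASYNC_SELECTOR);
    }
}
```

 The same setting can be passed on the JVM command line as 
-Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, 
which avoids any ordering concerns in application code.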

Solution 2. Continue using the blocking bounded lock-based queue, but make the 
queue size and timeout configurable. Users can then tune this to account for 
message bursts.

Solution 3. Move to a lock-less queue: use ConcurrentLinkedQueue (unbounded), 
implement a bounded lock-less queue, or use an open-source implementation.
 A lock-less queue relies on CAS operations instead of locks, so append()-ing 
threads no longer need to block or time out. However, because the queue is 
non-blocking, the caller may busy-wait unless it polls at a fixed rate or 
sleeps for a fixed time between polls. 
*For log4j2, we will adopt Solution 1.*
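
 For comparison, Solution 3 (not adopted) might look roughly like the sketch 
below. It assumes the unbounded ConcurrentLinkedQueue variant and reads "the 
caller" as the single sending thread, which sleeps for a fixed time when the 
queue is empty. The class LockFreeAppendQueue and its method names are 
hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Consumer;

// Hypothetical illustration of the lock-less variant; not Samza's actual code.
final class LockFreeAppendQueue {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

    /** Never blocks: ConcurrentLinkedQueue.offer is CAS-based and unbounded. */
    void append(String message) {
        queue.offer(message);
    }

    /** Hands every currently queued message to the sender; returns the count. */
    int drainTo(Consumer<String> sender) {
        int sent = 0;
        String message;
        while ((message = queue.poll()) != null) {
            sender.accept(message);
            sent++;
        }
        return sent;
    }

    /** Sender loop: sleeps a fixed time when idle instead of busy-waiting. */
    void runSender(Consumer<String> sender, long idleSleepMs) {
        while (!Thread.currentThread().isInterrupted()) {
            if (drainTo(sender) == 0) {
                LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(idleSleepMs));
            }
        }
    }
}
```

 The trade-off is that an unbounded queue can grow without limit during 
sustained bursts, and the fixed sleep adds latency of up to idleSleepMs before 
a newly appended message is sent.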

*Solutions for Log4j1*
 Solution 1. Deprecate – log4j1 is not supported. 
 Solution 2. Similar to Solution 2 above.
 Solution 3. Similar to Solution 3 above.
*For log4j1, we will adopt Solution 1 – won't fix.*

  was:
Problem: 
In both StreamAppender for log4j1 and log4j2 a blocking queue is used to 
coordinate between the append()-ing threads and a single thread send()-ing to 
Kafka.
This is a bounded, blocking, lock-synchronized queue.
To avoid deadlock scenarios (see SAMZA-1537), the append()-ing threads have a 
timeout of 2 seconds, after which the log message is discarded and the queue is 
drained. 
This means in case of message bursts, threads calling append() may block for 
up to 2 seconds, and may continually be stuck in this pattern, leading to 
processing stalls and lowered throughput. 

*Solutions for Log4j2* 
Solution 1. Enable async logger in log4j2, since they are supported and 
provided in log4j2.[https://logging.apache.org/log4j/2.x/manual/async.html].
In using this capability, the blocking-queue in StreamAppender is not required 
because the logger itself will be asynchronous, and so append() threads can 
directly call systemProducer.send(). 
However, if async loggers are not used then this queue based mechanism, to give 
the append()-ing threads an "async" illusion, is required.

Solution 2. Continue using the blocking bounded lock-based queue, but make the 
queue size and timeout configurable. Users can then tune this to account for 
message bursts.

Solution 3. Move to use a lock-less queue, e.g., ConcurrentLinkedQueue 
(unbounded) or 
implement a bounded lock-less queue, or use open-source implementations.
Append()-ing threads will no longer need to block or timeout. However the 
caller may busy-wait or need a fixed-rate or fixed-sleep-time to avoid busy 
waits, since a lock-less queue is non blocking. 
It uses CAS operations. 

*Solutions for Log4j1*
Solution 1. Deprecate – log4j1 is not supported. 
Solution 2. Similar to Solution 2 above.
Solution 3. Similar to Solution 3 above.


> Threads appending to StreamAppender block/deadlock in high-throughput 
> scenarios, leading to processing stalls
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SAMZA-2577
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2577
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Rayman
>            Priority: Major
>
> Problem: 
>  In both StreamAppender for log4j1 and log4j2 a blocking queue is used to 
> coordinate between the append()-ing threads and a single thread send()-ing to 
> Kafka.
>  This is a bounded, blocking, lock-synchronized queue.
>  To avoid deadlock scenarios (see SAMZA-1537), the append()-ing threads have 
> a timeout of 2 seconds, after which the log message is discarded and the 
> queue is drained. 
>  This means that during message bursts, threads calling append() may block 
> for up to 2 seconds, and may continually be stuck in this pattern, leading 
> to processing stalls and lowered throughput. 
> *Solutions for Log4j2* 
>  Solution 1. Enable async loggers, which log4j2 supports and provides 
> [https://logging.apache.org/log4j/2.x/manual/async.html].
>  With async loggers, the blocking queue in StreamAppender is not required 
> because the logger itself is asynchronous, so append() threads can call 
> systemProducer.send() directly. 
>  However, if async loggers are not used, this queue-based mechanism is still 
> required to give the append()-ing threads an "async" illusion.
> Solution 2. Continue using the blocking bounded lock-based queue, but make 
> the queue size and timeout configurable. Users can then tune this to account 
> for message bursts.
> Solution 3. Move to a lock-less queue: use ConcurrentLinkedQueue 
> (unbounded), implement a bounded lock-less queue, or use an open-source 
> implementation.
>  A lock-less queue relies on CAS operations instead of locks, so append()-ing 
> threads no longer need to block or time out. However, because the queue is 
> non-blocking, the caller may busy-wait unless it polls at a fixed rate or 
> sleeps for a fixed time between polls. 
> *For log4j2, we will adopt Solution 1.*
> *Solutions for Log4j1*
>  Solution 1. Deprecate – log4j1 is not supported. 
>  Solution 2. Similar to Solution 2 above.
>  Solution 3. Similar to Solution 3 above.
> *For log4j1, we will adopt Solution 1 – won't fix.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
