[ https://issues.apache.org/jira/browse/KAFKA-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Lin updated KAFKA-6258:
----------------------------
    Description: 
When the consumer uses plaintext and there is remaining data in the consumer's
buffer, consumer.poll() reads all data available from the socket buffer into
the consumer buffer. However, if the consumer uses SSL and there is remaining
data, consumer.poll() may read only 16 KB (the size of
SslTransportLayer.appReadBuffer) from the socket buffer. This reduces the
efficiency of consumer.poll(), because the user must call poll() more times to
get the same amount of data.
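
The proposed behavior (keep reading until either the destination buffer is
full or the socket has no more data) can be sketched as follows. This is only
an illustration under stated assumptions, not the actual SslTransportLayer
code: the class DrainSketch, its methods, and the use of plain ByteBuffers in
place of a real socket and SSL unwrap step are all hypothetical. Only the
16 KB figure comes from the description above.

```java
import java.nio.ByteBuffer;

class DrainSketch {
    // Hypothetical stand-in for the 16 KB intermediate buffer named in the issue.
    static final int APP_READ_BUFFER_SIZE = 16 * 1024;

    /** One pass: moves at most APP_READ_BUFFER_SIZE bytes from socket to dst. */
    static int readOnce(ByteBuffer socket, ByteBuffer dst) {
        int n = Math.min(APP_READ_BUFFER_SIZE,
                Math.min(socket.remaining(), dst.remaining()));
        ByteBuffer chunk = socket.duplicate();   // view over the same bytes
        chunk.limit(chunk.position() + n);       // restrict the view to n bytes
        dst.put(chunk);                          // copy those n bytes into dst
        socket.position(socket.position() + n);  // mark them as consumed
        return n;
    }

    /** Repeats readOnce until dst is full or the simulated socket is drained. */
    static int readUntilFullOrDrained(ByteBuffer socket, ByteBuffer dst) {
        int total = 0;
        while (dst.hasRemaining() && socket.hasRemaining()) {
            total += readOnce(socket, dst);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer socket = ByteBuffer.allocate(1024 * 1024);   // 1 MB pending
        ByteBuffer dst = ByteBuffer.allocate(2 * 1024 * 1024);  // roomy target
        // Single pass on throwaway views: capped at 16 KB.
        System.out.println(readOnce(socket.duplicate(), dst.duplicate()));
        // Looping: everything pending on the simulated socket is read.
        System.out.println(readUntilFullOrDrained(socket, dst));
    }
}
```

In this sketch a single pass returns 16384 bytes, while the loop returns the
full 1048576 bytes that were pending, which is the difference the issue title
asks for.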

Furthermore, we observe that for users who naively sleep for a constant time
after each consumer.poll(), some partitions lag behind after they switch from
plaintext to SSL. Here is an explanation of why this can happen.

Say there is 1 partition receiving 1 MB/sec and 9 partitions receiving
32 KB/sec each. The leaders of these partitions are all on different brokers,
and one consumer is consuming all 10 partitions. Let's also assume that the
socket read buffer is large enough and that the consumer sleeps 1 sec between
consumer.poll() calls. 1 sec is long enough for the consumer to receive the
FetchResponse back from the broker.

- When the consumer uses plaintext, each consumer.poll() reads all data from
the socket buffer, which means the full 1 MB is read for the large partition.

- When the consumer uses SSL, each consumer.poll() is likely to find that some
data is already available in memory. In this case the consumer reads only
16 KB from each of the other sockets, in particular the socket for the broker
with the large partition. The throughput of the large partition is then
limited to 16 KB/sec.
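
The arithmetic behind the two cases can be checked with a small model. This
is a simplified sketch, not a measurement: the class PollLagModel and its
method are hypothetical, and it assumes exactly one poll per second with a
fixed cap on how many bytes a single poll drains for the partition.

```java
class PollLagModel {
    static final long KB = 1024;

    /**
     * Returns the bytes still unread for one partition after `seconds`
     * polls, with one poll per second.
     */
    static long backlogAfter(int seconds, long produceRatePerSec, long maxReadPerPoll) {
        long backlog = 0;
        for (int s = 0; s < seconds; s++) {
            backlog += produceRatePerSec;                 // data arriving during the 1 sec sleep
            backlog -= Math.min(backlog, maxReadPerPoll); // what one poll() drains
        }
        return backlog;
    }

    public static void main(String[] args) {
        long largeRate = 1024 * KB; // the 1 MB/sec partition from the example
        // Plaintext case: no cap, so each poll drains everything.
        System.out.println(backlogAfter(60, largeRate, Long.MAX_VALUE));
        // SSL case: each poll is capped at 16 KB.
        System.out.println(backlogAfter(60, largeRate, 16 * KB));
    }
}
```

Under these assumptions the uncapped case leaves no backlog, while the 16 KB
cap leaves the large partition 1008 KB further behind every second
(61931520 bytes after 60 seconds).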

Arguably the user should not sleep 1 sec if the consumer is lagging behind.
But on the Kafka side it would be nice to keep the previous behavior and
optimize consumer.poll() to read as much data from the socket as possible.


  was:
When the consumer uses plaintext and there is remaining data in the consumer's
buffer, consumer.poll() reads all data available from the socket buffer into
the consumer buffer. However, if the consumer uses SSL and there is remaining
data, consumer.poll() may read only 16 KB (the size of
SslTransportLayer.appReadBuffer) from the socket buffer. This reduces the
efficiency of consumer.poll(), because the user must call poll() more times to
get the same amount of data.

Furthermore, we observe that for users who naively sleep for a constant time
after each consumer.poll(), some partitions lag behind after they switch from
plaintext to SSL. Here is an explanation of why this can happen.

Say there is 1 partition receiving 1 MB/sec and 9 partitions receiving
32 KB/sec each. The leaders of these partitions are all on different brokers,
and one consumer is consuming all 10 partitions. Let's also assume that the
socket read buffer is large enough and that the consumer sleeps 1 sec between
consumer.poll() calls. 1 sec is long enough for the consumer to receive the
FetchResponse back from the broker.

When the consumer uses plaintext, each consumer.poll() reads all data from the
socket buffer, which means the full 1 MB is read for the large partition.

When the consumer uses SSL, each consumer.poll() is likely to find that some
data is already available in memory. In this case the consumer reads only
16 KB from each of the other sockets, in particular the socket for the broker
with the large partition. The throughput of the large partition is then
limited to 16 KB/sec.

Arguably the user should not sleep 1 sec if the consumer is lagging behind.
But on the Kafka side it would be nice to keep the previous behavior and
optimize consumer.poll() to read as much data from the socket as possible.



> SSLTransportLayer should keep reading from socket until either the buffer is 
> full or the socket has no more data
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6258
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6258
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>
> When the consumer uses plaintext and there is remaining data in the
> consumer's buffer, consumer.poll() reads all data available from the socket
> buffer into the consumer buffer. However, if the consumer uses SSL and there
> is remaining data, consumer.poll() may read only 16 KB (the size of
> SslTransportLayer.appReadBuffer) from the socket buffer. This reduces the
> efficiency of consumer.poll(), because the user must call poll() more times
> to get the same amount of data.
> Furthermore, we observe that for users who naively sleep for a constant time
> after each consumer.poll(), some partitions lag behind after they switch
> from plaintext to SSL. Here is an explanation of why this can happen.
> Say there is 1 partition receiving 1 MB/sec and 9 partitions receiving
> 32 KB/sec each. The leaders of these partitions are all on different
> brokers, and one consumer is consuming all 10 partitions. Let's also assume
> that the socket read buffer is large enough and that the consumer sleeps
> 1 sec between consumer.poll() calls. 1 sec is long enough for the consumer
> to receive the FetchResponse back from the broker.
> - When the consumer uses plaintext, each consumer.poll() reads all data from
> the socket buffer, which means the full 1 MB is read for the large
> partition.
> - When the consumer uses SSL, each consumer.poll() is likely to find that
> some data is already available in memory. In this case the consumer reads
> only 16 KB from each of the other sockets, in particular the socket for the
> broker with the large partition. The throughput of the large partition is
> then limited to 16 KB/sec.
> Arguably the user should not sleep 1 sec if the consumer is lagging behind.
> But on the Kafka side it would be nice to keep the previous behavior and
> optimize consumer.poll() to read as much data from the socket as possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
