[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681680#comment-14681680 ]

PC commented on KAFKA-2078:
---------------------------

I can reproduce this bug, though it is hard to trigger.
Running on Mac OS X 10.9.5, 16 GB RAM
Java version 1.8.0_40

It only appears to affect the producer
(org.apache.kafka.clients.producer.KafkaProducer, 0.8.2.1).

Setup:

3 producers pumping test data to one Kafka server (replication factor 1), all
running locally on the same machine. Each producer uses the asynchronous
.send(producerRecord, callBack) method.
The configs are at the bottom of this post.

Here is a log snippet:

16:21:51.527 [message-consumer-akka.actor.default-dispatcher-5] DEBUG producer - PumpSuccess topic: test partition 0 offset: 3330477
16:21:51.528 [message-consumer-akka.actor.default-dispatcher-5] DEBUG producer - PumpSuccess topic: test partition 0 offset: 3330478
16:21:51.528 [message-consumer-akka.actor.default-dispatcher-5] DEBUG producer - PumpSuccess topic: test partition 0 offset: 3330479
16:21:51.528 [message-consumer-akka.actor.default-dispatcher-5] DEBUG producer - PumpSuccess topic: test partition 0 offset: 3330480
16:21:51.528 [message-consumer-akka.actor.default-dispatcher-5] DEBUG producer - PumpSuccess topic: test partition 0 offset: 3330481
16:26:26.220 [kafka-producer-network-thread | producer-3] WARN  o.a.kafka.common.network.Selector - Error in I/O with localhost/127.0.0.1
java.io.EOFException: null
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) ~[kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.common.network.Selector.poll(Selector.java:248) ~[kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) [kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) [kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) [kafka-clients-0.8.2.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
16:26:26.220 [kafka-producer-network-thread | producer-2] WARN  o.a.kafka.common.network.Selector - Error in I/O with localhost/127.0.0.1
java.io.EOFException: null
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) ~[kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.common.network.Selector.poll(Selector.java:248) ~[kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) [kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) [kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) [kafka-clients-0.8.2.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
16:26:26.220 [kafka-producer-network-thread | producer-1] WARN  o.a.kafka.common.network.Selector - Error in I/O with localhost/127.0.0.1
java.io.EOFException: null
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) ~[kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.common.network.Selector.poll(Selector.java:248) ~[kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) [kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) [kafka-clients-0.8.2.1.jar:na]
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) [kafka-clients-0.8.2.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

Pay attention to the timestamps: less than 5 minutes after the producers had
FINISHED pumping data, the network threads of all three producers logged these
exceptions, within the same millisecond.
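For reference, the gap between the last PumpSuccess line and the EOFException warnings can be computed directly from the two timestamps in the log. This is just a throwaway check; LogGap is an illustrative name, not anything from Kafka.

```scala
import java.time.{Duration, LocalTime}

object LogGap {
  // Timestamps copied from the log snippet: last PumpSuccess and first WARN line
  val lastPumpSuccess: LocalTime = LocalTime.parse("16:21:51.528")
  val firstEofWarning: LocalTime = LocalTime.parse("16:26:26.220")

  // Elapsed time between the two log lines
  val gap: Duration = Duration.between(lastPumpSuccess, firstEofWarning)

  def main(args: Array[String]): Unit =
    println(gap) // PT4M34.692S
}
```

So the warnings appeared about 4 minutes 35 seconds after the last successful send.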

Worse, two days ago this bug also occurred while the producers were still
pumping messages to the broker. When it kicked in, the callback was not invoked
for 3 messages (one per producer), and no exception was thrown in my
application. This can lead to silent data loss and has severe implications.
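One way to at least detect the missing-callback case without blocking every send is to track in-flight records and flag any whose callback has not fired within a timeout. The sketch below assumes a hypothetical InFlightTracker helper (it is not part of kafka-clients); the caller passes in the current time so the logic is easy to test.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper: remembers when each record was handed to send() and
// forgets it once its callback fires, so callbacks that never arrive stay visible.
final class InFlightTracker {
  private val nextId = new AtomicLong(0L)
  // record id -> time (ms) at which it was handed to send()
  private val pending = new ConcurrentHashMap[java.lang.Long, java.lang.Long]()

  /** Call just before send(); capture the returned id in the callback. */
  def register(nowMillis: Long): Long = {
    val id = nextId.getAndIncrement()
    pending.put(id, nowMillis)
    id
  }

  /** Call from the callback, whether it reports success or an exception. */
  def complete(id: Long): Unit = {
    pending.remove(id)
    ()
  }

  /** Number of records whose callback has not fired yet. */
  def outstanding: Int = pending.size()

  /** Ids whose callback has not fired within timeoutMillis of registration. */
  def overdue(nowMillis: Long, timeoutMillis: Long): Seq[Long] = {
    val result = Seq.newBuilder[Long]
    val it = pending.entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      if (nowMillis - entry.getValue.longValue >= timeoutMillis)
        result += entry.getKey.longValue
    }
    result.result().sorted
  }
}
```

In the producer, the id from register(System.currentTimeMillis()) would be captured by the Callback passed to send(record, callback), complete(id) called at the start of onCompletion(metadata, exception), and overdue(...) checked periodically, so a lost callback shows up as an overdue id instead of silently disappearing.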

I would, in a heartbeat, raise the priority of this bug from Major to SEVERE/CRITICAL.

My temporary (and unacceptable) workaround is to block on each send with a
timeout, to make sure no data is lost when this bug manifests itself:
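import java.util.concurrent.{ExecutionException, TimeUnit, TimeoutException}

try {
  // ...
  // Block until the broker acknowledges the record, or give up after 5 seconds
  kafkaProducer.send(record, callBack).get(5, TimeUnit.SECONDS)
} catch {
  case e: TimeoutException   => // no acknowledgement within 5 s; the record may be lost
  case e: ExecutionException => // the send failed; e.getCause holds the underlying error
}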

For a single producer, this workaround cuts throughput from ~60k messages/sec
(fully asynchronous) down to roughly 5k messages/sec.
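A possible middle ground between fully asynchronous sends and calling get(5, TimeUnit.SECONDS) on every record is to keep the futures returned by send() for a window of records and wait on them together against one shared deadline. This is a sketch under that assumption: BatchAwait and its Outcome types are hypothetical names, and the code relies only on java.util.concurrent.Future, which is the type KafkaProducer.send returns. It has not been benchmarked against the numbers above.

```scala
import java.util.concurrent.{ExecutionException, Future, TimeUnit, TimeoutException}

// Hypothetical helper: waits on a window of already-issued send() futures.
object BatchAwait {
  sealed trait Outcome[+T]
  final case class Acked[T](value: T) extends Outcome[T]
  final case class Failed(cause: Throwable) extends Outcome[Nothing]
  case object TimedOut extends Outcome[Nothing]

  /** Waits for every future, sharing one deadline across the whole window. */
  def awaitAll[T](futures: Seq[Future[T]], timeoutMillis: Long): Seq[Outcome[T]] = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis)
    futures.map { f =>
      // Time left before the shared deadline (never negative)
      val remaining = math.max(0L, deadline - System.nanoTime())
      val outcome: Outcome[T] =
        try Acked(f.get(remaining, TimeUnit.NANOSECONDS))
        catch {
          case e: ExecutionException => Failed(e.getCause) // the send itself failed
          case _: TimeoutException   => TimedOut           // no result before the deadline
        }
      outcome
    }
  }
}
```

The producer would issue send() for, say, a few hundred records, collect the futures, and call awaitAll once per window, so it blocks once per window rather than once per record, while any TimedOut or Failed entry still points at a specific record to retry or report.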

Config properties:

Kafka-Server:
broker.id=0
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
#log.flush.interval.messages=10000
log.flush.interval.ms=5000
delete.topic.enable=true
log.retention.hours=2147483640
log.segment.bytes=1073741824
log.retention.check.interval.ms=30000000
log.cleaner.enable=false
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=12000
offsets.topic.retention.minutes=28800
offset.metadata.max.bytes=4096
offsets.topic.num.partitions=50
offsets.retention.check.interval.ms=600000
offsets.topic.replication.factor=3
offsets.topic.segment.bytes=104857600
offsets.load.buffer.size=5242880
offsets.commit.required.acks=-1
offsets.commit.timeout.ms=5000
default.replication.factor=1
num.partitions=1
auto.create.topics.enable=true
unclean.leader.election.enable=false

Zookeeper:
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0

Producer:
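import java.util.Properties
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.StringSerializer

val kafkaProducerProps = new Properties() // assumed: a java.util.Properties holder
kafkaProducerProps.put(ProducerConfig.ACKS_CONFIG, "1")
kafkaProducerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092")
kafkaProducerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
kafkaProducerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)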

Could someone please take a serious look at this problem? It really does
exist.

> Getting Selector [WARN] Error in I/O with host java.io.EOFException
> -------------------------------------------------------------------
>
>                 Key: KAFKA-2078
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2078
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.8.2.0
>         Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x Intel(R) Xeon(R) CPU X5660  @ 2.80GHz/44GB
>            Reporter: Aravind
>            Assignee: Jun Rao
>
> When trying to produce 1000 (10 MB) messages, I get the error below somewhere
> between the 997th and the 1000th message. There is no pattern, but I am able
> to reproduce it.
> [PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with "our host"
> java.io.EOFException
>  at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
>  at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
>  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
>  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
>  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
>  at java.lang.Thread.run(Thread.java:724)
> I sometimes get this error at the 997th or the 999th message. There is no
> pattern, but I am able to reproduce it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
