Wang,

That did it. Thanks a lot.

- Shekar

On Thu, May 14, 2015 at 10:38 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Shekar,
>
> It seems the incoming / outgoing topics are not the root of the problem
> here, but the checkpoint topic "__samza_checkpoint_ver_1_for_Argos". From
> the error logs, this topic has only one replica, 1018019532, which was
> down and hence not available.
>
> Guozhang
>
> On Thu, May 14, 2015 at 5:16 AM, Shekar Tippur <ctip...@gmail.com> wrote:
>
> > Here is what I see in the Kafka log:
> >
> > [2015-05-14 04:11:27,752] ERROR Closing socket for /10.180.195.32 because of error (kafka.network.Processor)
> > java.io.IOException: Connection reset by peer
> >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >     at kafka.utils.Utils$.read(Utils.scala:375)
> >     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> >     at kafka.network.Processor.read(SocketServer.scala:347)
> >     at kafka.network.Processor.run(SocketServer.scala:245)
> >     at java.lang.Thread.run(Thread.java:745)
> > [2015-05-14 04:11:27,753] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> > [2015-05-14 04:16:06,537] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> > [2015-05-14 04:16:06,604] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> > [2015-05-14 04:16:32,370] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:16:32,452] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:16:32,810] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:16:32,931] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:36:40,586] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:39:49,016] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:43:38,166] INFO Closing socket connection to /10.180.195.32. (kafka.network.Processor)
> > [2015-05-14 04:43:38,392] INFO [ReplicaFetcherManager on broker 1018019533] Removed fetcher for partitions [argos-parser,0],[argos-raw,0] (kafka.server.ReplicaFetcherManager)
> > [2015-05-14 04:43:40,746] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:43:40,855] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> > [2015-05-14 04:43:40,957] INFO Closing socket connection to /10.180.195.33. (kafka.network.Processor)
> >
> > On Thu, May 14, 2015 at 4:55 AM, Shekar Tippur <ctip...@gmail.com> wrote:
> >
> > > Here is the complete log:
> > >
> > > http://pastebin.com/nX7twETm
> > >
> > > Interesting, I see a leader not available exception instead of the
> > > earlier one.
> > >
> > > ./container_1431601903660_0001_01_000002/samza-container-0.log:2015-05-14 04:53:41 BrokerPartitionInfo [WARN] Error while fetching metadata partition 0 leader: none replicas: 1018019532 (sprdargas402.corp.intuit.net:6667) isr: isUnderReplicated: true for topic partition [__samza_checkpoint_ver_1_for_Argos_1,0]: [class kafka.common.LeaderNotAvailableException]
> > >
> > > - Shekar
> > >
> > > On Wed, May 13, 2015 at 7:52 PM, Naveen S <navg...@gmail.com> wrote:
> > >
> > >> Hey Shekar,
> > >> Can you paste the entire stacktrace/log? Were there any other errors?
> > >> On Wed, May 13, 2015 at 6:04 PM Shekar Tippur <ctip...@gmail.com> wrote:
> > >>
> > >> > Hello,
> > >> >
> > >> > I seem to have come across an issue with replication. We have 2 nodes
> > >> > where Kafka and yarn run.
> > >> >
> > >> > We have enabled replication factor on Kafka (Replication factor = 2).
> > >> > To test redundancy, we shut down the broker01 server.
> > >> > In the yarn application logs, we see the
> > >> > exception kafka.common.ReplicaNotAvailableException
> > >> >
> > >> > Incoming topic:
> > >> >
> > >> > /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --topic raw --describe
> > >> >
> > >> > Topic:raw PartitionCount:1 ReplicationFactor:2 Configs:
> > >> >
> > >> > Topic: argos-raw Partition: 0 Leader: 1018019533 Replicas: 1018019533,1018019532 Isr: 1018019533,1018019532
> > >> >
> > >> > Outgoing topic:
> > >> >
> > >> > /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --topic parser --describe
> > >> >
> > >> > Topic:parser PartitionCount:1 ReplicationFactor:2 Configs:
> > >> >
> > >> > Topic: argos-parser Partition: 0 Leader: 1018019533 Replicas: 1018019533,1018019532 Isr: 1018019533,1018019532
> > >> >
> > >> > Any idea why this could be happening?
> > >> >
> > >> > - Shekar
>
> --
> -- Guozhang
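For anyone hitting the same symptom: the diagnosis in Guozhang's reply (a topic whose only replica lives on the broker that was shut down) can be checked by reading the Replicas column of `kafka-topics.sh --describe`. Below is a minimal Python sketch, not something posted in the thread. The helper names `parse_describe_line` and `single_point_of_failure` are hypothetical, and the checkpoint-topic line in the sample is an assumed rendering built from the values in the LeaderNotAvailableException warning, not output captured from the cluster.

```python
import re

# One partition line of `kafka-topics.sh --describe` output, e.g.
# "Topic: argos-raw Partition: 0 Leader: ... Replicas: a,b Isr: a,b"
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)\s+"
    r"Replicas:\s*(\S*)\s+Isr:\s*(\S*)"
)


def parse_describe_line(line):
    """Return a dict for a partition line, or None for any other line."""
    match = PARTITION_LINE.search(line)
    if match is None:
        return None  # e.g. the "Topic:raw PartitionCount:1 ..." summary line
    topic, partition, leader, replicas, isr = match.groups()
    return {
        "topic": topic,
        "partition": int(partition),
        "leader": leader,
        "replicas": [r for r in replicas.split(",") if r],
        "isr": [r for r in isr.split(",") if r],
    }


def single_point_of_failure(partitions, broker_id):
    """Partitions whose entire replica set is the given broker."""
    return [p for p in partitions if set(p["replicas"]) == {broker_id}]


# Sample input. The first two lines are taken from the thread; the third
# is an ASSUMED describe line reconstructed from the warning in the log.
SAMPLE = [
    "Topic:raw PartitionCount:1 ReplicationFactor:2 Configs:",
    "Topic: argos-raw Partition: 0 Leader: 1018019533 "
    "Replicas: 1018019533,1018019532 Isr: 1018019533,1018019532",
    "Topic: __samza_checkpoint_ver_1_for_Argos_1 Partition: 0 Leader: -1 "
    "Replicas: 1018019532 Isr:",
]

if __name__ == "__main__":
    parsed = [p for p in map(parse_describe_line, SAMPLE) if p]
    for p in single_point_of_failure(parsed, "1018019532"):
        print(p["topic"], "partition", p["partition"], "has no other replica")
```

In practice the input would come from running the same `--describe` command shown in the thread against the checkpoint topic rather than from a hard-coded list.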