Martin Kleppmann created SAMZA-170:
--------------------------------------
Summary: hello-samza wikipedia-stats job only receives messages on
one partition
Key: SAMZA-170
URL: https://issues.apache.org/jira/browse/SAMZA-170
Project: Samza
Issue Type: Bug
Reporter: Martin Kleppmann
Assignee: Martin Kleppmann
If you run the three hello-samza jobs and inspect the output of the
wikipedia-stats topic, it looks like this:
{noformat}
{"is-bot-edit":3,"bytes-added":2695,"edits":24,"unique-titles":24,"is-new":1,"is-minor":7}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"is-talk":1,"bytes-added":3474,"edits":19,"unique-titles":19,"is-minor":6}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"bytes-added":1794,"edits":15,"unique-titles":15,"is-new":1,"is-minor":5}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"bytes-added":118,"edits":19,"unique-titles":19,"is-new":2,"is-minor":5}
{"bytes-added":0,"edits":0,"unique-titles":0}
{noformat}
Every other message reports 0 edits, and two messages appear every 10 seconds.
This suggests that one of the job's two tasks (Kafka's default partition count
is 2) is not receiving any messages. I'm not sure whether that is because all
messages are going into one partition or because half the messages are being
lost. Either way, it doesn't seem right. (I'm also fairly sure that it wasn't
this way a few weeks ago.)
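
To make the pattern easier to check, here is a minimal sketch that parses the sample output quoted above and counts the messages reporting zero edits. The names `SAMPLE` and `summarize` are hypothetical and exist only for this illustration; they are not part of hello-samza. The script only confirms the symptom (half the messages are empty, strictly alternating); it cannot say whether the cause is skewed partitioning or message loss.

```python
import json

# Sample lines copied verbatim from the wikipedia-stats output quoted above.
SAMPLE = """\
{"is-bot-edit":3,"bytes-added":2695,"edits":24,"unique-titles":24,"is-new":1,"is-minor":7}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"is-talk":1,"bytes-added":3474,"edits":19,"unique-titles":19,"is-minor":6}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"bytes-added":1794,"edits":15,"unique-titles":15,"is-new":1,"is-minor":5}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"bytes-added":118,"edits":19,"unique-titles":19,"is-new":2,"is-minor":5}
{"bytes-added":0,"edits":0,"unique-titles":0}
"""


def summarize(lines):
    """Return (total messages, zero-edit messages, strictly alternating?)."""
    # Hypothetical helper: reads the "edits" counter from each JSON message.
    edits = [json.loads(line)["edits"] for line in lines if line.strip()]
    empty = [count == 0 for count in edits]
    # True when empty and non-empty messages alternate without exception.
    alternating = all(a != b for a, b in zip(empty, empty[1:]))
    return len(edits), sum(empty), alternating


total, zero, alternating = summarize(SAMPLE.splitlines())
print(total, zero, alternating)  # 8 4 True
```

Four of the eight sample messages are empty and they alternate with the non-empty ones, which is consistent with one of the two tasks emitting a stats message every window without having received any input.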