[ https://issues.apache.org/jira/browse/SAMZA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921001#comment-13921001 ]

Martin Kleppmann commented on SAMZA-170:
----------------------------------------

The hello-samza jobs don't produce any keyed messages. If I understand 
Kafka's model correctly, producers should then round-robin between 
partitions, so there isn't any obvious reason why all messages should end up 
in one partition.
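To make that expectation concrete, here is a minimal Python sketch of the round-robin behaviour assumed above. `RoundRobinPartitioner` and `distribute` are hypothetical names for illustration only. The sketch models the assumption, not Kafka's actual producer code.

```python
from collections import Counter
from itertools import count


class RoundRobinPartitioner:
    """Hypothetical model of the assumption above: a producer sending
    unkeyed messages cycles through the partitions in turn.
    Illustrative only; this is not Kafka's producer code."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next = count()  # monotonically increasing message counter

    def partition(self) -> int:
        # Message i goes to partition i mod num_partitions.
        return next(self._next) % self.num_partitions


def distribute(num_messages: int, num_partitions: int = 2) -> Counter:
    """Count how many messages land in each partition under round-robin."""
    partitioner = RoundRobinPartitioner(num_partitions)
    return Counter(partitioner.partition() for _ in range(num_messages))


if __name__ == "__main__":
    # Under round-robin, 1000 unkeyed messages split evenly over 2 partitions.
    print(distribute(1000))  # Counter({0: 500, 1: 500})
```

If this model held, neither partition of a two-partition topic should stay empty for long, which is why the log sizes below are surprising.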

Examining the Kafka log files (after wiping the /tmp directory and then running 
the three hello-samza jobs for about 45 mins):

{noformat}
$ ls -l /tmp/kafka-logs/*/*.log
-rw-r--r--  1 mkleppma  wheel     2171  5 Mar 15:56 /tmp/kafka-logs/__samza_checkpoint_wikipedia-parser_1-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel     2783  5 Mar 15:56 /tmp/kafka-logs/__samza_checkpoint_wikipedia-parser_1-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel        0  5 Mar 15:10 /tmp/kafka-logs/metrics-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   318697  5 Mar 15:56 /tmp/kafka-logs/metrics-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel  1896850  5 Mar 15:50 /tmp/kafka-logs/wikipedia-edits-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel  1042973  5 Mar 15:56 /tmp/kafka-logs/wikipedia-edits-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   856229  5 Mar 15:50 /tmp/kafka-logs/wikipedia-raw-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel  1157847  5 Mar 15:56 /tmp/kafka-logs/wikipedia-raw-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel        0  5 Mar 15:10 /tmp/kafka-logs/wikipedia-stats-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel    68617  5 Mar 15:56 /tmp/kafka-logs/wikipedia-stats-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   212150  5 Mar 15:50 /tmp/kafka-logs/wikipedia-stats-changelog-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   116800  5 Mar 15:56 /tmp/kafka-logs/wikipedia-stats-changelog-1/00000000000000000000.log
{noformat}

Looks like there's no activity on partition 0 of the wikipedia-stats and 
metrics topics? Both of those partition-0 logs are still 0 bytes.
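A quick way to spot this in a larger listing is to group the log sizes by topic and partition. The following is a minimal Python sketch that does that for the `ls -l` output above, copied in verbatim with one entry per line. `sizes_by_topic` and `empty_partitions` are hypothetical helper names. The sketch assumes the usual `ls -l` column layout, with size in the fifth field and path last, and a `<topic>-<partition>` directory name.

```python
from collections import defaultdict

# The `ls -l` listing above, re-joined to one entry per line.
LISTING = """\
-rw-r--r--  1 mkleppma  wheel     2171  5 Mar 15:56 /tmp/kafka-logs/__samza_checkpoint_wikipedia-parser_1-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel     2783  5 Mar 15:56 /tmp/kafka-logs/__samza_checkpoint_wikipedia-parser_1-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel        0  5 Mar 15:10 /tmp/kafka-logs/metrics-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   318697  5 Mar 15:56 /tmp/kafka-logs/metrics-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel  1896850  5 Mar 15:50 /tmp/kafka-logs/wikipedia-edits-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel  1042973  5 Mar 15:56 /tmp/kafka-logs/wikipedia-edits-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   856229  5 Mar 15:50 /tmp/kafka-logs/wikipedia-raw-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel  1157847  5 Mar 15:56 /tmp/kafka-logs/wikipedia-raw-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel        0  5 Mar 15:10 /tmp/kafka-logs/wikipedia-stats-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel    68617  5 Mar 15:56 /tmp/kafka-logs/wikipedia-stats-1/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   212150  5 Mar 15:50 /tmp/kafka-logs/wikipedia-stats-changelog-0/00000000000000000000.log
-rw-r--r--  1 mkleppma  wheel   116800  5 Mar 15:56 /tmp/kafka-logs/wikipedia-stats-changelog-1/00000000000000000000.log
"""


def sizes_by_topic(listing: str) -> dict[str, dict[int, int]]:
    """Map topic -> {partition: log size in bytes} from `ls -l` output."""
    result: dict[str, dict[int, int]] = defaultdict(dict)
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Fields: perms, links, owner, group, size, day, month, time, path.
        size, path = int(fields[4]), fields[-1]
        # The directory name is "<topic>-<partition>", e.g. "metrics-0".
        topic, _, partition = path.split("/")[-2].rpartition("-")
        result[topic][int(partition)] = size
    return dict(result)


def empty_partitions(listing: str) -> dict[str, list[int]]:
    """Topics with at least one partition whose log is 0 bytes."""
    return {
        topic: sorted(p for p, size in parts.items() if size == 0)
        for topic, parts in sizes_by_topic(listing).items()
        if any(size == 0 for size in parts.values())
    }


if __name__ == "__main__":
    print(empty_partitions(LISTING))
    # {'metrics': [0], 'wikipedia-stats': [0]}
```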

> hello-samza wikipedia-stats job only receives messages on one partition
> -----------------------------------------------------------------------
>
>                 Key: SAMZA-170
>                 URL: https://issues.apache.org/jira/browse/SAMZA-170
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Martin Kleppmann
>            Assignee: Martin Kleppmann
>
> If you run the three hello-samza jobs, and inspect the output of the 
> wikipedia-stats topic, it looks like this:
> {noformat}
> {"is-bot-edit":3,"bytes-added":2695,"edits":24,"unique-titles":24,"is-new":1,"is-minor":7}
> {"bytes-added":0,"edits":0,"unique-titles":0}
> {"is-bot-edit":3,"is-talk":1,"bytes-added":3474,"edits":19,"unique-titles":19,"is-minor":6}
> {"bytes-added":0,"edits":0,"unique-titles":0}
> {"is-bot-edit":3,"bytes-added":1794,"edits":15,"unique-titles":15,"is-new":1,"is-minor":5}
> {"bytes-added":0,"edits":0,"unique-titles":0}
> {"is-bot-edit":3,"bytes-added":118,"edits":19,"unique-titles":19,"is-new":2,"is-minor":5}
> {"bytes-added":0,"edits":0,"unique-titles":0}
> {noformat}
> Every other message has 0 edits, and two messages appear every 10 seconds, 
> suggesting that one of the job's two tasks (Kafka's default partition count 
> is 2) is not receiving any messages. That might be because all messages are 
> going into one partition, or because half the messages are being lost; I'm 
> not sure. Either way, it doesn't seem right. (And I'm fairly sure that it 
> wasn't this way a few weeks ago.)
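The alternating pattern in the quoted wikipedia-stats output can be checked mechanically. The following is a minimal Python sketch, assuming each of the job's two tasks emits one stats message per 10-second window, which is the interpretation given in the issue. `window_edits` and `idle_fraction` are hypothetical helper names. The sketch only confirms the symptom. It cannot tell whether the messages all went to one partition or half of them were lost.

```python
import json

# The wikipedia-stats output quoted above, one JSON object per line.
OUTPUT = """\
{"is-bot-edit":3,"bytes-added":2695,"edits":24,"unique-titles":24,"is-new":1,"is-minor":7}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"is-talk":1,"bytes-added":3474,"edits":19,"unique-titles":19,"is-minor":6}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"bytes-added":1794,"edits":15,"unique-titles":15,"is-new":1,"is-minor":5}
{"bytes-added":0,"edits":0,"unique-titles":0}
{"is-bot-edit":3,"bytes-added":118,"edits":19,"unique-titles":19,"is-new":2,"is-minor":5}
{"bytes-added":0,"edits":0,"unique-titles":0}
"""


def window_edits(output: str) -> list[int]:
    """Return the "edits" count of each stats message, in order."""
    return [json.loads(line)["edits"] for line in output.splitlines() if line.strip()]


def idle_fraction(output: str) -> float:
    """Fraction of stats messages that report zero edits."""
    edits = window_edits(output)
    return sum(1 for e in edits if e == 0) / len(edits)


if __name__ == "__main__":
    edits = window_edits(OUTPUT)
    print(edits)                  # [24, 0, 19, 0, 15, 0, 19, 0]
    print(edits[1::2])            # every second message: [0, 0, 0, 0]
    print(idle_fraction(OUTPUT))  # 0.5
```

With two tasks, an idle fraction of exactly 0.5 combined with strict alternation is consistent with one task seeing no input at all. It is not consistent with both tasks receiving a share of the edits.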



--
This message was sent by Atlassian JIRA
(v6.2#6252)
