[
https://issues.apache.org/jira/browse/KAFKA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548667#comment-13548667
]
Maxime Brugidou commented on KAFKA-691:
---------------------------------------
I think the work-around is not really acceptable for me, since it will consume
3x the resources (because a replication factor of 3 is the minimum acceptable)
and it will still make the cluster less available anyway (unless I have only 3
brokers).
The thing is that 0.7 kept the cluster 100% available (for my use case, which
accepts data loss) as long as a single broker was alive.
A way to handle this would be to:
1. Have a lot of partitions per topic (more than the # of brokers)
2. Have something that rebalances the partitions and makes sure every broker
has at least one partition for each topic (to keep every topic "available")
3. Have a setting in the consumer/producer that says "I don't care about
partitioning, just produce/consume wherever you can"
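The "produce wherever you can" setting from step 3 could be sketched as a producer-side chooser that picks randomly among only those partitions that currently have a live leader. This is purely illustrative; the class and method names below are hypothetical, not part of the actual Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical availability-aware chooser: given per-partition availability
// (e.g. "does this partition currently have a live leader?"), pick uniformly
// at random among the available partitions only.
public class AvailablePartitionChooser {
    private final Random random;

    public AvailablePartitionChooser(Random random) {
        this.random = random;
    }

    // availability[i] is true if partition i can currently accept writes.
    // Returns -1 when no partition at all is available.
    public int choose(boolean[] availability) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < availability.length; i++) {
            if (availability[i]) {
                candidates.add(i);
            }
        }
        if (candidates.isEmpty()) {
            return -1;
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

With enough partitions per topic (step 1) and a rebalancer spreading them across brokers (step 2), this degrades gracefully: losing a broker only shrinks the candidate list instead of failing produce requests.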
> Fault tolerance broken with replication factor 1
> ------------------------------------------------
>
> Key: KAFKA-691
> URL: https://issues.apache.org/jira/browse/KAFKA-691
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Jay Kreps
>
> In 0.7 if a partition was down we would just send the message elsewhere. This
> meant that the partitioning was really more of a "stickiness" than a hard
> guarantee. This made it impossible to depend on it for partitioned, stateful
> processing.
> In 0.8 when running with replication this should not be a problem generally,
> as the partitions are now highly available and fail over to other replicas.
> However, the case of replication factor = 1 no longer really works for most
> use cases, as a dead broker will now produce errors for its partitions.
> I am not sure of the best fix. Intuitively I think this is something that
> should be handled by the Partitioner interface. However currently the
> partitioner has no knowledge of which nodes are available. So you could use a
> random partitioner, but that would keep going back to the down node.
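The 0.7-style "stickiness" semantics described above could be sketched roughly as follows: honour the key's preferred partition when its broker is up, and otherwise fall back to any live partition rather than failing. The names and the availability parameter are illustrative assumptions, not the actual Kafka Partitioner interface (which, as noted, has no knowledge of which nodes are available).

```java
// Sketch of "sticky" partitioning: the key maps to a preferred partition,
// but a down partition causes a fallback instead of an error.
public class StickyFallbackPartitioner {
    // availability[i] is true if partition i currently has a live broker.
    // Returns the preferred partition when it is up, otherwise the first
    // available partition, or -1 when nothing is available.
    public int partition(Object key, boolean[] availability) {
        int preferred = Math.abs(key.hashCode() % availability.length);
        if (availability[preferred]) {
            return preferred;
        }
        // Preferred partition is down: fall back to any live partition.
        for (int i = 0; i < availability.length; i++) {
            if (availability[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

The trade-off is exactly the one raised above: with fallback, consumers can no longer assume a key always lands in the same partition, so this only suits workloads that accept loose partitioning in exchange for availability.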
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira