[jira] [Commented] (KAFKA-3334) First message on new topic not actually being sent, no exception thrown

Jiangjie Qin (JIRA) Thu, 31 Mar 2016 13:21:46 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220604#comment-15220604
 ]


Jiangjie Qin commented on KAFKA-3334:
-------------------------------------

[~singhashish] I think we are on the same page: we want users to have a 
clear idea of where to look when something goes wrong. In terms of 
documentation, it is probably extremely difficult to cover every scenario a 
user might see, because there are so many configuration combinations and each 
combination might result in different behavior. Scenario-based documentation 
might never be enough :) I was thinking about the following:

1. The default configurations should just work out of the box if the user 
does not change anything.
2. For each configuration, we need to document clearly what it is for and 
what its possible impact is.
3. For each exception thrown to the user, the error message itself should say 
clearly yet briefly what went wrong. In the documentation of the exception, we 
can list the places where it can be thrown (this should match the info in the 
error message), the possible causes, and the suggested solutions.

Taking this particular case as an example, in the TimeoutException thrown from 
producer.send() the user will see 
{noformat}
"The producer failed to fetch the metadata for the topic XXX after XXX ms. 
Please see the exception documentation for possible cause."
{noformat}

And the documentation of TimeoutException should have something like 
{noformat}
"This exception can be thrown in the following cases:
1. The producer cannot fetch the metadata of a topic. This only happens when 
the producer is sending a message to the topic for the first time. It is more 
likely to happen if the topic did not exist on the brokers, because creating 
the new topic on the broker might take some time. The user can retry sending 
the message in this case.
2. blah blah blah"
{noformat}
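
As a rough illustration of the "retry sending the message" advice in case 1, 
here is a minimal sketch using only the JDK. MetadataTimeout, SendAttempt and 
sendWithRetry are hypothetical names made up for this example; they are not 
Kafka classes. In a real application the attempt would presumably wrap the 
producer.send(...) call and the get() on the returned future, and the caught 
exception type would be whatever the client actually surfaces.

```java
// Stand-in for the timeout a producer would surface; hypothetical, not a Kafka class.
class MetadataTimeout extends RuntimeException {
    MetadataTimeout(String message) {
        super(message);
    }
}

// Hypothetical callback wrapping a single send attempt.
interface SendAttempt<T> {
    T run() throws Exception;
}

class RetryOnTimeout {
    // Runs the attempt up to maxAttempts times, sleeping backoffMs between tries.
    // Only MetadataTimeout is retried; any other exception propagates immediately.
    static <T> T sendWithRetry(SendAttempt<T> attempt, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        MetadataTimeout last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (MetadataTimeout e) {
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        // Every attempt timed out: rethrow the last timeout to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate a new topic whose metadata becomes available on the third attempt.
        String result = sendWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new MetadataTimeout("metadata not available yet");
            }
            return "sent";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The retry is bounded on purpose: if the topic never becomes available, the 
last timeout still reaches the caller instead of being swallowed.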

I feel this is a more intuitive way for users to get an idea of what went 
wrong because, at the end of the day, the first thing a user will see is the 
exception. If the exception itself does not provide a clear pointer, users do 
not know where to start. For example, if a user sees a TimeoutException, what 
are they supposed to search for or read?

So my point is that the exception itself should be crystal clear, through 
both its error message and its documentation.
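
To make the idea concrete, here is a minimal sketch of an exception that 
builds the message proposed earlier from the topic name and the time waited. 
MetadataTimeoutException, topic() and waitedMs() are hypothetical names used 
only for illustration; this is not the actual Kafka TimeoutException, and the 
topic name and wait time in main are sample values.

```java
// Hypothetical sketch: not the actual Kafka exception class.
class MetadataTimeoutException extends RuntimeException {
    private final String topic;
    private final long waitedMs;

    MetadataTimeoutException(String topic, long waitedMs) {
        // The message names the topic and the wait time, then points at the docs.
        super(String.format(
            "The producer failed to fetch the metadata for the topic %s after %d ms. "
                + "Please see the exception documentation for possible cause.",
            topic, waitedMs));
        this.topic = topic;
        this.waitedMs = waitedMs;
    }

    // Accessors so callers can react programmatically without parsing the message.
    String topic() {
        return topic;
    }

    long waitedMs() {
        return waitedMs;
    }
}

class MetadataTimeoutDemo {
    public static void main(String[] args) {
        RuntimeException e = new MetadataTimeoutException("file.topic", 60000L);
        System.out.println(e.getMessage());
    }
}
```

Carrying the topic and wait time as fields as well as in the message text is 
a design choice of this sketch, so that the message and the documentation can 
refer to the same pieces of information.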

I agree that it might also be useful to document in detail how KafkaProducer 
sends a message. But for users who really care about the internal details, 
reading the code is probably the best way.

> First message on new topic not actually being sent, no exception thrown
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3334
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3334
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>         Environment: Linux, Java
>            Reporter: Aleksandar Stojadinovic
>            Assignee: Ashish K Singh
>             Fix For: 0.10.1.0
>
>
> Although I've seen this issue pop around the internet in a few forms, I'm not 
> sure it is yet properly fixed. 
> When publishing to a new topic, with auto-create enabled, the Java client 
> (0.9.0) shows this WARN message in the log, and the message is obviously 
> not sent:
> org.apache.kafka.clients.NetworkClient - Error while fetching metadata with 
> correlation id 0 : {file.topic=LEADER_NOT_AVAILABLE}
> In the meantime I see in the console the message that a log for partition is 
> created. The next messages are patched through normally, but the first one is 
> never sent. No exception is ever thrown, either by calling get on the future, 
> or with the async usage, like everything is perfect.
> I notice when I leave my application blocked on the get call, in the 
> debugger, then the message may be processed, but with significant delay. This 
> is consistent with another issue I found for the python client. Also, if I 
> call partitionsFor previously, the topic is created and the message is sent. 
> But it seems silly to call it every time, just to mitigate this issue.
> {code}
> Future<RecordMetadata> recordMetadataFuture =
>         producer.send(new ProducerRecord<>(topic, key, file));
> RecordMetadata recordMetadata = recordMetadataFuture.get(30, TimeUnit.SECONDS);
> {code}
> I hope I'm clear enough.
> Related similar (but not same) issues:
> https://issues.apache.org/jira/browse/KAFKA-1124
> https://github.com/dpkp/kafka-python/issues/150
> http://stackoverflow.com/questions/35187933/how-to-resolve-leader-not-available-kafka-error-when-trying-to-consume



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
