I think we can keep the incremental fetch request with the design idea I
described in my previous email. Of course, brokers would need to be
updated to understand topic priorities, i.e., we would also need to
change the protocol to send topic priority information to the brokers.

Thus, if somebody only updates clients and uses topic priorities, the
client would need to fall back to full fetch requests until the brokers
are updated, too.

Thoughts?


-Matthias

On 9/10/18 11:08 AM, Colin McCabe wrote:
> Oh, also, I am -1 on disabling incremental fetch requests when the 
> prioritization feature is used.  Users often have performance problems that 
> are difficult to understand when they use various combinations of features.  
> Of course as implementors we "know" what the right combination of features is 
> to get good performance.  But this is not intuitive to our users.  If we 
> support a feature, it should not cause non-obvious performance regressions.
> 
> best,
> Colin
> 
> 
> On Mon, Sep 10, 2018, at 11:05, Colin McCabe wrote:
>> On Thu, Sep 6, 2018, at 05:24, Jan Filipiak wrote:
>>>
>>> On 05.09.2018 17:18, Colin McCabe wrote:
>>>> Hi all,
>>>>
>>>> I agree that DISCUSS is more appropriate than VOTE at this point, since I 
>>>> don't remember the last discussion coming to a definite conclusion.
>>>>
>>>> I guess my concern is that this will add complexity and memory consumption 
>>>> on the server side.  In the case of incremental fetch requests, we will 
>>>> have to track at least two extra bytes per partition, to know what the 
>>>> priority of each partition is within each active fetch session.
>>>>
>>>> It would be nice to hear more about the use-cases for this feature.  I 
>>>> think Gwen asked about this earlier, and I don't remember reading a 
>>>> response.  The fact that we're now talking about Samza interfaces is a bit 
>>>> of a red flag.  After all, Samza didn't need partition priorities to do 
>>>> what it did.  You can do a lot with muting partitions and using 
>>>> appropriate threading in your code.
>>> To show a use case, I linked 353, especially since the threading model 
>>> is pretty fixed there.
>>>
>>> No clue why Samza should be a red flag. They handle this purely on the 
>>> consumer side, which I think is reasonable. I would not try to implement 
>>> any broker-side support for this if I were to do it. Just don't support 
>>> incremental fetch then.
>>> In the end, if you have broker-side support, you would need to ship the 
>>> logic of the message chooser to the broker. I don't think that would 
>>> allow for the flexibility I had in mind for a purely consumer-based 
>>> implementation.
>>
>> The reason why Samza is a red flag here is that if Samza can do a thing, 
>> any Kafka client can do that thing too, and no special broker support is 
>> needed.  In that case, perhaps what we need is better documentation, 
>> rather than any new features or code in Kafka itself.  Also, it's not 
>> clear why we should duplicate Samza in Kafka.  Users who want to use 
>> Samza can still simply use Samza, or one of the various opinionated 
>> frameworks like Kafka Streams, KSQL, etc. etc.
>>
>> In general I think we should not continue this discussion until we have 
>> some examples of concrete and specific use-cases.
>>
>> best,
>> Colin
>>
>>
>>>
>>>
>>>>
>>>> For example, you can hand data from a partition off to a work queue with a 
>>>> fixed size, which is handled by a separate service thread.  If the queue 
>>>> gets full, you can mute the partition until some of the buffered data is 
>>>> processed.  Kafka Streams uses a similar approach to avoid reading 
>>>> partition data that isn't immediately needed.
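The mute-on-backpressure pattern described above can be sketched as below. The class is a toy, self-contained stand-in just to show the control flow; a real implementation would call `KafkaConsumer.pause(partitions)` and `KafkaConsumer.resume(partitions)` where the comments indicate:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy sketch of the pattern: hand records to a bounded work queue and
// mute (pause) the partition when the queue fills up, resuming once the
// service thread has drained some of the backlog.
class BackpressureBuffer {
    private final int capacity;
    private final Queue<String> queue = new ArrayDeque<>();
    private boolean muted = false;

    BackpressureBuffer(int capacity) {
        this.capacity = capacity;
    }

    /** Called from the poll loop; returns false if the record was rejected. */
    synchronized boolean offer(String record) {
        if (queue.size() >= capacity) {
            muted = true;   // would be consumer.pause(partition) in real code
            return false;
        }
        queue.add(record);
        if (queue.size() >= capacity) {
            muted = true;   // queue just filled up: mute the partition
        }
        return true;
    }

    /** Called from the service thread after processing a record. */
    synchronized String take() {
        String record = queue.poll();
        if (muted && queue.size() < capacity) {
            muted = false;  // would be consumer.resume(partition)
        }
        return record;
    }

    synchronized boolean isMuted() {
        return muted;
    }
}
```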
>>>>
>>>> There might be some use-cases that need priorities eventually, but I'm 
>>>> concerned that we're jumping the gun by trying to implement this before we 
>>>> know what they are.
>>>>
>>>> best,
>>>> Colin
>>>>
>>>>
>>>> On Wed, Sep 5, 2018, at 01:06, Jan Filipiak wrote:
>>>>> On 05.09.2018 02:38, n...@afshartous.com wrote:
>>>>>>> On Sep 4, 2018, at 4:20 PM, Jan Filipiak <jan.filip...@trivago.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> what I meant is literally this interface:
>>>>>>>
>>>>>>> https://samza.apache.org/learn/documentation/0.7.0/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html
>>>>>> Hi Jan,
>>>>>>
>>>>>> Thanks for the reply and I have a few questions.  This Samza doc
>>>>>>
>>>>>>     
>>>>>> https://samza.apache.org/learn/documentation/0.14/container/streams.html
>>>>>>
>>>>>> indicates that the chooser is set via configuration.  Are you suggesting 
>>>>>> adding a new configuration for Kafka ?  Seems like we could also have a 
>>>>>> method on KafkaConsumer
>>>>>>
>>>>>>       public void register(MessageChooser messageChooser)
>>>>> I don't have strong opinions regarding this. I like configs; I also
>>>>> don't think it would be a problem to have both.
>>>>>
>>>>>> to make it more dynamic.
>>>>>>
>>>>>> Also, the Samza MessageChooser interface has method
>>>>>>
>>>>>>     /* Notify the chooser that a new envelope is available for
>>>>>> processing. */
>>>>>>     void update(IncomingMessageEnvelope envelope)
>>>>>>
>>>>>> and I’m wondering how this method would be translated to Kafka API.  In 
>>>>>> particular what corresponds to IncomingMessageEnvelope.
>>>>> I think Samza uses the envelope abstraction because they support other
>>>>> sources besides Kafka as well. They are more on the Spark end of things
>>>>> when it comes to different input types. I don't have strong opinions,
>>>>> but it feels like we wouldn't need such a thing in the Kafka consumer
>>>>> and could just use a regular ConsumerRecord or so.
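A Kafka-native variant of the chooser along those lines might look like the sketch below. All names are hypothetical, and `Record` is a simplified, self-contained stand-in for `ConsumerRecord<K, V>`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a MessageChooser-style interface for the Kafka
// consumer, operating on records directly rather than on Samza's
// IncomingMessageEnvelope.
interface MessageChooser {
    /** Notify the chooser that a new record is available. */
    void update(Record record);

    /** Return the next record to process, or null if none is buffered. */
    Record choose();
}

// Simplified stand-in for ConsumerRecord<K, V>.
record Record(String topic, int partition, long offset, int priority) {}

// One possible implementation: always hand out the buffered record with
// the highest topic priority first.
class PriorityChooser implements MessageChooser {
    private final List<Record> buffered = new ArrayList<>();

    @Override
    public void update(Record record) {
        buffered.add(record);
    }

    @Override
    public Record choose() {
        if (buffered.isEmpty()) {
            return null;
        }
        Record best = buffered.stream()
                .max(Comparator.comparingInt(Record::priority))
                .get();
        buffered.remove(best);
        return best;
    }
}
```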
>>>>>> Best,
>>>>>> --
>>>>>>         Nick
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
