Re: [KIP-DISCUSSION] KIP-22 Expose a Partitioner interface in the new producer

Sriharsha Chintalapani Thu, 23 Apr 2015 15:00:17 -0700

Hi Jay,
         Sorry about the KIP formatting. I have fixed it in the KIP.

2. We certainly need to add both the serialized and unserialized form for 
the key as both are useful. 
I added those to the interface.

3. Do we need to add the value? I suspect people will have uses for 
computing something off a few fields in the value to choose the partition. 
This would be useful in cases where the key was being used for log 
compaction purposes and did not contain the full information for computing 
the partition. 
Added it as well.
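
To make points 2 and 3 concrete, here is a rough sketch of an interface that
receives both forms of the key and the value. The names SketchPartitioner and
ValueFieldPartitioner, the numPartitions parameter, and the "region|payload"
value layout are assumptions made for illustration only; this is not the
signature in the KIP.

```java
// Hypothetical interface shape: both the unserialized and serialized forms
// of the key and the value are handed to the partitioner.
interface SketchPartitioner {
    int partition(String topic, Object key, byte[] keyBytes,
                  Object value, byte[] valueBytes, int numPartitions);
}

// Example for point 3: the key is reserved for log compaction, so the
// partition is computed from a field of the value instead.
class ValueFieldPartitioner implements SketchPartitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, int numPartitions) {
        // Assumed value layout: "region|payload". Only the region is used.
        String region = String.valueOf(value).split("\\|", 2)[0];
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (region.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

With this shape, two records with different keys but the same region in the
value land on the same partition.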

4. This interface doesn't include either an init() or close() method. It 
should implement Closeable and Configurable, right? 
I am not quite sure about having init() or close() for the partitioner. Are we 
expecting the partitioner to use external resources that it has to initialize 
and close? If that's the case, then init() should also take some config as a 
param, which can add more complexity.
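
For reference, a minimal sketch of what the Configurable plus Closeable variant
could look like. SketchConfigurable is a local stand-in so the example compiles
on its own, and LifecyclePartitioner, PinnedPartitioner and the config key
"sketch.pinned.partition" are all made up for illustration.

```java
import java.io.Closeable;
import java.util.Map;

// Local stand-in for a Configurable-style contract: config arrives as a map.
interface SketchConfigurable {
    void configure(Map<String, ?> configs);
}

// Hypothetical partitioner contract with a lifecycle.
interface LifecyclePartitioner extends SketchConfigurable, Closeable {
    int partition(byte[] keyBytes, int numPartitions);

    @Override
    void close(); // narrowed so callers need not handle IOException
}

// Toy implementation: sends every record to one configured partition.
class PinnedPartitioner implements LifecyclePartitioner {
    private int pinned = 0;
    private boolean closed = false;

    @Override
    public void configure(Map<String, ?> configs) {
        Object v = configs.get("sketch.pinned.partition"); // hypothetical key
        if (v != null) {
            pinned = Integer.parseInt(v.toString());
        }
    }

    @Override
    public int partition(byte[] keyBytes, int numPartitions) {
        if (closed) {
            throw new IllegalStateException("partitioner is closed");
        }
        return pinned % numPartitions;
    }

    @Override
    public void close() {
        closed = true; // a real implementation would release resources here
    }

    boolean isClosed() {
        return closed;
    }
}
```

The extra surface is two methods; the config map is the "config as param" part
mentioned above.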

5. What happens if the user both sets the partition id in the 
ProducerRecord and sets a partitioner? Does the partition id just get 
passed in to the partitioner (as sort of implied in this interface?). This 
is a bit weird since if you pass in the partition id you kind of expect it 
to get used, right? Or is it the case that if you specify a partition the 
partitioner isn't used at all (in which case no point in including 
partition in the Partitioner api). 
With the current ProducerRecord, the partition id is passed to the Partitioner. 
If a custom partitioner is not going to use it, then that's up to its 
implementation, right? Similarly, our interface has the value as another param, 
which may or may not be used. Essentially, it's up to the Partitioner to disclose 
which of the available information it partitions on.
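
As an illustration of that last point, here is a sketch of a partitioner that
chooses to honor an explicitly set partition id and falls back to hashing the
serialized key otherwise. PartitionAwarePartitioner and HonoringPartitioner are
hypothetical names, and passing the id as a nullable Integer is an assumption.

```java
import java.util.Arrays;

// Hypothetical shape: the record's partition id is passed through,
// and is null when the user did not set one.
interface PartitionAwarePartitioner {
    int partition(Integer requestedPartition, byte[] keyBytes, int numPartitions);
}

// One possible policy: use the requested partition if present, else hash the key.
class HonoringPartitioner implements PartitionAwarePartitioner {
    @Override
    public int partition(Integer requestedPartition, byte[] keyBytes, int numPartitions) {
        if (requestedPartition != null) {
            if (requestedPartition < 0 || requestedPartition >= numPartitions) {
                throw new IllegalArgumentException(
                        "invalid partition: " + requestedPartition);
            }
            return requestedPartition;
        }
        // Arrays.hashCode returns 0 for a null array, so a null key is safe here.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }
}
```

A different implementation could ignore requestedPartition entirely, which is
the behavior Jay finds surprising in point 5.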

Thanks,
Harsha

On April 23, 2015 at 9:11:33 AM, Jay Kreps (jay.kr...@gmail.com) wrote:

Hey Harsha,  

A few comments:  

Can you finish up the KIP? There are some unfinished sentences and odd  
whitespace things going on.  

Here are the questions I think we should consider:  
1. Do we need this at all given that we have the partition argument in  
ProducerRecord which gives full control? I think we do need it because this  
is a way to plug in a different partitioning strategy at run time and do it  
in a fairly transparent way.  
2. We certainly need to add both the serialized and unserialized form for  
the key as both are useful.  
3. Do we need to add the value? I suspect people will have uses for  
computing something off a few fields in the value to choose the partition.  
This would be useful in cases where the key was being used for log  
compaction purposes and did not contain the full information for computing  
the partition.  
4. This interface doesn't include either an init() or close() method. It  
should implement Closeable and Configurable, right?  
5. What happens if the user both sets the partition id in the  
ProducerRecord and sets a partitioner? Does the partition id just get  
passed in to the partitioner (as sort of implied in this interface?). This  
is a bit weird since if you pass in the partition id you kind of expect it  
to get used, right? Or is it the case that if you specify a partition the  
partitioner isn't used at all (in which case no point in including  
partition in the Partitioner api).  

Cheers,  

-Jay  

On Thu, Apr 23, 2015 at 6:55 AM, Sriharsha Chintalapani <ka...@harsha.io>  
wrote:  

> Hi,  
> Here is the KIP for adding a partitioner interface for producer.  
>  
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-+22+-+Expose+a+Partitioner+interface+in+the+new+producer
>   
> There is one open question about what the interface should look like. Please  
> take a look and let me know if you prefer one way or the other.  
>  
> Thanks,  
> Harsha  
>  
>
