Including KAFKA-3907 sounds reasonable to me.

-Matthias

On 6/14/17 2:29 PM, Guozhang Wang wrote:
> LGTM. Thanks!
> 
> 
> Guozhang
> 
> On Tue, Jun 13, 2017 at 2:20 PM, Jeyhun Karimov <je.kari...@gmail.com>
> wrote:
> 
>> Thanks for the comment Matthias. After all the discussion (thanks to all
>> participants), I think this (single method that passes in a RecordContext
>> object) is the best alternative.
>> Just a side note: I think KAFKA-3907 [1] can also be integrated into the
>> KIP by adding related method inside RecordContext interface.
>>
>>
>> [1] https://issues.apache.org/jira/browse/KAFKA-3907
>>
>>
>> Cheers,
>> Jeyhun
>>
>> On Tue, Jun 13, 2017 at 7:50 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> Hi,
>>>
>>> I would like to push this discussion further. It seems we got nice
>>> alternatives (thanks for the summary Jeyhun!).
>>>
>>> With respect to RichFunctions and allowing them to be stateful, I have
>>> my doubt as expressed already. From my understanding, the idea was to
>>> give access to record metadata information only. If you want to do a
>>> stateful computation you should rather use #transform().
>>>
>>> Furthermore, as pointed out, we would need to switch to a
>>> supplier-pattern introducing many more overloads.
>>>
>>> For those reason, I advocate for a simple interface with a single method
>>> that passes in a RecordContext object.
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 6/6/17 5:15 PM, Guozhang Wang wrote:
>>>> Thanks for the comprehensive summary!
>>>>
>>>> Personally I'd prefer the option of passing RecordContext as an
>>> additional
>>>> parameter into he overloaded function. But I'm also open to other
>>> arguments
>>>> if there are sth. that I have overlooked.
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Mon, Jun 5, 2017 at 3:19 PM, Jeyhun Karimov <je.kari...@gmail.com>
>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for your comments Matthias and Guozhang.
>>>>>
>>>>> Below I mention the quick summary of the main alternatives we looked
>> at
>>> to
>>>>> introduce the Rich functions (I will refer to it as Rich functions
>>> until we
>>>>> find better/another name). Initially the proposed alternatives was not
>>>>> backwards-compatible, so I will not mention them.
>>>>> The related discussions are spread in KIP-149 and in this KIP
>> (KIP-159)
>>>>> discussion threads.
>>>>>
>>>>>
>>>>>
>>>>> 1. The idea of rich functions came into the stage with KIP-149, in
>>>>> discussion thread. As a result we extended KIP-149 to support Rich
>>>>> functions as well.
>>>>>
>>>>> 2.  To as part of the Rich functions, we provided init
>>> (ProcessorContext)
>>>>> method. Afterwards, Dammian suggested that we should not provide
>>>>> ProcessorContext to users. As a result, we separated the two problems
>>> into
>>>>> two separate KIPs, as it seems they can be solved in parallel.
>>>>>
>>>>> - One approach we considered was :
>>>>>
>>>>> public interface ValueMapperWithKey<K, V, VR> {
>>>>>     VR apply(final K key, final V value);
>>>>> }
>>>>>
>>>>> public interface RichValueMapper<K, V, VR> extends RichFunction{
>>>>> }
>>>>>
>>>>> public interface RichFunction {
>>>>>     void init(RecordContext recordContext);
>>>>>     void close();
>>>>> }
>>>>>
>>>>> public interface RecordContext {
>>>>>     String applicationId();
>>>>>     TaskId taskId();
>>>>>     StreamsMetrics metrics();
>>>>>     String topic();
>>>>>     int partition();
>>>>>     long offset();
>>>>>     long timestamp();
>>>>>     Map<String, Object> appConfigs();
>>>>>     Map<String, Object> appConfigsWithPrefix(String prefix);
>>>>> }
>>>>>
>>>>>
>>>>> public interface ProcessorContext extends RecordContext {
>>>>>    // all methods but the ones in RecordContext
>>>>> }
>>>>>
>>>>> As a result:
>>>>> * . All "withKey" and "withoutKey" interfaces can be converted to
>> their
>>>>> Rich counterparts (with empty init() and close() methods)
>>>>> *. All related Processors will accept Rich interfaces in their
>>>>> constructors.
>>>>> *. So, we convert the related "withKey" or "withoutKey" interfaces to
>>> Rich
>>>>> interface while building the topology and initialize the related
>>> processors
>>>>> with Rich interfaces only.
>>>>> *. We will not need to overloaded methods for rich functions as Rich
>>>>> interfaces extend withKey interfaces. We will just check the object
>> type
>>>>> and act accordingly.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 3. There was some thoughts that the above approach does not support
>>> lambdas
>>>>> so we should support only one method, only init(RecordContext), as
>> part
>>> of
>>>>> Rich interfaces.
>>>>> This is still in discussion. Personally I think Rich interfaces are by
>>>>> definition lambda-free and we should not care much about it.
>>>>>
>>>>>
>>>>> 4. Thanks to Matthias's discussion, an alternative we considered was
>> to
>>>>> pass in the RecordContext as method parameter.  This might even allow
>> to
>>>>> use Lambdas and we could keep the name RichFunction as we preserve the
>>>>> nature of being a function.
>>>>> "If you go with `init()` and `close()` we basically
>>>>> allow users to have an in-memory state for a function. Thus, we cannot
>>>>> share a single instance of RichValueMapper (etc) over multiple tasks
>> and
>>>>> we would need a supplier pattern similar to #transform(). And this
>> would
>>>>> "break the flow" of the API, as (Rich)ValueMapperSupplier would not
>>>>> inherit from ValueMapper and thus we would need many new overload for
>>>>> KStream/KTable classes". (Copy paste from Matthias's email)
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Jeyhun
>>>>>
>>>>>
>>>>> On Mon, Jun 5, 2017 at 5:18 AM Matthias J. Sax <matth...@confluent.io
>>>
>>>>> wrote:
>>>>>
>>>>>> Yes, we did consider this, and there is no consensus yet what the
>> best
>>>>>> alternative is.
>>>>>>
>>>>>> @Jeyhun: the email thread got pretty long. Maybe you can give a quick
>>>>>> summary of the current state of the discussion?
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 6/4/17 6:04 PM, Guozhang Wang wrote:
>>>>>>> Thanks for the explanation Jeyhun and Matthias.
>>>>>>>
>>>>>>> I have just read through both KIP-149 and KIP-159 and am wondering
>> if
>>>>> you
>>>>>>> guys have considered a slight different approach for rich function,
>>>>> that
>>>>>> is
>>>>>>> to add the `RecordContext` into the apply functions as an additional
>>>>>>> parameter. For example:
>>>>>>>
>>>>>>> ---------------------------
>>>>>>>
>>>>>>> interface RichValueMapper<V, VR> {
>>>>>>>
>>>>>>> VR apply(final V value, final RecordContext context);
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> // then in KStreams
>>>>>>>
>>>>>>> <VR> KStream<K, VR> mapValues(ValueMapper<? super V, ? extends VR>
>>>>>> mapper);
>>>>>>> <VR> KStream<K, VR> mapValueswithContext(RichValueMapper <? super
>> V, ?
>>>>>>> extends VR> mapper);
>>>>>>>
>>>>>>> -------------------------------
>>>>>>>
>>>>>>> The caveat is that it will introduces more overloads; but I think
>> the
>>>>>>> #.overloads are mainly introduced by 1) serde overrides and 2)
>>>>>>> state-store-supplier overides, both of which can be reduced in the
>>> near
>>>>>>> future, and I felt this overloading is still worthwhile, as it has
>> the
>>>>>>> following benefits:
>>>>>>>
>>>>>>> 1) still allow lambda expressions.
>>>>>>> 2) clearer code path (do not need to "convert" from non-rich
>> functions
>>>>> to
>>>>>>> rich functions)
>>>>>>>
>>>>>>>
>>>>>>> Maybe this approach has already been discussed and I may have
>>>>> overlooked
>>>>>> in
>>>>>>> the email thread; anyways, lmk.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 1, 2017 at 10:18 PM, Matthias J. Sax <
>>>>> matth...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I agree with Jeyhun. As already mention, the overall API
>> improvement
>>>>>>>> ideas are overlapping and/or contradicting each other. For this
>>>>> reason,
>>>>>>>> not all ideas can be accomplished and some Jira might just be
>> closed
>>>>> as
>>>>>>>> "won't fix".
>>>>>>>>
>>>>>>>> For this reason, we try to do those KIP discussion with are large
>>>>> scope
>>>>>>>> to get an overall picture to converge to an overall consisted API.
>>>>>>>>
>>>>>>>>
>>>>>>>> @Jeyhun: about the overloads. Yes, we might get more overload. It
>>>>> might
>>>>>>>> be sufficient though, to do a single xxxWithContext() overload that
>>>>> will
>>>>>>>> provide key+value+context. Otherwise, if might get too messy having
>>>>>>>> ValueMapper, ValueMapperWithKey, ValueMapperWithContext,
>>>>>>>> ValueMapperWithKeyWithContext.
>>>>>>>>
>>>>>>>> On the other hand, we also have the "builder pattern" idea as an
>> API
>>>>>>>> change and this might mitigate the overload problem. Not for simple
>>>>>>>> function like map/flatMap etc but for joins and aggregations.
>>>>>>>>
>>>>>>>>
>>>>>>>> On the other hand, as I mentioned in an older email, I am
>> personally
>>>>>>>> fine to break the pure functional interface, and add
>>>>>>>>
>>>>>>>>   - interface WithRecordContext with method `open(RecordContext)`
>> (or
>>>>>>>> `init(...)`, or any better name) -- but not `close()`)
>>>>>>>>
>>>>>>>>   - interface ValueMapperWithRecordContext extends ValueMapper,
>>>>>>>> WithRecordContext
>>>>>>>>
>>>>>>>> This would allow us to avoid any overload. Of course, we don't get
>> a
>>>>>>>> "pure function" interface and also sacrifices Lambdas.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I am personally a little bit undecided what the better option might
>>>>> be.
>>>>>>>> Curious to hear what other think about this trade off.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/1/17 6:13 PM, Jeyhun Karimov wrote:
>>>>>>>>> Hi Guozhang,
>>>>>>>>>
>>>>>>>>> It subsumes partially. Initially the idea was to support
>>>>> RichFunctions
>>>>>>>> as a
>>>>>>>>> separate interface. Throughout the discussion, however, we
>>> considered
>>>>>>>> maybe
>>>>>>>>> overloading the related methods (with RecodContext param) is
>> better
>>>>>>>>> approach than providing a separate RichFunction interface.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jeyhun
>>>>>>>>>
>>>>>>>>> On Fri, Jun 2, 2017 at 2:27 AM Guozhang Wang <wangg...@gmail.com>
>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Does this KIP subsume this ticket as well?
>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-4125
>>>>>>>>>>
>>>>>>>>>> On Sat, May 20, 2017 at 9:05 AM, Jeyhun Karimov <
>>>>> je.kari...@gmail.com
>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear community,
>>>>>>>>>>>
>>>>>>>>>>> As we discussed in KIP-149 [DISCUSS] thread [1], I would like to
>>>>>>>> initiate
>>>>>>>>>>> KIP for rich functions (interfaces) [2].
>>>>>>>>>>> I would like to get your comments.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> http://search-hadoop.com/m/Kafka/uyzND1PMjdk2CslH12?subj=
>>>>>>>>>>> Re+DISCUSS+KIP+149+Enabling+key+access+in+
>>>>>>>> ValueTransformer+ValueMapper+
>>>>>>>>>>> and+ValueJoiner
>>>>>>>>>>> [2]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 159%3A+Introducing+Rich+functions+to+Streams
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Jeyhun
>>>>>>>>>>> --
>>>>>>>>>>> -Cheers
>>>>>>>>>>>
>>>>>>>>>>> Jeyhun
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>> -Cheers
>>>>>
>>>>> Jeyhun
>>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>> -Cheers
>>
>> Jeyhun
>>
> 
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to