Hi Fabian, thanks!

> 1) Is it a strict requirement that a ML pipeline must be able to handle
> different input types?
> I understand that it makes sense to have different models for different
> instances of the same type, i.e., same data type but different keys. Hence,
> the key-based joins make sense to me. However, couldn't completely
> different types be handled by different ML pipelines or would there be
> major drawbacks?


Could you elaborate a bit more on this? Right now we only use keys when we
do the join. A given pipeline can handle only a single well-defined type,
which serves as the key (the type can be a simple string with a custom
value; it does not need to be a class type).
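
To make the key-based join concrete, here is a minimal sketch of what I
mean (this is not the FLIP's final API; all class, field, and method names
below are illustrative assumptions). Records and models are keyed by the
same type string, and the current model is kept in keyed state:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.util.Collector

// Illustrative record types; the dataType string serves as the key.
case class DataRecord(dataType: String, payload: Array[Byte])
case class ModelToServe(dataType: String, modelBytes: Array[Byte])

class ModelServingFunction
    extends RichCoFlatMapFunction[DataRecord, ModelToServe, String] {

  // Keyed state: the model currently served for this type key.
  private var currentModel: ValueState[Array[Byte]] = _

  override def open(parameters: Configuration): Unit = {
    currentModel = getRuntimeContext.getState(
      new ValueStateDescriptor[Array[Byte]]("model", classOf[Array[Byte]]))
  }

  // Data side: score the record if a model has already arrived.
  override def flatMap1(record: DataRecord, out: Collector[String]): Unit =
    Option(currentModel.value()).foreach { model =>
      out.collect(s"scored ${record.dataType} with a ${model.length}-byte model")
    }

  // Model side: a new model for this key replaces the old one.
  override def flatMap2(model: ModelToServe, out: Collector[String]): Unit =
    currentModel.update(model.modelBytes)
}

Since records and models with the same type key end up on the same task,
completely different types simply become different logical pipeline
instances within the same job.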

2)

> I think from an API point of view it would be better to not require
> input records to be encoded as ProtoBuf messages. Instead, the model server
> could accept strongly-typed objects (Java/Scala) and (if necessary) convert
> them to ProtoBuf messages internally. In case we need to support different
> types of records (see my first point), we can introduce a Union type (i.e.,
> an n-ary Either type). I see that we need some kind of binary encoding
> format for the models but maybe also this can be designed to be pluggable
> such that later other encodings can be added.
>
We do use Scala classes (strongly-typed classes); ProtoBuf is only used on
the wire. For on-the-wire encoding we prefer ProtoBuf for its compact size,
expressiveness, and ability to represent different data types.
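
To illustrate (a hedged sketch only; the trait and all names here are my
assumptions, not the FLIP's API): the wire format can sit behind a
pluggable codec, so user code only ever sees the typed Scala classes and
other encodings can be added later, as you suggest:

trait WireCodec[T] {
  def encode(value: T): Array[Byte]
  def decode(bytes: Array[Byte]): T
}

// One implementation delegates to protobuf-generated classes; a different
// binary format could be plugged in later without touching user code.
class ProtobufCodec[T](toProto: T => com.google.protobuf.Message,
                       fromProto: Array[Byte] => T) extends WireCodec[T] {
  override def encode(value: T): Array[Byte] = toProto(value).toByteArray
  override def decode(bytes: Array[Byte]): T = fromProto(bytes)
}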

3)

> I think the DataStream Java API should be supported as a first-class
> citizen for this library.


I agree. It should either be a first priority or the next thing we do.


4)

> For the integration with the DataStream API, we could provide an API that
> receives (typed) DataStream objects, internally constructs the DataStream
> operators, and returns one (or more) result DataStreams. The benefit is
> that we don't need to change the DataStream API directly, but put a library
> on top. The other libraries (CEP, Table, Gelly) follow this approach.


We will provide a DSL which will do just this. But even without the DSL,
this is what we already do with the low-level joins (see the sketch below).
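
Roughly, the DSL/library entry point could look like this (reusing the
sketch types from point 1; the object and method names are made up for
illustration only):

import org.apache.flink.streaming.api.scala._

object ModelServing {
  // Takes typed DataStreams in, wires up the operators internally, and
  // returns a result DataStream -- the same pattern CEP, Table and Gelly use.
  def score(records: DataStream[DataRecord],
            models: DataStream[ModelToServe]): DataStream[String] =
    records
      .keyBy(_.dataType)                 // the type string serves as the key
      .connect(models.keyBy(_.dataType)) // same key -> same task
      .flatMap(new ModelServingFunction)
}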


5)

> I'm skeptical about using queryable state to expose metrics. Did you
> consider using Flink's metrics system [1]? It is easily configurable and we
> provided several reporters that export the metrics.
>
This is of course an option. The choice of queryable state was mostly
driven by the simplicity of real-time integration. Is there any reason why
the metrics system would be better?
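
For comparison, exposing a serving statistic through the metrics system
would look roughly like this (the operator and metric names are made up;
counter() is the standard metric-group call):

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter

// Illustrative operator: counts scored records via Flink's metrics system,
// so any configured reporter (JMX, Graphite, ...) can export the value.
class CountingScorer extends RichMapFunction[DataRecord, String] {
  @transient private var scored: Counter = _

  override def open(parameters: Configuration): Unit = {
    scored = getRuntimeContext.getMetricGroup.counter("recordsScored")
  }

  override def map(record: DataRecord): String = {
    scored.inc()
    s"scored ${record.dataType}"
  }
}

The difference I see is that queryable state lets an external application
pull the current value on demand, while the metrics system pushes values
out through the configured reporters.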


Best,
Stavros

On Mon, Nov 27, 2017 at 4:23 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Stavros,
>
> thanks for the detailed FLIP!
> Model serving is an important use case and it's great to see efforts to add
> a library for this to Flink!
>
> I've read the FLIP and would like to ask a few questions and make some
> suggestions.
>
> 1) Is it a strict requirement that a ML pipeline must be able to handle
> different input types?
> I understand that it makes sense to have different models for different
> instances of the same type, i.e., same data type but different keys. Hence,
> the key-based joins make sense to me. However, couldn't completely
> different types be handled by different ML pipelines or would there be
> major drawbacks?
>
> 2) I think from an API point of view it would be better to not require
> input records to be encoded as ProtoBuf messages. Instead, the model server
> could accept strongly-typed objects (Java/Scala) and (if necessary) convert
> them to ProtoBuf messages internally. In case we need to support different
> types of records (see my first point), we can introduce a Union type (i.e.,
> an n-ary Either type). I see that we need some kind of binary encoding
> format for the models but maybe also this can be designed to be pluggable
> such that later other encodings can be added.
>
> 3) I think the DataStream Java API should be supported as a first-class
> citizen for this library.
>
> 4) For the integration with the DataStream API, we could provide an API
> that receives (typed) DataStream objects, internally constructs the
> DataStream operators, and returns one (or more) result DataStreams. The
> benefit is that we don't need to change the DataStream API directly, but
> put a library on top. The other libraries (CEP, Table, Gelly) follow this
> approach.
>
> 5) I'm skeptical about using queryable state to expose metrics. Did you
> consider using Flink's metrics system [1]? It is easily configurable and we
> provided several reporters that export the metrics.
>
> What do you think?
> Best, Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html
>
> 2017-11-23 12:32 GMT+01:00 Stavros Kontopoulos <st.kontopou...@gmail.com>:
>
> > Hi guys,
> >
> > Let's discuss the new FLIP proposal for model serving over Flink. The idea
> > is to combine previous efforts there and provide a library on top of Flink
> > for serving models.
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
> >
> > Code from previous efforts can be found here: https://github.com/FlinkML
> >
> > Best,
> > Stavros
> >
>
