Thanks for your feedback, Enrico.
My answers to your comments are inline below.

BR

Christophe

On Tue, Sep 20, 2022 at 14:16, Enrico Olivelli <eolive...@gmail.com> wrote:

> Christophe,
> very good initiative!
>
> I support it
> Some comments inline below
>
>
> Enrico
>
> On Mon, Sep 19, 2022 at 19:10, Christophe Bornet
> <bornet.ch...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have drafted PIP-208: HTTP Sink
> >
> > PIP link:
> > https://github.com/apache/pulsar/issues/17719
> >
> > Here's a copy of the contents of the GH issue for your references:
> >
> > ### Motivation
> >
> > Currently, when you want to consume from Pulsar topics in applications
> > written in languages that don't have a Pulsar driver supported, you need to
> > run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> > production this needs additional effort to deploy, scale, load balance,
> > monitor, and so on...
> > Pulsar IO is a framework that deals with all these operational subjects and
> > can be leveraged to provide a way to push messages to external systems
> > using HTTP, a protocol supported by every existing language and OS.
> >
> > ### Goal
> >
> > This proposal defines an HTTP Sink that sends the messages to a configured
> > URL.
> > It takes inspiration from [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam)
> > and the [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> >
> >
> > ### Implementation
> >
> > A `pulsar-io-http` module will be added to `pulsar-io`.
> > On building the project `pulsar-io-http-{version}.nar` will be built and
> > added to the `pulsar-all` distribution.
> > The name of the Sink will be `http`.
> >
> > The HTTP Sink pushes records to any HTTP server with the record value in
> > the body of a POST method.
> > The body of the HTTP request is the JSON representation of the record value.
>
> What do you mean ?
> I think that this should depend on the Schema.
>
> BYTES SCHEMA -> I would push the raw message payload
> PRIMITIVE VALUES (long, integer, string) -> I would push the JSON
> representation
> JSON SCHEMA -> push the raw message payload
> AVRO -> ?  convert to JSON ?
> PROTOBUF -> ? convert to JSON ?
> KEY-VALUE ?
>
> Probably we need some flag to define the behaviour for the non trivial
> cases.
>
The current implementation serializes to JSON because it is a content type
that is well supported by server frameworks.
It is also consistent with existing HTTP sinks such as Pulsar Beam and the
Confluent HTTP Sink Connector.
Adapting the content-type to the schema is elegant and would probably result
in shorter (but less readable) payloads, and I think it could be done as a
follow-up option.
It does have the problem of being difficult to do for the KV schema.
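
To make the current behaviour concrete, here is a rough sketch in Python
(for brevity only; the sink itself is a Java module) of how one record would
be turned into a request: a POST with the JSON representation of the value as
body and the `Pulsar*` headers listed in the proposal. The `record` dict and
its keys are hypothetical stand-ins for a Pulsar IO record, and setting
`Content-Type: application/json` is an assumption.

```python
import json


def build_http_request(record: dict) -> tuple[str, dict[str, str], bytes]:
    """Return (method, headers, body) for one record.

    `record` is a hypothetical dict standing in for a Pulsar IO record.
    """
    headers = {
        # Assumption: a JSON body is sent with this content type.
        "Content-Type": "application/json",
        "PulsarTopic": record["topic"],
        "PulsarMessageId": record["message_id"],
        "PulsarPublishTime": str(record["publish_time"]),
    }
    # Key and event time are optional, so only send them when present.
    if record.get("key") is not None:
        headers["PulsarKey"] = record["key"]
    if record.get("event_time") is not None:
        headers["PulsarEventTime"] = str(record["event_time"])
    # Each record property gets the `PulsarProperties-` prefix.
    for name, value in record.get("properties", {}).items():
        headers[f"PulsarProperties-{name}"] = value
    body = json.dumps(record["value"]).encode("utf-8")
    return "POST", headers, body
```
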
For the content-type mappings I would do:
BYTES SCHEMA -> application/octet-stream (raw bytes)
PRIMITIVE VALUES (long, integer, string) -> text/plain
JSON ->  application/json
AVRO -> avro/binary
PROTOBUF -> probably application/octet-stream ?
KEY-VALUE ?

We would also need to indicate the schema type in the HTTP headers.
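
As a rough sketch of that mapping (Python for brevity only; the schema type
names follow Pulsar's `SchemaType` enum, and the `PulsarSchemaType` header
name is hypothetical, nothing is decided yet). PROTOBUF uses the tentative
value from the list above, and KEY_VALUE is deliberately left unmapped since
it is still an open question:

```python
# Tentative schema type -> content-type mapping from the list above.
SCHEMA_CONTENT_TYPES = {
    "BYTES": "application/octet-stream",
    "STRING": "text/plain",
    "INT32": "text/plain",
    "INT64": "text/plain",
    "JSON": "application/json",
    "AVRO": "avro/binary",
    "PROTOBUF": "application/octet-stream",  # tentative
}


def schema_headers(schema_type: str) -> dict[str, str]:
    """Return the content-type and schema-type headers for a schema type."""
    try:
        content_type = SCHEMA_CONTENT_TYPES[schema_type]
    except KeyError:
        # KEY_VALUE (and anything else unmapped) has no agreed mapping yet.
        raise ValueError(
            f"no content-type mapping for schema type {schema_type!r}"
        ) from None
    # "PulsarSchemaType" is a hypothetical header name.
    return {"Content-Type": content_type, "PulsarSchemaType": schema_type}
```
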


>
> >
> > Some headers are added to the HTTP request:
> > * `PulsarTopic`: the topic of the record
> > * `PulsarKey`: the key of the record
> > * `PulsarEventTime`: the event time of the record
> > * `PulsarPublishTime`: the publish time of the record
> > * `PulsarMessageId`: the ID of the message contained in the record
> > * `PulsarProperties-*`: each record property is passed with the property
> > name prefixed by `PulsarProperties-`
> >
>
> Can we make the "Content-Type" configurable ?
>
Yes, we can. But should we do it in the first iteration?
If we do, I would add an option to set some fixed headers, which the user
could use to override the content-type.
If we go for a variable content-type depending on the schema, then we could
have a map<SchemaType, content-type>.
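
A minimal sketch of the fixed-headers idea (Python for brevity only; the
function name is hypothetical): the configured headers are applied on top of
the ones the sink generates, matching names case-insensitively as HTTP does,
so a configured `content-type` replaces the default one.

```python
def merge_headers(
    default_headers: dict[str, str], fixed_headers: dict[str, str]
) -> dict[str, str]:
    """Apply user-configured fixed headers on top of the sink's defaults."""
    merged = dict(default_headers)
    # HTTP header names are case-insensitive: index existing names by
    # lowercase so "content-type" overrides "Content-Type".
    by_lower = {name.lower(): name for name in merged}
    for name, value in fixed_headers.items():
        existing = by_lower.get(name.lower())
        if existing is not None:
            del merged[existing]
        merged[name] = value
        by_lower[name.lower()] = name
    return merged
```

For example, merging `{"content-type": "text/plain"}` over the defaults keeps
all `Pulsar*` headers and leaves a single content-type header with the
user's value.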

> Can we make the HTTP METHOD configurable ?
>
Yes, we can. But should we do it in the first iteration?

>
> > ### Alternatives
> >
> > Creating a separated project for this Sink is rejected since:
> > * this Sink is very useful for developers to test the Pulsar IO framework,
> > transform functions, and to make demos.
> > * the code has a very small footprint with no external dependencies.
> > * it should be visible at the same level as other sinks
>
> 100% agreed !
>
> >
> > I'm looking forward to the discussion.
> >
> > Best regards,
> >
> > Christophe Bornet
>
