Oh yeah, protocol buffers are probably useful in strongly typed languages,
where the schema carries the type information and lets you parse the
serialized content based on it. But if we are transferring JSON, the type
information is encoded in the content itself ({}, [], "", <int>), so we
don't need anything beyond the content; we only have to decide how we are
going to transform the content into language-native structures.
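
For example, with nothing but the standard library:

    import json

    parsed = json.loads('{"name": "copr", "builds": [1, 2], "count": 3}')
    # The braces, brackets, quotes and bare digits already tell the parser
    # the types: parsed is a dict holding a str, a list of ints and an int.
    print(type(parsed["name"]), type(parsed["builds"]), type(parsed["count"]))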
This is just an additional thing I've realized, and I am guessing a bit
here. It might not be relevant; forgive me if it isn't.

On Wed, Aug 15, 2018 at 2:53 PM Michal Novotny <[email protected]> wrote:

> > Anyway, to summarize, I really really want this to be super easy to use
> > and just work. I hope we can improve it further and I'd love to hear
> > your thoughts. Do you think my problem statements and design goals are
> > reasonable? Given those, do you still feel like sending the schema along
> > is worthwhile?
>
> I actually no longer think it is worthwhile.
>
> > As a consumer, I can validate the JSON in a message matches the JSON
> > schema in the same message, but what does that get me? It doesn't seem
> > any different (on the consumer side) than just parsing the JSON outright
> > and trying to access whatever deserialized object I get.
>
> I completely agree with this.
>
> Let's go through the problems you mentioned:
>
> 1. Make catching accidental schema changes as a publisher easy.
>
> We can solve this by registering the schema with the publisher before any
> content gets published. Based on that schema, the publisher instance can
> check that the content it is about to send conforms to it, which catches
> some bugs before the content is actually sent. If we require this check on
> the publisher side, there is no reason to send the schema alongside the
> content, because the check has already been done, so the consumer knows
> the message is alright when it arrives. What should be sent, however, is a
> schema ID, e.g. just a natural number. The schema ID can then be used to
> version the schema, which would be published somewhere, e.g. in the
> service docs, the same way GitHub/GitLab/etc. publish the structures of
> their webhook messages. It would basically be part of the public API of a
> service.
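>
> A rough sketch of what I mean on the publisher side (the jsonschema
> library is real, everything else here is a made-up name just for
> illustration):
>
>     import json
>     import jsonschema
>
>     # Schema registered with the publisher ahead of time, versioned by an ID.
>     SCHEMA_ID = 2
>     SCHEMA = {
>         "type": "object",
>         "properties": {
>             "package": {"type": "string"},
>             "status": {"type": "string"},
>         },
>         "required": ["package", "status"],
>     }
>
>     def publish(topic, content):
>         # Catch accidental schema changes before anything leaves the service.
>         jsonschema.validate(content, SCHEMA)
>         body = dict(content, schema_id=SCHEMA_ID)
>         send_to_broker(topic, json.dumps(body))  # hypothetical transport call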
>
> 2. Make catching mis-behaving publishers on the consuming side easy.
>
> With the check against the schema done on the publisher side, this
> shouldn't be necessary. If someone somehow bypasses the publisher check,
> at worst the message won't be parsable, depending on how it is being
> parsed. If someone really wants to make sure the message is what it is
> supposed to be, they can integrate the schema published on the service
> site into their parsing logic, but I don't think that's a necessary thing
> to do (I personally wouldn't do it in my code).
>
> 3. Make changing the schema a painless process for publishers and
>    consumers.
>
> I think the only way to do this is to send both content versions
> simultaneously for some time, each message marked with its schema ID. It
> would be good if the consumer always specified which schema ID it wants
> to consume. If a higher schema ID is available in the message, a warning
> could be printed, maybe even to syslog, so that consumers get the
> information. At the same time it should be communicated on the service
> site or by other means. I don't think it is possible to make it any
> better than this.
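>
> On the consumer side that could look roughly like this (the schema_id
> field is the hypothetical one from above):
>
>     import json
>     import logging
>
>     WANTED_SCHEMA_ID = 2
>
>     def on_message(raw_body):
>         msg = json.loads(raw_body)
>         if msg.get("schema_id", 0) > WANTED_SCHEMA_ID:
>             # A newer format exists; keep working, but let the consumer know.
>             logging.warning("schema %s is available, you are parsing %s",
>                             msg["schema_id"], WANTED_SCHEMA_ID)
>         return msg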
>
> I fail to see the point of packaging the schemas. If the message content
> is JSON, then after receiving the message I would like to be able to just
> call json.loads(msg) and work with the resulting structure as I am used
> to.
>
> Actually, what I would do in Python is turn it into a munch and then work
> with that. Needing to install some additional package and instantiate
> some high-level objects just seems clumsy to me in comparison.
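>
> I.e. roughly this (munch is a third-party package, but a tiny,
> general-purpose one; the field names are just examples):
>
>     import json
>     from munch import munchify
>
>     raw_body = '{"package": "copr-cli", "status": "succeeded"}'
>     msg = munchify(json.loads(raw_body))
>     print(msg.package, msg.status)  # attribute access instead of msg["package"]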
>
> In other programming languages this would be pretty much the same, I
> believe, as they all probably provide some JSON implementation.
>
> You mentioned:
>
> > In the current proposal, consumers don't interact with the JSON at all,
> > but with a higher-level Python API that gives publishers flexibility
> > when altering their on-the-wire format.
>
> Yes, but with the current proposal, if I change the on-the-wire format, I
> need to make a new version of the schema, package it, somehow get it to
> the consumers, and make them use the correct version that parses the new
> on-the-wire format and translates it to what the consumers are used to
> consuming. That seems like something very difficult to get done.
>
> And I don't quite see the point. I wouldn't alter the on-the-wire format
> if it is not actually what users work with and if it meant going through
> all the steps described above.
>
> If I need to alter the on-the-wire format because the application logic
> has changed, then I would want to make the changes in the high-level API
> as well, so again there is no gain there, only more work packaging new
> schemas.
>
> My main point here is that packaging the schemas to provide high-level
> objects seems redundant. I think a lot of people would simply welcome
> working with something really simple that is already provided by the
> language's standard library.
>
> For Python: if I had to install and import just a single messaging
> library, tell it which hub, topic, and schema ID I want to listen to, and
> then consume the incoming messages immediately as munches, I would be
> super happy.
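>
> Purely hypothetical API, just to illustrate the shape I have in mind (the
> library, function, and topic names are made up):
>
>     from somemessaginglib import consume
>
>     def callback(msg):
>         # msg is already a munch built from the JSON body
>         print(msg.package, msg.status)
>
>     consume(hub="amqps://hub.fedoraproject.org",
>             topic="org.fedoraproject.prod.copr.build.end",
>             schema_id=2,
>             callback=callback)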
>
> Actually, the schema ID might be redundant as well; it could just be made
> part of the topic somehow, in which case the producer would simply
> produce the content twice on a schema change, at least for some time. A
> "deprecated by <topic>" flag on an incoming message would be nice then.
> Of course, the producer would need to register the two schemas and mark
> one of them as deprecated; the framework would then send the two messages
> simultaneously for them. This might be an even easier solution to the
> problem. The exact publisher (producer) interface would need to be
> thought through.
>
> > The big problem is that right now the majority of messages are not
> > formatted in a way that makes sense and really need to be changed to be
> > simple, flat structures that contain the information services need and
> > nothing they don't. I'd like to get those fixed in a way that doesn't
> > require massive coordinated changes in apps.
>
> In Copr, for example, we take this as an opportunity to change our
> format. If the messaging framework supports format deprecation, we might
> go that way as well to avoid a sudden change. But we don't currently have
> many (or maybe any) consumers, so I am not sure it is necessary for us.
>
> I am not very familiar with protocol buffers, but they seem useful mainly
> if you want to send the content in a compact binary form to save as much
> space as possible. If we send content that can already be interpreted as
> JSON, then building higher-level classes and objects on top of it seems
> unnecessary.
>
> I think we could really just take that already existing generic framework
> you were talking about (RabbitMQ?) and make sure that we can check the
> content against message schemas on the producer side (which is great for
> catching little bugs) and that we know how a message format can be
> deprecated (e.g. by the messaging framework adding a
> "deprecated_by: <topic>" field to each message and somehow logging
> warnings on the consumer side). The framework could also automatically
> transform the messages into language-native structures; in Python, the
> munches would probably be the nicest.
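>
> On the framework's consumer side, that could look roughly like this (all
> names are made up, just to show the idea):
>
>     import json
>     import logging
>     from munch import munchify
>
>     def deliver(raw_body, user_callback):
>         msg = munchify(json.loads(raw_body))
>         if "deprecated_by" in msg:
>             # The producer registered a replacement format for this topic.
>             logging.warning("this message format is deprecated, "
>                             "switch to topic %s", msg.deprecated_by)
>         user_callback(msg)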
>
> The whole "let's package schemas" thing seems like something we would
> typically do (because we are packagers), but not like something that
> solves the actual problems you have mentioned. If anything, it makes them
> more difficult to deal with, if I am correct.
>
> I think what you are doing is good, but I think most people will welcome
> fewer dependencies and simpler language-native structures. So if we could
> push the framework more in that direction, that would be great.
>
> clime
>
> On Tue, Aug 14, 2018 at 10:55 AM Jeremy Cline <[email protected]> wrote:
>
>> On 08/13/2018 10:20 PM, Michal Novotny wrote:
>> > So I got to know on the flock that fedmsg is going to be replaced?
>> >
>> > Anyway, it seems that there is an idea to create schemas for the
>> > messages and distribute them in packages? And those python packages
>> > need to be present on producer as well as consumer?
>> >> JSON schemas
>> >
>> >> Message bodies are JSON objects, that adhere to a schema. Message
>> >> schemas live in their own Python package, so they can be installed on
>> >> the producer and on the consumer.
>> >
>> > Could we instead just send the message schemas together with the message
>> > content always?
>>
>> I considered this early on, but it seemed to me it didn't solve all the
>> problems I wanted solved. Those problems are:
>>
>> 1. Make catching accidental schema changes as a publisher easy.
>> 2. Make catching mis-behaving publishers on the consuming side easy.
>> 3. Make changing the schema a painless process for publishers and
>>    consumers.
>>
>> Doing this would solve #1, but #2 and #3 are still a problem. As a
>> consumer, I can validate the JSON in a message matches the JSON schema
>> in the same message, but what does that get me? It doesn't seem any
>> different (on the consumer side) than just parsing the JSON outright and
>> trying to access whatever deserialized object I get.
>>
>> In the current proposal, consumers don't interact with the JSON at all,
>> but with a higher-level Python API that gives publishers flexibility
>> when altering their on-the-wire format.
>>
>> >
>> > I would like to be able to parse any message I receive without
>> > additional packages installed. If I am about to start listening to a
>> > new message type, I don't want to spend time looking up what I should
>> > install to make it work. It should just work. Requiring some packages
>> > with schemas to be installed on the consumer, and having the producer
>> > maintain them, does not seem like a great idea. Mainly because one of
>> > the requirements raised for fedmsg was that it should be made a generic
>> > messaging framework easily usable outside of Fedora Infrastructure. We
>> > should make it easy for anyone outside to be able to listen to and
>> > understand our messages so that they can react to them. Needing to have
>> > some python packages installed (how are they going to be distributed,
>> > PyPI + Fedora?) seems to be just an unnecessary hassle. So can we send
>> > a schema with each message as documentation and validation of the
>> > message itself?
>>
>> You can parse any message you receive without anything beyond a JSON
>> parsing library. You can do that now and you'll be able to do that after
>> the move. The problem with that is the JSON format might change. The
>> schema alone doesn't solve the problem of changing formats, it just
>> clearly documents what the message used to be and what it is now.
>>
>> I'd love for this to just work and I'm up for any suggestions to make it
>> easier, but I do think we need to make sure any solution covers the
>> three problems stated above.
>>
>> Finally, I do not want to create a generic messaging framework. I want
>> something small that makes a generic messaging framework very easy to
>> use for Fedora infrastructure specifically. I'm happy to help develop a
>> generic framework (like Pika) when necessary, but I don't want to be in
>> the business of authoring and maintaining a generic framework.
>>
>> >
>> > a) it will make our life easier
>> >
>> > b) it will allow people outside of Fedora (who e.g. also don't tend to
>> > use python) to consume our messages easily
>> >
>> > c) what if I am writing a ruby app, not a python app? Do I then need to
>> > provide a ruby schema as well as a python schema? What if a consumer is
>> > a ruby app? We should only need to write the consumer and producer
>> > parts in different languages. The message schemas should not be bound
>> > to a particular language, otherwise we are just adding more work for
>> > ourselves when somebody wants to use the messaging system in a language
>> > other than python.
>>
>> I agree, and that's why I chose json-schema. A different language just
>> needs to wrap the schema in accessor functions. An alternative (and
>> something I wanted to propose longer term after the AMQP->ZMQ
>> transition) is to use something like protocol buffers rather than JSON.
>> The advantage there is a simplified schema format; it generally pushes
>> into a pattern of backwards compatibility (thus reducing the need for
>> a higher level API), and it auto-generates an object wrapper in many
>> languages. You still need to potentially implement wrappers for access
>> if you change the schema in a way that isn't additive, though.
>>
>> You may notice (and it's not an accident) that the recommended
>> implementation of a Message produces an API that is very similar to the
>> one produced by a Python object generated by protocol buffers. This
>> makes it possible to quietly change to protocol buffers without breaking
>> consumers, assuming they're not digging into the JSON. I'm not saying
>> we'll definitely do that, but it is still on the table and a transition
>> _should_ be easy.
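>>
>> To make that concrete, the idea is that consumers go through accessor
>> properties rather than the raw body, roughly like this (illustrative
>> names only, not the actual proposed classes):
>>
>>     class BuildMessage:
>>         """Wraps the deserialized JSON body behind accessor properties."""
>>
>>         def __init__(self, body):
>>             self.body = body  # plain dict parsed from the JSON message
>>
>>         @property
>>         def package(self):
>>             # If the on-the-wire layout changes, only this accessor needs
>>             # updating; consumers keep calling msg.package.
>>             return self.body["package"]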
>>
>> The big problem is that right now the majority of messages are not
>> formatted in a way that makes sense and really need to be changed to be
>> simple, flat structures that contain the information services need and
>> nothing they don't. I'd like to get those fixed in a way that doesn't
>> require massive coordinated changes in apps.
>>
>> Anyway, to summarize, I really really want this to be super easy to use
>> and just work. I hope we can improve it further and I'd love to hear
>> your thoughts. Do you think my problem statements and design goals are
>> reasonable? Given those, do you still feel like sending the schema along
>> is worthwhile?
>>
>>
>> --
>> Jeremy Cline
>> XMPP: [email protected]
>> IRC:  jcline
>>
>
_______________________________________________
infrastructure mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/[email protected]/message/N3IF64YUOZQVIGQVNCZKGIH3O5NJXG32/
