________________________________
From: Scott King [scott.k...@fkabrands.com]
Sent: 13 August 2018 21:29
To: Tomcik Ondrej; Max Kholmyansky
Cc: iotivity-dev@lists.iotivity.org
Subject: RE: [dev] OCF Native Cloud 2.0
Still trying to cater to those reading this on mobile (like me 60% of the time):
· Docs stuff
o I think we'll need to agree to disagree on this. Tell me if you think I'm wrong, but I feel like an easy "hello world" guide is an important part of the developer onboarding experience. If the barrier to entry is too high, or if the docs page is too dense/daunting, you'll scare away devs and systems integrators. Maybe we agree on that point, but disagree on where that introductory/high-level overview should be.
Of course we need an easy "hello world" guide, sample code and so on. Completely agree. But this documentation is not of that kind; that was not its purpose. The one you mentioned has to come as well.
· L7 LB
o Looking forward to it! I was pretty sure redis wasn't the best option, but I didn't want to put the idea out there without a jumping-off point :). My only concern is about maintaining high consistency, but that concern is probably unfounded.
· Regarding EventStore
o Can you help me better understand why "traditional"/battle-tested NoSQL databases (like Cassandra or various time-series DBs) weren't a good fit for this use case? I'm not saying you're wrong, I just don't 100% understand the unique benefits of the EventStore DB.
Let's have a call. Email is tedious and takes too long :)
· Misc DB question
o Given the difference in consistency requirements and read/write ratios between user info/credentials (ex: userID and mediator/client/device tokens) and device shadow, do you intend on breaking that data into 2 different DB solutions? For reference, mainflux uses postgres for user info/credentials, and then supports multiple NoSQL databases for device shadow/historical data (IIRC InfluxDB, MongoDB and Cassandra are currently supported).
The Authorization Service has to be provided by the user, which means it will certainly be in a different DB. ResourceShadow will hold only in-memory data. Data will be loaded from the EventStore or, per the documentation, from the IRepository.
o I'm not saying we should try to 100% copy mainflux, but we should evaluate their design decisions as part of our design process.
Agree.
· Per your comment ". If through the sidecar proxy, or directly prepare the code of components and user will modify the codebase IF technology (kafka,NATS,...) is not supported."
o I don't 100% understand what you're saying. My goal for the sidecar proxy was not only to make the messaging transparent, but also to make the DB and auth transparent (which means you can do integration testing with devices using the OCF interface container with a mocked backend, which would hopefully accelerate development). That lets devs experiment with novel back ends, like using OPA for high-granularity auth, using a blockchain for storing user info/credentials, or using proprietary managed services (ex: AWS Kinesis, GCP Pub/Sub, Cloud SQL, etc.) as part of the backend to reduce operational overhead.
§ Remember, the use cases between consumer electronics, industrial automation, blockchain and other domains are likely very different. For example, blockchain use cases likely tolerate much higher latencies. Industrial automation likely requires the highest-performance solution because you rely on up-to-date device data and are doing real-time processing on that data.
In comparison, consumer electronics often generate very low-value data and don't always require device observation (for some but not all of our products, many users will never or only infrequently check device status in their app; it doesn't make sense to pay the price to cache the device shadow if the data is seldom used), but they still require low latency for routing requests to the device.
§ Another example is healthcare: they likely require tight integrations into their EMR system for both auth and DB functionality.
Let's discuss it over WebEx.
· Regarding the channels-per-device comment
o I believe that NATS supports clustering to allow arbitrarily high numbers of channels. I agree that this is probably not the best architectural option if you want to also support more traditional pubsub solutions.
Sure, it handles multiple channels/topics, Kafka as well, but this does not scale. You can end up with millions of topics, as you will have them per device. We are currently streaming 200k messages per second from one device, with default configuration and without scaling. The important thing is to find a balance between the number of channels and throughput. Throughput can be scaled with partitioning.
Can you help me understand how a given interface container isn't going to be overloaded with events that aren't relevant to that container? If I want to send a message to my device, do I publish the message to a work queue in order to ensure that only one interface container receives the message?
Yes (see the sketch at the end of this message).
o I'm pretty sure that as long as the OCF interface containers are stateless/12FA-friendly (https://12factor.net/), my concerns about this are totally eliminated.
· Regarding the observe comments
o I'll accept 80%+ of the blame for this misunderstanding. I forgot that the device gets to choose which resources to publish. I agree it makes sense that you should either observe all published resources or none. I disagree that you should mandate observation of published resources. There are low-QoS use cases that would rather save on messaging bandwidth than maintain a perfectly up-to-date representation of device state. I would be OK with observing being the default behavior, as long as there's a mechanism for me to "disable" that feature for cost-savings purposes.
What exactly would you save? You still have an open TCP connection to the device; that is what is expensive, not a message from the device to the cloud.
o To be clear: I am very happy that you intend on implementing device shadow functionality, and I look forward to using that feature in future products. I just want it to be optional due to costing and not wanting to force users to use features that aren't in the OCF spec.
Costing / burden / overhead is not in play, in my opinion; OBSERVE will not add overhead. Regarding the OCF Specification and resource shadow, it's in progress, and I hope it will make it into the specification. But, of course, I will also think about making it optional.
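A minimal Go sketch of the work-queue answer above, assuming NATS as the broker (the thread leaves the broker choice open); the subject names and payload are hypothetical. All OCF CoAP Gateway replicas join one queue group, so each command is delivered to exactly one of them, while events stay on one subject per event type rather than one per device.

    package main

    import (
        "log"

        "github.com/nats-io/nats.go"
    )

    func main() {
        // Connect to the (assumed) NATS broker.
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Every OCF CoAP Gateway replica subscribes with the same queue group,
        // so NATS delivers each command to exactly one replica -- the
        // "work queue" behaviour discussed above.
        _, err = nc.QueueSubscribe("commands.resource.update", "coap-gateways", func(m *nats.Msg) {
            // Look up the device's long-lived TCP connection held by this
            // replica and forward the CoAP request (omitted in this sketch).
            log.Printf("command received: %s", string(m.Data))
        })
        if err != nil {
            log.Fatal(err)
        }

        // Events are published to one subject per event type, not per device,
        // so the subject count does not grow with the number of devices.
        err = nc.Publish("events.resource.representation.updated",
            []byte(`{"deviceId":"d1","href":"/light","on":true}`))
        if err != nil {
            log.Fatal(err)
        }

        select {} // keep the process alive so the subscriber keeps running
    }

Queue groups are what keep a replicated, stateless gateway from double-handling a command; plain subscriptions (no queue group) remain available for events that every replica needs to see.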
From: Ondrej Tomcik [mailto:ondrej.tom...@kistler.com]
Sent: Thursday, August 9, 2018 5:05 PM
To: Scott King <scott.k...@fkabrands.com>; Max Kholmyansky <m...@sureuniversal.com>
Cc: iotivity-dev@lists.iotivity.org
Subject: RE: [dev] OCF Native Cloud 2.0
________________________________
From: Scott King [scott.k...@fkabrands.com]
Sent: 09 August 2018 22:42
To: Tomcik Ondrej; Max Kholmyansky
Cc: iotivity-dev@lists.iotivity.org; Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter; JinHyeock Choi (jinc...@gmail.com)
Subject: RE: [dev] OCF Native Cloud 2.0
I have a tough time reading inline comments. I hope this is an acceptable format.
• This will be part of the implementation. The published document does not limit you in this area, but it does not describe how to achieve it. It's an implementation "detail".
o If you want multiple backend implementations for a given interface like the OCF cloud, then you need to make things very easy and simple. I would assert that any implementation details "behind" the interface (like the CQRS architecture) should be kept in the GitHub repo. The wiki shouldn't be targeting devs who are working on your codebase; it should be targeting devs who want to use your codebase in production.
Come on :) The wiki can be a place both for developers and for users. What matters is how you organize it there, so that everybody finds what they are looking for.
• L7 load balancing
o If you want to add CoAP functionality to a popular LB like nginx or envoy (preferably envoy because of CNCF membership and no "enterprise" tier), then we should discuss that. It would be a great contribution to the ecosystem. I don't see why you couldn't implement L7 routing as long as the LB maintained the long-lived connection instead of the OCF interface (you'd need to persist the state of the device, like being logged in, somewhere though. Maybe a redis DB?)
L7 might be the next step after a working OCF Native Cloud. We can discuss it. Redis is not needed; the state of the device is already persisted in the event store.
• ES/gRPC
o Golang can use a gRPC API in a non-blocking manner via goroutines. I think you have a good point, but just didn't explain it well :)
Sure, but that was not the only reason. :) I will try to explain it in a second document covering the tech stack used for the implementation.
o My desire for gRPC was for communication with a "sidecar proxy" (i.e. the official OCF interface communicates only with devices/LB and a sidecar proxy, which communicates with pubsub, DB, etc.)
• You can keep using pubsub for many things, but you're abstracting away all "non-standard" implementation details (ex: GCP Pub/Sub vs Kafka vs NATS)
• I think we are agreeing when you say "only use gRPC for commands". But I think we disagree on which commands you use it with :)
It depends on where you want to do this abstraction: through the sidecar proxy, or by directly preparing the code of the components so that the user modifies the codebase IF a technology (Kafka, NATS, ...) is not supported. In my opinion, it would be overkill to use a sidecar just to make the messaging technology transparent. Let's see; let's discuss it on Slack.
o If you use 1 channel per event type, that is different from Mainflux, which uses ~1 NATS channel per device. Does this mean that services will receive many "irrelevant" events, since they receive events for all devices? Can that scale to millions of devices?
The question is rather: can you scale a channel per device to millions of devices? It's best practice to have one event type per channel/topic, and it's not a good idea to have a topic per entity (user, device, ...). But of course we have to consider everything. Also, this is an implementation "detail", out of scope of the current doc.
• I proposed redis as an alternative to relying on the message queue for persistence. This allows more implementation flexibility (my goal is to make an implementation that uses as many CNCF projects as possible). I am not 100% confident in that proposal; I look forward to your response.
A message queue is not a persistence layer. Kafka can't be used for event sourcing, nor can NATS Streaming; these are not event stores. In general, there are two options: delegate the transaction defined in the IRepository to a 3rd-party component, for example EventStore (https://eventstore.org/), or handle this transaction in our code, which makes things more complicated. It looks easy, but it has many bottlenecks. We're now evaluating possible options in this area (a sketch of the IRepository idea follows at the end of this message).
• I disagree with the decision to automatically observe every resource. For my (consumer electronics) use case, there are many times that I want to observe a resource, but I don't often want to observe EVERY resource. I am 100% in agreement that it should be easy/standard to be able to observe resources, but that should be a later step after initial device provisioning (ex: have your client send an observe request to the device via the cloud after the device has been provisioned and signed in. The device will see this as the cloud sending the observe request and respond accordingly). There are still details that would need to be hashed out, but I want to get your feedback on this comment.
It's a core requirement to observe everything. Otherwise you can't provide an up-to-date resource shadow, which means forwarding every GET to the device, and that does not make sense.
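A minimal Go sketch of what the IRepository transaction mentioned above might look like; the interface shape, field names and error are assumptions, not the project's actual definition. The point it illustrates is the conditional, versioned append and per-aggregate replay that a plain broker such as Kafka or NATS Streaming does not provide.

    package eventstore

    import (
        "context"
        "errors"
    )

    // Event is one immutable fact about a resource aggregate, e.g.
    // "ResourceRepresentationUpdated". Field names are hypothetical.
    type Event struct {
        AggregateID string // e.g. deviceID + resource href
        Type        string
        Version     uint64
        Data        []byte
    }

    var ErrConcurrency = errors.New("expected version does not match stream head")

    // IRepository is a guess at the transaction the design document delegates
    // to an event store (e.g. EventStore) or implements itself: append events
    // only if the stream is still at the expected version, and replay a stream
    // to rebuild aggregate or view-model state. A broker offers neither the
    // conditional append nor per-aggregate replay, which is why it is not a
    // substitute for an event store.
    type IRepository interface {
        // Save appends events atomically; it must fail with ErrConcurrency if
        // another writer already appended version expectedVersion+1.
        Save(ctx context.Context, aggregateID string, expectedVersion uint64, events []Event) error
        // Load returns all events of the aggregate from fromVersion onward,
        // in order, so the aggregate (or a resource-shadow view) can be rebuilt.
        Load(ctx context.Context, aggregateID string, fromVersion uint64) ([]Event, error)
    }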
From: Ondrej Tomcik [mailto:ondrej.tom...@kistler.com]
Sent: Thursday, August 9, 2018 12:06 PM
To: Scott King <scott.k...@fkabrands.com>; Max Kholmyansky <m...@sureuniversal.com>
Cc: iotivity-dev@lists.iotivity.org; Gregg Reynolds (d...@mobileink.com) <d...@mobileink.com>; Jozef Kralik <jozef.kra...@kistler.com>; Peter Rafaj <peter.ra...@kistler.com>; JinHyeock Choi (jinc...@gmail.com) <jinc...@gmail.com>
Subject: RE: [dev] OCF Native Cloud 2.0
Hello Scott!
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
From: Scott King [mailto:scott.k...@fkabrands.com]
Sent: Thursday, August 9, 2018 4:40 PM
To: Tomcik Ondrej; Max Kholmyansky
Cc: iotivity-dev@lists.iotivity.org; Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter; JinHyeock Choi (jinc...@gmail.com)
Subject: RE: [dev] OCF Native Cloud 2.0
Ondrej,
First off, congrats on publishing such an extensive document!
• Maybe I'm not looking in the right place, but I'm not seeing much explanation of how this architecture makes it easy to integrate OCF cloud messaging into existing infrastructure/architecture (especially for Amazon/Google/IBM/Azure to offer it as part of their current IoT managed services).
This will be part of the implementation.
The published document does not limit you in this area, but it does not describe how to achieve it. It's an implementation "detail".
• You state that L7 load balancing is an option for CoAP. It was my understanding that no load balancers support L7 load balancing with CoAP. Don't you also need to stick to L4 because the OCF device relies on a long-lived connection? I could be wrong, so let me know.
Good point. I didn't investigate whether L7 load balancing for CoAP exists. I mentioned it because it is an option: CoAP is very similar to HTTP, so it could be implemented. Regarding long-lived TCP connections, I am not sure. Why couldn't you have an open TCP connection to the L7 load balancer and distribute requests to other components based on the CoAP data? I might be missing something.
• I'm concerned that ES/pubsub isn't preferable over point-to-point HTTP/gRPC communication for some of the use cases in your diagrams. For example, if the device is trying to sign in to a CoAP gateway, shouldn't the auth service give a response to the OCF gateway's token-validation request rather than publishing an event itself? Can you help me better understand who else needs to be immediately notified of a successful login other than the gateway?
Event sourcing and gRPC do not fit together; CQRS and gRPC do. Where you have events, you have the event bus, for example Kafka + protobuf. Where you have commands, gRPC might be a solution, or again the event bus used as a command queue. The response to the sign-in is in the form of an event simply because of non-blocking communication; all communication in the OCF Native Cloud is non-blocking. The OCF CoAP Gateway will issue a command to the AuthorizationService to verify the sign-in token and will not wait for the response, because waiting may take some time, introduce delay into the whole system and block the gateway. Instead, the OCF CoAP Gateway listens for the events (SignedIn), maps them to the issued requests and replies to the device (a sketch of this correlation is at the end of this message). It's also scalable: you can have a scaled AuthorizationService and issue the SignIn command to the command queue; the most available AuthorizationService instance will take it from the queue, process it and raise an event that it was processed. So it's not about "who else needs to be immediately notified" but about non-blocking communication and scalability.
• How many pubsub channels are required per device in order to implement your architecture?
• I haven't defined the organization of channels yet, but usually it is one channel per event type.
• Would we benefit from an in-memory DB like redis to handle persisting device shadow and device presence/login status?
• You don't need redis at all. The resource shadow is stored as a series of ResourceRepresentationUpdated events in the event store. When the ResourceShadow service starts, it simply loads these events for every resource and subscribes to that event type, so the resource shadow is updated immediately when such an event occurs. You can restart it or scale it; it will again load everything and subscribe. An in-memory DB is enough.
• Given the importance of Alexa/Google Assistant functionality for commercial adoption, I would hope that we can work together to ensure workflow compatibility and develop examples for this feature.
Sure
• Can you confirm that you plan to automatically observe all resources that get published to the cloud?
Confirmed
I feel like we need to make a stronger distinction between the minimum feature set needed to satisfy the OCF spec and the additional features that we all want that are out of spec, like device shadow.
Can you confirm whether this architectural proposal means that you aren't interested in the gRPC API that I proposed?
The proposed protobuf spec can be used, but just for commands.
Regards,
Scott
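A minimal Go sketch of the non-blocking sign-in flow described above: the gateway publishes a SignIn command, parks the pending device request, and completes it when the matching SignedIn event arrives. The SignIn/SignedIn names come from the thread; the correlation ID, Bus interface and timeout are assumptions for illustration.

    package gateway

    import (
        "context"
        "errors"
        "sync"
        "time"
    )

    // SignInCommand / SignedInEvent mirror the command/event names used in the
    // thread; the CorrelationID field is an assumption about how the gateway
    // matches an event back to the request it issued.
    type SignInCommand struct {
        CorrelationID string
        DeviceID      string
        AccessToken   string
    }

    type SignedInEvent struct {
        CorrelationID string
        DeviceID      string
        ExpiresIn     int64
        Err           string
    }

    // Bus is a placeholder for whatever event bus / command queue is chosen
    // (Kafka, NATS, ...).
    type Bus interface {
        PublishCommand(ctx context.Context, cmd SignInCommand) error
    }

    // SignInCorrelator lets the CoAP gateway stay non-blocking: it publishes
    // the command, parks the pending device request in a map, and completes it
    // when the SignedIn event arrives on the bus.
    type SignInCorrelator struct {
        bus     Bus
        mu      sync.Mutex
        pending map[string]chan SignedInEvent
    }

    func NewSignInCorrelator(bus Bus) *SignInCorrelator {
        return &SignInCorrelator{bus: bus, pending: make(map[string]chan SignedInEvent)}
    }

    // SignIn issues the command and waits (with a timeout) for the matching
    // event. The CoAP request handler can run this in its own goroutine, so
    // the gateway itself never blocks on the AuthorizationService.
    func (c *SignInCorrelator) SignIn(ctx context.Context, cmd SignInCommand) (SignedInEvent, error) {
        ch := make(chan SignedInEvent, 1)
        c.mu.Lock()
        c.pending[cmd.CorrelationID] = ch
        c.mu.Unlock()
        defer func() {
            c.mu.Lock()
            delete(c.pending, cmd.CorrelationID)
            c.mu.Unlock()
        }()

        if err := c.bus.PublishCommand(ctx, cmd); err != nil {
            return SignedInEvent{}, err
        }
        select {
        case ev := <-ch:
            return ev, nil
        case <-time.After(10 * time.Second):
            return SignedInEvent{}, errors.New("sign-in timed out")
        case <-ctx.Done():
            return SignedInEvent{}, ctx.Err()
        }
    }

    // OnSignedIn is the bus subscription callback: it hands the event to the
    // goroutine waiting on the original device request, if any.
    func (c *SignInCorrelator) OnSignedIn(ev SignedInEvent) {
        c.mu.Lock()
        ch, ok := c.pending[ev.CorrelationID]
        c.mu.Unlock()
        if ok {
            ch <- ev
        }
    }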
From: Ondrej Tomcik [mailto:ondrej.tom...@kistler.com]
Sent: Thursday, August 9, 2018 9:38 AM
To: Max Kholmyansky <m...@sureuniversal.com>
Cc: iotivity-dev@lists.iotivity.org; Scott King <scott.k...@fkabrands.com>; Gregg Reynolds (d...@mobileink.com) <d...@mobileink.com>; Jozef Kralik <jozef.kra...@kistler.com>; Peter Rafaj <peter.ra...@kistler.com>; JinHyeock Choi (jinc...@gmail.com) <jinc...@gmail.com>
Subject: RE: [dev] OCF Native Cloud 2.0
Inline :)
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
From: Max Kholmyansky [mailto:m...@sureuniversal.com]
Sent: Thursday, August 9, 2018 3:31 PM
To: Tomcik Ondrej
Cc: iotivity-dev@lists.iotivity.org; Scott King <scott.k...@fkabrands.com> (scott.k...@fkabrands.com); Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter; JinHyeock Choi (jinc...@gmail.com)
Subject: Re: [dev] OCF Native Cloud 2.0
Thanks, Ondrej. Just to clarify what I meant by the "server state": my question was not about the connectivity, but rather about the actual state of the resources. Say the "OCF Server" is a Light device. To know if the light is ON, I can query via GET.
I see
But I may also need to:
1. React on the server side to a change of the state (light ON / OFF) - without having an OCF client connected.
2. Keep the history of the state changes (for analytics or whatever).
Each change which occurs on the OCF Device side (ResourceChanged, https://wiki.iotivity.org/_detail/rb_2.png?id=coapnativecloud) is propagated to the Resource Aggregate (ResourceService). The Resource Aggregate will raise an event that the resource was changed and store it in the event store. That means you have the whole history of what was changed during the time the device was online. ResourceShadow listens for these events (ResourceRepresentationUpdated events) and builds the ResourceShadow view model. If you are interested in this event, you can of course subscribe as well and react to every ResourceRepresentationUpdated event. It's the event bus (Kafka, RabbitMQ, ...) where every event is published, and any internal component can subscribe. OR an OCF Client can subscribe through the GW, which from that moment is also listening on that specific topic (a sketch of such a projection follows at the end of this message). Does it make sense?
The question is how I can solve those requirements. Is there a productized interface to receive cross-account notifications on the resource state changes?
Regards
Max.
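A minimal Go sketch of the ResourceShadow view model described above: replay the ResourceRepresentationUpdated history from the event store into an in-memory map, then keep applying new events from the bus, so a client GET can be answered without contacting the device. The EventSource interface and payload fields are assumptions; only the event name comes from the thread.

    package shadow

    import (
        "context"
        "sync"
    )

    // ResourceRepresentationUpdated is the event name used in the thread; the
    // payload shape here is an assumption for illustration only.
    type ResourceRepresentationUpdated struct {
        DeviceID       string
        Href           string
        Representation map[string]interface{}
    }

    // EventSource abstracts the event store (replay) and the event bus
    // (live subscription); both are placeholders for whatever is chosen.
    type EventSource interface {
        Replay(ctx context.Context) ([]ResourceRepresentationUpdated, error)
        Subscribe(ctx context.Context, handle func(ResourceRepresentationUpdated)) error
    }

    // ResourceShadow is the in-memory read model: restart it or scale it out
    // and it rebuilds itself by replaying the event history, then stays
    // current by applying every new ResourceRepresentationUpdated event.
    type ResourceShadow struct {
        mu    sync.RWMutex
        state map[string]map[string]interface{} // key: deviceID + href
    }

    func NewResourceShadow() *ResourceShadow {
        return &ResourceShadow{state: make(map[string]map[string]interface{})}
    }

    func (s *ResourceShadow) apply(ev ResourceRepresentationUpdated) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.state[ev.DeviceID+ev.Href] = ev.Representation
    }

    // Run replays history first, then keeps the view model up to date.
    func (s *ResourceShadow) Run(ctx context.Context, src EventSource) error {
        events, err := src.Replay(ctx)
        if err != nil {
            return err
        }
        for _, ev := range events {
            s.apply(ev)
        }
        return src.Subscribe(ctx, s.apply)
    }

    // Get answers a client GET from the shadow without contacting the device.
    func (s *ResourceShadow) Get(deviceID, href string) (map[string]interface{}, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        rep, ok := s.state[deviceID+href]
        return rep, ok
    }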
On Thu, Aug 9, 2018 at 4:15 PM, Ondrej Tomcik <ondrej.tom...@kistler.com> wrote:
Hello Max,
Thanks for your message. Please see my inline comments.
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
From: Max Kholmyansky [mailto:m...@sureuniversal.com]
Sent: Thursday, August 9, 2018 2:58 PM
To: Tomcik Ondrej
Cc: iotivity-dev@lists.iotivity.org; Scott King <scott.k...@fkabrands.com> (scott.k...@fkabrands.com); Max Kholmyansky (m...@sureuniversal.com); Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter
Subject: Re: [dev] OCF Native Cloud 2.0
Hi Ondrej,
Thanks for sharing the design. It seems like the design document is technology agnostic: it does not mention any specific technology used for the implementation. Yet you mention that the implementation is in progress. Does that mean the technology stack has already been chosen? Can you share this information?
Yes, this document is still technology agnostic. Soon we will introduce the selected technology stack, or let's say a roadmap of supported technologies. The implementation is in Go, but technologies like the message broker / DB / event store are still being evaluated. The goal is to not force users to use a certain DB or broker; it should be generic, and the user should be able to use what he prefers, or use a cloud-native service.
I have 2 areas in the document I would like to understand better.
1. OCF CoAP Gateway
If my understanding is right, this component is in charge of handling the TCP connectivity with the connecting clients and servers, while all the logic is "forwarded" to other components using commands and events. Is that right?
Yes. This allows you to introduce a new gateway, for example an HTTP one, and guarantee interoperability within the Cloud across multiple different devices.
It would be helpful to get an overall picture of the "other" components.
The other components, or let's talk about the implementation: ResourceService, AuthorizationService (a sample will be provided, but it should be user specific), ResourceShadowService and ResourceDirectoryService (these two might be just one service).
You mention that the "Gateway" is stateful by nature, due to the TCP connection. What about the other components? Can they be stateless, so the state will be managed in a data store? This may be helpful from the scaling perspective.
ResourceService is stateless and might possibly be deployed as a lambda function (we are evaluating this). AuthorizationService is user specific. ResourceShadow and ResourceDirectory are the read side; they might use just an in-memory DB, filled from the event store during start-up.
2. Resource Shadow
If I got it right, the architecture assumes that the cloud keeps the up-to-date state of the server resources by permanently observing those resources, even if no client is connected. Is that right?
I assume that by client you mean OCF Client. Yes, you're right.
Does it mean that a "query" (GET request) by a client can be answered by the cloud, without any need to query the actual server?
Yes
Will there be a mechanism to store the history of the server state? What would be needed to develop such functionality?
You mean online / offline? It will be stored; the complete history is stored. Each gateway, in this implementation the OCF CoAP Gateway, has to issue a command to the ResourceAggregate (ResourceService) to set the device online / offline. As it is an aggregate, you have the whole history of what has happened. Each change to a resource is persisted, including device status - online/offline.
The last point...
If I got it right, the only way to communicate is via a TCP connection using TLS. This may be good enough for servers like smart home appliances and clients like mobile apps on smartphones. But there is also the case of cloud-to-cloud integration: say, voice commands to be issued by a 3rd-party cloud. In the cloud-to-cloud case, I doubt it's a good idea to require the overhead of a TCP connection per requesting user. Is there any solution for the cloud-to-cloud scenario in the current design?
Of course. For cloud to cloud, or let's say a cloud deployment where one component is the OCF Native Cloud and another one is your set of product services, you are not communicating with the OCF Native Cloud through CoAP over TCP. You issue gRPC requests directly, including the OAuth token (a sketch of such a call follows below). Please check the sample usage: https://wiki.iotivity.org/coapnativecloud#sample_usage
Best regards
Max.
--
Max Kholmyansky
Software Architect - SURE Universal Ltd.
http://www.sureuniversal.com
On Thu, Aug 9, 2018 at 2:48 PM, Ondrej Tomcik <ondrej.tom...@kistler.com> wrote:
Dear IoTivity devs,
Please be informed that the new Cloud 2.0 design concept is alive: https://wiki.iotivity.org/coapnativecloud
Your comments are warmly welcome. Implementation is in progress.
BR
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
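A minimal Go sketch of the cloud-to-cloud path described above, assuming the grpc-go and x/oauth2 libraries: dial the OCF Native Cloud's gRPC endpoint over TLS and attach the OAuth token as per-RPC credentials. The endpoint address is hypothetical, and the actual service methods are the ones defined by the sample-usage page linked above, so the call itself is left as a comment.

    package main

    import (
        "crypto/tls"
        "log"

        "golang.org/x/oauth2"
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
        "google.golang.org/grpc/credentials/oauth"
    )

    func main() {
        // Per-RPC credentials: the OAuth token obtained from the authorization
        // provider is sent with every request instead of holding a CoAP/TCP
        // connection per requesting user.
        token := oauth.TokenSource{TokenSource: oauth2.StaticTokenSource(
            &oauth2.Token{AccessToken: "REPLACE-WITH-OAUTH-TOKEN"},
        )}

        conn, err := grpc.Dial("ocf-cloud.example.com:443", // hypothetical endpoint
            grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),
            grpc.WithPerRPCCredentials(token),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // With the connection in place, a 3rd-party cloud would create the
        // client stub generated from the published protobuf spec and issue
        // requests (e.g. read a resource value) -- see the sample usage page
        // linked above for the actual service definition.
        log.Println("gRPC connection to the OCF Native Cloud is configured")
    }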