________________________________
From: Scott King [scott.k...@fkabrands.com]
Sent: 13 August 2018 21:29
To: Tomcik Ondrej; Max Kholmyansky
Cc: iotivity-dev@lists.iotivity.org
Subject: RE: [dev] OCF Native Cloud 2.0
Still trying to cater to those reading this on mobile (like me 60% of the time):
· Docs stuff
o I think we'll need to agree to disagree on this. Tell me if you think I'm wrong, but I feel like an easy "hello world" guide is an important part of the developer onboarding experience. If the barrier to entry is too high, or if the docs page is too dense/daunting, you'll scare away devs and systems integrators. Maybe we agree on that point, but disagree on where that introductory/high-level overview should be.
Of course we need an easy "hello world" guide, sample code and so on. Completely agree. But this documentation is not of that kind; that was not its purpose. The one you mentioned has to come as well.
· L7 LB
o Looking forward to it! I was pretty sure redis wasn't the best option, but I didn't want to put the idea out there without a jumping-off point :). My only concern is about maintaining high consistency, but that concern is probably unfounded.
· Regarding EventStore
o Can you help me better understand why "traditional"/battle-tested NoSQL databases (like Cassandra or various time-series DBs) weren't a good fit for this use case? I'm not saying you're wrong, I just don't 100% understand the unique benefits of the EventStore DB.
Let's have a call. Email is tedious and takes too long :)
· Misc DB question
o Given the difference in consistency requirements and read/write ratios between user info/credentials (ex: userID and mediator/client/device tokens) and device shadow, do you intend on breaking that data into 2 different DB solutions? For reference, mainflux uses postgres for user info/credentials, and then supports multiple NoSQL databases for device shadow/historical data (IIRC InfluxDB, MongoDB and Cassandra are currently supported).
The Authorization Service has to be provided by the user, which means it will certainly be in a different DB. ResourceShadow will hold only in-memory data. Data will be loaded from the EventStore or, per the documentation, from the IRepository.
o I'm not saying we should try to 100% copy mainflux, but we should evaluate their design decisions as part of our design process.
Agree.
· Per your comment ". If through the sidecar proxy, or directly prepare the code of components and user will modify the codebase IF technology (kafka,NATS,...) is not supported."
o I don't 100% understand what you're saying. My goal for the sidecar proxy was not only to make the messaging transparent, but also to make the DB and auth transparent (which means you can do integration testing with devices using the OCF interface container with a mocked backend, which would hopefully accelerate development). That lets devs experiment with novel back ends, like using OPA for high-granularity auth, using a blockchain for storing user info/credentials, or using proprietary managed services (ex: AWS Kinesis, GCP Pub/Sub, Cloud SQL, etc.) as part of the backend to reduce operational overhead.
§ Remember, the use cases between consumer electronics, industrial automation, blockchain and other domains are likely very different. For example, blockchain use cases likely tolerate much higher latencies. Industrial automation likely requires the highest-performance solution because you rely on up-to-date device data and are doing real-time processing on that data.
In comparison, consumer electronics often generate very low-value data and don't always require device observation (for some but not all of our products, many users will never or only infrequently check device status in their app; it doesn't make sense to pay the price to cache the device shadow if the data is seldom used), but they still require low latency for routing requests to the device.
§ Another example is healthcare: they likely require tight integrations into their EMR system for both auth and DB functionality.
Let's discuss it over WebEx.
· Regarding the channels-per-device comment
o I believe that NATS supports clustering to allow arbitrarily high numbers of channels. I agree that this is probably not the best architectural option if you want to also support more traditional pubsub solutions.
Sure, it handles multiple channels/topics, Kafka as well, but this does not scale. You can end up with millions of topics, as you will have them per device. We are currently streaming 200k messages per second from one device, with default configuration and without scaling. The important thing is to find a balance between the number of channels and throughput. Throughput can be scaled with partitioning.
Can you help me understand how a given interface container isn't going to be overloaded with events that aren't relevant to that container? If I want to send a message to my device, do I publish the message to a work queue in order to ensure that only one interface container receives the message?
Yes (see the sketch at the end of this message).
o I'm pretty sure that as long as the OCF interface containers are stateless/12FA-friendly (https://12factor.net/), my concerns about this are totally eliminated.
· Regarding the observe comments
o I'll accept 80%+ of the blame for this misunderstanding. I forgot that the device gets to choose which resources to publish. I agree it makes sense that you should either observe all published resources or none. I disagree that you should mandate observation of published resources. There are low-QoS use cases that would rather save on messaging bandwidth than maintain a perfectly up-to-date representation of device state. I would be OK with observing being the default behavior, as long as there's a mechanism for me to "disable" that feature for cost-savings purposes.
What exactly would you save? You still have an open TCP connection to the device; that is what is expensive, not a message from the device to the cloud.
o To be clear: I am very happy that you intend on implementing device shadow functionality, and I look forward to using that feature in future products. I just want it to be optional due to costing and not wanting to force users to use features that aren't in the OCF spec.
Costing / burden / overhead is not in play, in my opinion; OBSERVE will not add overhead. Regarding the OCF Specification and resource shadow, it's in progress, and I hope it will make it into the specification. But, of course, I will also think about making it optional.
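A minimal Go sketch of the work-queue answer above, assuming NATS as the broker (the thread leaves the broker choice open); the subject names and payload are hypothetical. All OCF CoAP Gateway replicas join one queue group, so each command is delivered to exactly one of them, while events stay on one subject per event type rather than one per device.

    package main

    import (
        "log"

        "github.com/nats-io/nats.go"
    )

    func main() {
        // Connect to the (assumed) NATS broker.
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Every OCF CoAP Gateway replica subscribes with the same queue group,
        // so NATS delivers each command to exactly one replica -- the
        // "work queue" behaviour discussed above.
        _, err = nc.QueueSubscribe("commands.resource.update", "coap-gateways", func(m *nats.Msg) {
            // Look up the device's long-lived TCP connection held by this
            // replica and forward the CoAP request (omitted in this sketch).
            log.Printf("command received: %s", string(m.Data))
        })
        if err != nil {
            log.Fatal(err)
        }

        // Events are published to one subject per event type, not per device,
        // so the subject count does not grow with the number of devices.
        err = nc.Publish("events.resource.representation.updated",
            []byte(`{"deviceId":"d1","href":"/light","on":true}`))
        if err != nil {
            log.Fatal(err)
        }

        select {} // keep the process alive so the subscriber keeps running
    }

Queue groups are what keep a replicated, stateless gateway from double-handling a command; plain subscriptions (no queue group) remain available for events that every replica needs to see.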
From: Ondrej Tomcik [mailto:ondrej.tom...@kistler.com]
Sent: Thursday, August 9, 2018 5:05 PM
To: Scott King <scott.k...@fkabrands.com>; Max Kholmyansky <m...@sureuniversal.com>
Cc: iotivity-dev@lists.iotivity.org
Subject: RE: [dev] OCF Native Cloud 2.0
________________________________
From: Scott King [scott.k...@fkabrands.com]
Sent: 09 August 2018 22:42
To: Tomcik Ondrej; Max Kholmyansky
Cc: iotivity-dev@lists.iotivity.org; Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter; JinHyeock Choi (jinc...@gmail.com)
Subject: RE: [dev] OCF Native Cloud 2.0
I have a tough time reading inline comments. I hope this is an acceptable format.
• This will be part of the implementation. The published document does not limit you in this area, but it does not describe how to achieve it. It's an implementation "detail".
o If you want multiple backend implementations for a given interface like the OCF cloud, then you need to make things very easy and simple. I would assert that any implementation details "behind" the interface (like the CQRS architecture) should be kept in the GitHub repo. The wiki shouldn't be targeting devs who are working on your codebase; it should be targeting devs who want to use your codebase in production.
Come on :) The wiki can be a place both for developers and for users. What matters is how you organize it there, so that everybody finds what they are looking for.
• L7 load balancing
o If you want to add CoAP functionality to a popular LB like nginx or envoy (preferably envoy because of CNCF membership and no "enterprise" tier), then we should discuss that. It would be a great contribution to the ecosystem. I don't see why you couldn't implement L7 routing as long as the LB maintained the long-lived connection instead of the OCF interface (you'd need to persist the state of the device, like being logged in, somewhere though. Maybe a redis DB?)
L7 might be the next step after a working OCF Native Cloud. We can discuss it. Redis is not needed; the state of the device is already persisted in the event store.
• ES/gRPC
o Golang can use a gRPC API in a non-blocking manner via goroutines. I think you have a good point, but just didn't explain it well :)
Sure, but that was not the only reason. :) I will try to explain it in a second document covering the tech stack used for the implementation.
o My desire for gRPC was for communication with a "sidecar proxy" (i.e. the official OCF interface communicates only with devices/LB and a sidecar proxy, which communicates with pubsub, DB, etc.)
• You can keep using pubsub for many things, but you're abstracting away all "non-standard" implementation details (ex: GCP Pub/Sub vs Kafka vs NATS)
• I think we are agreeing when you say "only use gRPC for commands". But I think we disagree on which commands you use it with :)
It depends on where you want to do this abstraction: through the sidecar proxy, or by directly preparing the code of the components so that the user modifies the codebase IF a technology (Kafka, NATS, ...) is not supported. In my opinion, it would be overkill to use a sidecar just to make the messaging technology transparent. Let's see; let's discuss it on Slack.
o If you use 1 channel per event type, that is different from Mainflux, which uses ~1 NATS channel per device. Does this mean that services will receive many "irrelevant" events, since they receive events for all devices? Can that scale to millions of devices?
The question is rather: can you scale a channel per device to millions of devices? It's best practice to have one event type per channel/topic, and it's not a good idea to have a topic per entity (user, device, ...). But of course we have to consider everything. Also, this is an implementation "detail", out of scope of the current doc.
• I proposed redis as an alternative to relying on the message queue for persistence. This allows more implementation flexibility (my goal is to make an implementation that uses as many CNCF projects as possible). I am not 100% confident in that proposal; I look forward to your response.
A message queue is not a persistence layer. Kafka can't be used for event sourcing, nor can NATS Streaming; these are not event stores. In general, there are two options: delegate the transaction defined in the IRepository to a 3rd-party component, for example EventStore (https://eventstore.org/), or handle this transaction in our code, which makes things more complicated. It looks easy, but it has many bottlenecks. We're now evaluating possible options in this area (a sketch of the IRepository idea follows at the end of this message).
• I disagree with the decision to automatically observe every resource. For my (consumer electronics) use case, there are many times that I want to observe a resource, but I don't often want to observe EVERY resource. I am 100% in agreement that it should be easy/standard to be able to observe resources, but that should be a later step after initial device provisioning (ex: have your client send an observe request to the device via the cloud after the device has been provisioned and signed in. The device will see this as the cloud sending the observe request and respond accordingly). There are still details that would need to be hashed out, but I want to get your feedback on this comment.
It's a core requirement to observe everything. Otherwise you can't provide an up-to-date resource shadow, which means forwarding every GET to the device, and that does not make sense.
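A minimal Go sketch of what the IRepository transaction mentioned above might look like; the interface shape, field names and error are assumptions, not the project's actual definition. The point it illustrates is the conditional, versioned append and per-aggregate replay that a plain broker such as Kafka or NATS Streaming does not provide.

    package eventstore

    import (
        "context"
        "errors"
    )

    // Event is one immutable fact about a resource aggregate, e.g.
    // "ResourceRepresentationUpdated". Field names are hypothetical.
    type Event struct {
        AggregateID string // e.g. deviceID + resource href
        Type        string
        Version     uint64
        Data        []byte
    }

    var ErrConcurrency = errors.New("expected version does not match stream head")

    // IRepository is a guess at the transaction the design document delegates
    // to an event store (e.g. EventStore) or implements itself: append events
    // only if the stream is still at the expected version, and replay a stream
    // to rebuild aggregate or view-model state. A broker offers neither the
    // conditional append nor per-aggregate replay, which is why it is not a
    // substitute for an event store.
    type IRepository interface {
        // Save appends events atomically; it must fail with ErrConcurrency if
        // another writer already appended version expectedVersion+1.
        Save(ctx context.Context, aggregateID string, expectedVersion uint64, events []Event) error
        // Load returns all events of the aggregate from fromVersion onward,
        // in order, so the aggregate (or a resource-shadow view) can be rebuilt.
        Load(ctx context.Context, aggregateID string, fromVersion uint64) ([]Event, error)
    }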
From: Ondrej Tomcik [mailto:ondrej.tom...@kistler.com]
Sent: Thursday, August 9, 2018 12:06 PM
To: Scott King <scott.k...@fkabrands.com>; Max Kholmyansky <m...@sureuniversal.com>
Cc: iotivity-dev@lists.iotivity.org; Gregg Reynolds (d...@mobileink.com) <d...@mobileink.com>; Jozef Kralik <jozef.kra...@kistler.com>; Peter Rafaj <peter.ra...@kistler.com>; JinHyeock Choi (jinc...@gmail.com) <jinc...@gmail.com>
Subject: RE: [dev] OCF Native Cloud 2.0
Hello Scott!
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
From: Scott King [mailto:scott.k...@fkabrands.com]
Sent: Thursday, August 9, 2018 4:40 PM
To: Tomcik Ondrej; Max Kholmyansky
Cc: iotivity-dev@lists.iotivity.org; Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter; JinHyeock Choi (jinc...@gmail.com)
Subject: RE: [dev] OCF Native Cloud 2.0
Ondrej,
First off, congrats on publishing such an extensive document!
• Maybe I'm not looking in the right place, but I'm not seeing much explanation of how this architecture makes it easy to integrate OCF cloud messaging into existing infrastructure/architecture (especially for Amazon/Google/IBM/Azure to offer it as part of their current IoT managed services).
This will be part of the implementation.
The published document does not limit you in this area, but it does not describe how to achieve it. It's an implementation "detail".
• You state that L7 load balancing is an option for CoAP. It was my understanding that no load balancers support L7 load balancing with CoAP. Don't you also need to stick to L4 because the OCF device relies on a long-lived connection? I could be wrong, so let me know.
Good point. I didn't investigate whether L7 load balancing for CoAP exists. I mentioned it because it is an option: CoAP is very similar to HTTP, so it could be implemented. Regarding long-lived TCP connections, I am not sure. Why couldn't you have an open TCP connection to the L7 load balancer and distribute requests to other components based on the CoAP data? I might be missing something.
• I'm concerned that ES/pubsub isn't preferable over point-to-point HTTP/gRPC communication for some of the use cases in your diagrams. For example, if the device is trying to sign in to a CoAP gateway, shouldn't the auth service give a response to the OCF gateway's token-validation request rather than publishing an event itself? Can you help me better understand who else needs to be immediately notified of a successful login other than the gateway?
Event sourcing and gRPC do not fit together; CQRS and gRPC do. Where you have events, you have the event bus, for example Kafka + protobuf. Where you have commands, gRPC might be a solution, or again the event bus used as a command queue. The response to the sign-in is in the form of an event simply because of non-blocking communication; all communication in the OCF Native Cloud is non-blocking. The OCF CoAP Gateway will issue a command to the AuthorizationService to verify the sign-in token and will not wait for the response, because waiting may take some time, introduce delay into the whole system and block the gateway. Instead, the OCF CoAP Gateway listens for the events (SignedIn), maps them to the issued requests and replies to the device (a sketch of this correlation is at the end of this message). It's also scalable: you can have a scaled AuthorizationService and issue the SignIn command to the command queue; the most available AuthorizationService instance will take it from the queue, process it and raise an event that it was processed. So it's not about "who else needs to be immediately notified" but about non-blocking communication and scalability.
• How many pubsub channels are required per device in order to implement your architecture?
• I haven't defined the organization of channels yet, but usually it is one channel per event type.
• Would we benefit from an in-memory DB like redis to handle persisting device shadow and device presence/login status?
• You don't need redis at all. The resource shadow is stored as a series of ResourceRepresentationUpdated events in the event store. When the ResourceShadow service starts, it simply loads these events for every resource and subscribes to that event type, so the resource shadow is updated immediately when such an event occurs. You can restart it or scale it; it will again load everything and subscribe. An in-memory DB is enough.
• Given the importance of Alexa/Google Assistant functionality for commercial adoption, I would hope that we can work together to ensure workflow compatibility and develop examples for this feature.
Sure
• Can you confirm that you plan to automatically observe all resources that get published to the cloud?
Confirmed
I feel like we need to make a stronger distinction between the minimum feature set needed to satisfy the OCF spec and the additional features that we all want that are out of spec, like device shadow.
Can you confirm whether this architectural proposal means that you aren't interested in the gRPC API that I proposed?
The proposed protobuf spec can be used, but just for commands.
Regards,
Scott
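A minimal Go sketch of the non-blocking sign-in flow described above: the gateway publishes a SignIn command, parks the pending device request, and completes it when the matching SignedIn event arrives. The SignIn/SignedIn names come from the thread; the correlation ID, Bus interface and timeout are assumptions for illustration.

    package gateway

    import (
        "context"
        "errors"
        "sync"
        "time"
    )

    // SignInCommand / SignedInEvent mirror the command/event names used in the
    // thread; the CorrelationID field is an assumption about how the gateway
    // matches an event back to the request it issued.
    type SignInCommand struct {
        CorrelationID string
        DeviceID      string
        AccessToken   string
    }

    type SignedInEvent struct {
        CorrelationID string
        DeviceID      string
        ExpiresIn     int64
        Err           string
    }

    // Bus is a placeholder for whatever event bus / command queue is chosen
    // (Kafka, NATS, ...).
    type Bus interface {
        PublishCommand(ctx context.Context, cmd SignInCommand) error
    }

    // SignInCorrelator lets the CoAP gateway stay non-blocking: it publishes
    // the command, parks the pending device request in a map, and completes it
    // when the SignedIn event arrives on the bus.
    type SignInCorrelator struct {
        bus     Bus
        mu      sync.Mutex
        pending map[string]chan SignedInEvent
    }

    func NewSignInCorrelator(bus Bus) *SignInCorrelator {
        return &SignInCorrelator{bus: bus, pending: make(map[string]chan SignedInEvent)}
    }

    // SignIn issues the command and waits (with a timeout) for the matching
    // event. The CoAP request handler can run this in its own goroutine, so
    // the gateway itself never blocks on the AuthorizationService.
    func (c *SignInCorrelator) SignIn(ctx context.Context, cmd SignInCommand) (SignedInEvent, error) {
        ch := make(chan SignedInEvent, 1)
        c.mu.Lock()
        c.pending[cmd.CorrelationID] = ch
        c.mu.Unlock()
        defer func() {
            c.mu.Lock()
            delete(c.pending, cmd.CorrelationID)
            c.mu.Unlock()
        }()

        if err := c.bus.PublishCommand(ctx, cmd); err != nil {
            return SignedInEvent{}, err
        }
        select {
        case ev := <-ch:
            return ev, nil
        case <-time.After(10 * time.Second):
            return SignedInEvent{}, errors.New("sign-in timed out")
        case <-ctx.Done():
            return SignedInEvent{}, ctx.Err()
        }
    }

    // OnSignedIn is the bus subscription callback: it hands the event to the
    // goroutine waiting on the original device request, if any.
    func (c *SignInCorrelator) OnSignedIn(ev SignedInEvent) {
        c.mu.Lock()
        ch, ok := c.pending[ev.CorrelationID]
        c.mu.Unlock()
        if ok {
            ch <- ev
        }
    }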
From: Ondrej Tomcik [mailto:ondrej.tom...@kistler.com]
Sent: Thursday, August 9, 2018 9:38 AM
To: Max Kholmyansky <m...@sureuniversal.com>
Cc: iotivity-dev@lists.iotivity.org; Scott King <scott.k...@fkabrands.com>; Gregg Reynolds (d...@mobileink.com) <d...@mobileink.com>; Jozef Kralik <jozef.kra...@kistler.com>; Peter Rafaj <peter.ra...@kistler.com>; JinHyeock Choi (jinc...@gmail.com) <jinc...@gmail.com>
Subject: RE: [dev] OCF Native Cloud 2.0
Inline :)
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
From: Max Kholmyansky [mailto:m...@sureuniversal.com]
Sent: Thursday, August 9, 2018 3:31 PM
To: Tomcik Ondrej
Cc: iotivity-dev@lists.iotivity.org; Scott King <scott.k...@fkabrands.com> (scott.k...@fkabrands.com); Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter; JinHyeock Choi (jinc...@gmail.com)
Subject: Re: [dev] OCF Native Cloud 2.0
Thanks, Ondrej. Just to clarify what I meant by the "server state": my question was not about the connectivity, but rather about the actual state of the resources. Say the "OCF Server" is a Light device. To know if the light is ON, I can query via GET.
I see
But I may also need to:
1. React on the server side to a change of the state (light ON / OFF) - without having an OCF client connected.
2. Keep the history of the state changes (for analytics or whatever).
Each change which occurs on the OCF Device side (ResourceChanged, https://wiki.iotivity.org/_detail/rb_2.png?id=coapnativecloud) is propagated to the Resource Aggregate (ResourceService). The Resource Aggregate will raise an event that the resource was changed and store it in the event store. That means you have the whole history of what was changed during the time the device was online. ResourceShadow listens for these events (ResourceRepresentationUpdated events) and builds the ResourceShadow view model. If you are interested in this event, you can of course subscribe as well and react to every ResourceRepresentationUpdated event. It's the event bus (Kafka, RabbitMQ, ...) where every event is published, and any internal component can subscribe. OR an OCF Client can subscribe through the GW, which from that moment is also listening on that specific topic (a sketch of such a projection follows at the end of this message). Does it make sense?
The question is how I can solve those requirements. Is there a productized interface to receive cross-account notifications on the resource state changes?
Regards
Max.
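A minimal Go sketch of the ResourceShadow view model described above: replay the ResourceRepresentationUpdated history from the event store into an in-memory map, then keep applying new events from the bus, so a client GET can be answered without contacting the device. The EventSource interface and payload fields are assumptions; only the event name comes from the thread.

    package shadow

    import (
        "context"
        "sync"
    )

    // ResourceRepresentationUpdated is the event name used in the thread; the
    // payload shape here is an assumption for illustration only.
    type ResourceRepresentationUpdated struct {
        DeviceID       string
        Href           string
        Representation map[string]interface{}
    }

    // EventSource abstracts the event store (replay) and the event bus
    // (live subscription); both are placeholders for whatever is chosen.
    type EventSource interface {
        Replay(ctx context.Context) ([]ResourceRepresentationUpdated, error)
        Subscribe(ctx context.Context, handle func(ResourceRepresentationUpdated)) error
    }

    // ResourceShadow is the in-memory read model: restart it or scale it out
    // and it rebuilds itself by replaying the event history, then stays
    // current by applying every new ResourceRepresentationUpdated event.
    type ResourceShadow struct {
        mu    sync.RWMutex
        state map[string]map[string]interface{} // key: deviceID + href
    }

    func NewResourceShadow() *ResourceShadow {
        return &ResourceShadow{state: make(map[string]map[string]interface{})}
    }

    func (s *ResourceShadow) apply(ev ResourceRepresentationUpdated) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.state[ev.DeviceID+ev.Href] = ev.Representation
    }

    // Run replays history first, then keeps the view model up to date.
    func (s *ResourceShadow) Run(ctx context.Context, src EventSource) error {
        events, err := src.Replay(ctx)
        if err != nil {
            return err
        }
        for _, ev := range events {
            s.apply(ev)
        }
        return src.Subscribe(ctx, s.apply)
    }

    // Get answers a client GET from the shadow without contacting the device.
    func (s *ResourceShadow) Get(deviceID, href string) (map[string]interface{}, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        rep, ok := s.state[deviceID+href]
        return rep, ok
    }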
On Thu, Aug 9, 2018 at 4:15 PM, Ondrej Tomcik <ondrej.tom...@kistler.com> wrote:
Hello Max,
Thanks for your message. Please see my inline comments.
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
From: Max Kholmyansky [mailto:m...@sureuniversal.com]
Sent: Thursday, August 9, 2018 2:58 PM
To: Tomcik Ondrej
Cc: iotivity-dev@lists.iotivity.org; Scott King <scott.k...@fkabrands.com> (scott.k...@fkabrands.com); Max Kholmyansky (m...@sureuniversal.com); Gregg Reynolds (d...@mobileink.com); Kralik Jozef; Rafaj Peter
Subject: Re: [dev] OCF Native Cloud 2.0
Hi Ondrej,
Thanks for sharing the design. It seems like the design document is technology agnostic: it does not mention any specific technology used for the implementation. Yet you mention that the implementation is in progress. Does that mean the technology stack has already been chosen? Can you share this information?
Yes, this document is still technology agnostic. Soon we will introduce the selected technology stack, or let's say a roadmap of supported technologies. The implementation is in Go, but technologies like the message broker / DB / event store are still being evaluated. The goal is to not force users to use a certain DB or broker; it should be generic, and the user should be able to use what he prefers, or use a cloud-native service.
I have 2 areas in the document I would like to understand better.
1. OCF CoAP Gateway
If my understanding is right, this component is in charge of handling the TCP connectivity with the connecting clients and servers, while all the logic is "forwarded" to other components using commands and events. Is that right?
Yes. This allows you to introduce a new gateway, for example an HTTP one, and guarantee interoperability within the Cloud across multiple different devices.
It would be helpful to get an overall picture of the "other" components.
The other components, or let's talk about the implementation: ResourceService, AuthorizationService (a sample will be provided, but it should be user specific), ResourceShadowService and ResourceDirectoryService (these two might be just one service).
You mention that the "Gateway" is stateful by nature, due to the TCP connection. What about the other components? Can they be stateless, so the state will be managed in a data store? This may be helpful from the scaling perspective.
ResourceService is stateless and might possibly be deployed as a lambda function (we are evaluating this). AuthorizationService is user specific. ResourceShadow and ResourceDirectory are the read side; they might use just an in-memory DB, filled from the event store during start-up.
2. Resource Shadow
If I got it right, the architecture assumes that the cloud keeps the up-to-date state of the server resources by permanently observing those resources, even if no client is connected. Is that right?
I assume that by client you mean OCF Client. Yes, you're right.
Does it mean that a "query" (GET request) by a client can be answered by the cloud, without any need to query the actual server?
Yes
Will there be a mechanism to store the history of the server state? What would be needed to develop such functionality?
You mean online / offline? It will be stored; the complete history is stored. Each gateway, in this implementation the OCF CoAP Gateway, has to issue a command to the ResourceAggregate (ResourceService) to set the device online / offline. As it is an aggregate, you have the whole history of what has happened. Each change to a resource is persisted, including device status - online/offline.
The last point...
If I got it right, the only way to communicate is via a TCP connection using TLS. This may be good enough for servers like smart home appliances and clients like mobile apps on smartphones. But there is also the case of cloud-to-cloud integration: say, voice commands to be issued by a 3rd-party cloud. In the cloud-to-cloud case, I doubt it's a good idea to require the overhead of a TCP connection per requesting user. Is there any solution for the cloud-to-cloud scenario in the current design?
Of course. For cloud to cloud, or let's say a cloud deployment where one component is the OCF Native Cloud and another one is your set of product services, you are not communicating with the OCF Native Cloud through CoAP over TCP. You issue gRPC requests directly, including the OAuth token (a sketch of such a call follows below). Please check the sample usage: https://wiki.iotivity.org/coapnativecloud#sample_usage
Best regards
Max.
--
Max Kholmyansky
Software Architect - SURE Universal Ltd.
http://www.sureuniversal.com
On Thu, Aug 9, 2018 at 2:48 PM, Ondrej Tomcik <ondrej.tom...@kistler.com> wrote:
Dear IoTivity devs,
Please be informed that the new Cloud 2.0 design concept is alive: https://wiki.iotivity.org/coapnativecloud
Your comments are warmly welcome. Implementation is in progress.
BR
Ondrej Tomcik :: KISTLER :: measure, analyze, innovate
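A minimal Go sketch of the cloud-to-cloud path described above, assuming the grpc-go and x/oauth2 libraries: dial the OCF Native Cloud's gRPC endpoint over TLS and attach the OAuth token as per-RPC credentials. The endpoint address is hypothetical, and the actual service methods are the ones defined by the sample-usage page linked above, so the call itself is left as a comment.

    package main

    import (
        "crypto/tls"
        "log"

        "golang.org/x/oauth2"
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
        "google.golang.org/grpc/credentials/oauth"
    )

    func main() {
        // Per-RPC credentials: the OAuth token obtained from the authorization
        // provider is sent with every request instead of holding a CoAP/TCP
        // connection per requesting user.
        token := oauth.TokenSource{TokenSource: oauth2.StaticTokenSource(
            &oauth2.Token{AccessToken: "REPLACE-WITH-OAUTH-TOKEN"},
        )}

        conn, err := grpc.Dial("ocf-cloud.example.com:443", // hypothetical endpoint
            grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),
            grpc.WithPerRPCCredentials(token),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // With the connection in place, a 3rd-party cloud would create the
        // client stub generated from the published protobuf spec and issue
        // requests (e.g. read a resource value) -- see the sample usage page
        // linked above for the actual service definition.
        log.Println("gRPC connection to the OCF Native Cloud is configured")
    }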