[google-appengine] Re: Synchronizing between different Cloud Sql tables with GooglePubSub or TaskQueue

'Alexey' via Google App Engine Tue, 30 May 2017 11:35:21 -0700

Roxana,

It may be useful to look closer at the nature of your data.  You outlined 2 
basic strategies:

1. Replicate the data between the 2 services, with one replica being the 
one that is written to, which you would presumably consider the source of 
truth.  You have 2 variations on this theme: you would either rely on 
database replication or use some messaging system to do the job.
2. Allow one service to call the other service for the data it needs.  Your 
suggestion to cache the data is good, but I would propose that you cache 
the data on the side that's serving HTTP responses.  Let the dependent 
service always call the source service.  This will greatly simplify your 
cache management strategy, and you'd be able to create other similar 
consumers for this system without maintaining different cache policies.
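
To illustrate the second strategy, here is a minimal Python sketch of 
caching on the serving side.  Everything in it is hypothetical: the 
SourceService class, its in-memory dict standing in for the Cloud SQL 
table, and the TTL value are assumptions for illustration, not part of 
any App Engine API.  The point is only that the service that owns the 
table also owns the cache, so it can invalidate on its own writes and no 
change notification to consumers is needed.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SourceService:
    """Hypothetical owner of the shared table; caches reads on its own side."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rows: Dict[str, Any] = {}  # stand-in for the real table
        self._cache: Optional[Tuple[float, Dict[str, Any]]] = None
        self._ttl = ttl_seconds
        self._clock = clock
        self.db_reads = 0  # counts simulated database hits

    def write(self, key: str, value: Any) -> None:
        """Modify the table and drop the cached snapshot."""
        self._rows[key] = value
        # The owner knows exactly when the data changes, so it can
        # invalidate locally instead of notifying every consumer.
        self._cache = None

    def read_all(self) -> Dict[str, Any]:
        """Serve consumers from the cache while it is still fresh."""
        now = self._clock()
        if self._cache is not None and self._cache[0] > now:
            return dict(self._cache[1])
        self.db_reads += 1  # cache miss: go to the "database"
        snapshot = dict(self._rows)
        self._cache = (now + self._ttl, snapshot)
        return dict(snapshot)
```

The dependent service would simply call read_all() (over HTTP in 
practice) every time it needs the data.  Any additional consumer gets the 
same caching behaviour for free, because the policy lives in one place.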

Each of these 2 ideas is workable and each has its trade-offs.  Having a 
single copy of the data (option 2) is arguably simpler and lets you bypass 
the question of DB syncing entirely, at the cost of a service-to-service 
dependency.  I would say option 2 is an easier first step, and option 1 is 
worth doing if you find you have a performance problem and want to make 
the data queries with their required joins faster.

Depending on the nature of your data and your queries, there may be a third 
option for you to consider, something in between these 2 strategies.  
Conflict-free Replicated Data Types 
(https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) provide 
some new options in situations where it is possible to pass messages 
across a distributed system to keep data eventually consistent 
without relying on message order being preserved.  This requires special care 
in designing your data model, but can be a good alternative to 
service-to-service calls and tight DB coupling.

Hope this helps,

Alexey
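
P.S. To make the CRDT suggestion a little more concrete, here is a minimal 
Python sketch of one simple CRDT, a last-writer-wins map.  The Update and 
LWWMap names are hypothetical, and the sketch assumes the owning service 
stamps every change with an increasing (counter, writer id) version.  
Because the receiver keeps only the highest version per key, messages can 
arrive in any order, or more than once, and the replica still converges to 
the same state.  Whether this fits depends on your data model; it is an 
illustration, not a drop-in replacement for table replication.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Update:
    """One change message; value=None marks a deletion (tombstone)."""
    key: str
    value: Any
    version: Tuple[int, str]  # (logical counter, writer id), assumed unique


class LWWMap:
    """Last-writer-wins map: applying updates is order-independent."""

    def __init__(self) -> None:
        self._entries: Dict[str, Update] = {}

    def apply(self, update: Update) -> None:
        current = self._entries.get(update.key)
        # Keep only the newest version per key; stale or duplicate
        # messages are silently ignored, which makes retries safe.
        if current is None or update.version > current.version:
            self._entries[update.key] = update

    def snapshot(self) -> Dict[str, Any]:
        """Return live rows, hiding tombstoned (deleted) keys."""
        return {k: u.value for k, u in self._entries.items()
                if u.value is not None}
```

Feeding the same set of updates to two replicas in different orders, with 
some messages redelivered, leaves both with identical snapshots.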

On Tuesday, May 30, 2017 at 9:44:38 AM UTC-4, Roxana Ioana Roman wrote:
>
> Moving to a micro-service architecture, I have separated an AppEngine 
> module into two services (the big one and a smaller one, still part of the 
> same AppEngine project).
>   
> Next, I want to separate the database into two, allowing the small 
> micro-service to have only the needed tables in its own db.  The problem is 
> one of the tables is needed by both micro-services. The smaller service 
> only reads data from this table (it has multiple select queries which join 
> on data from this table).
>
> Option 1:
> I can leave this table in the bigger service which uses it the most and 
> make the smaller service make an http request to get the data that it 
> needs, cache this data and be notified when this data changes by the owner 
> service of that table. Then, to refresh its cache, the smaller service 
> makes an http request again.
>
> In order to communicate from service A to service B, I was deciding 
> between Google PubSub and Task Queue, and I am not sure which one to 
> use here. In this case, receiving the messages in order is not important, 
> since the message will be a generic "table_state_changed, query for new 
> data", so either could be used.
>
> Option 2:
> I can duplicate the table in both databases (this will allow the service 
> to have all the necessary data closer).  When the bigger service modifies 
> data in the table, it will notify the smaller service to perform the same 
> modification on its version of the table. In this case, the order of the 
> messages is important, since it specifies exact crud operations to perform 
> on the table. 
>
> Can PubSub be made to maintain order, for example by inserting ordering 
> information into the message payload that refers to previously sent 
> messages? Retries are also important; we don't want to end up with 
> inconsistencies between the two tables.
>
> Option 3:
> Is there a way to create Read Replicas with ignored tables in AppEngine 
> (so that I include only the tables I need in one of the services and leave 
> the other one with the entire db as it is currently) and set a specific 
> service to only use that replica? This does not sound like a good idea; 
> however, it leaves CloudSql the burden of maintaining the same data in both 
> versions of the table.
>
> Which option is better in your opinion and most importantly which one is 
> better suited for this case? PubSub or TaskQueues. 
> Also, this is only the first step to separate the monolith, there will be 
> other services in the future, that would probably encounter the same 
> problem.
>

