[google-appengine] Re: Synchronizing between different Cloud Sql tables with GooglePubSub or TaskQueue

'Yannick (Cloud Platform Support)' via Google App Engine Wed, 31 May 2017 13:23:30 -0700


Thank you for those clarifications.


 

About using Cloud Pub/Sub with App Engine, there would only be a single 
push subscription endpoint 
<https://cloud.google.com/pubsub/docs/push#app-engine-standard-endpoints> 
per service of your application, not one per instance. That is because 
unless you are using manually scaled instances in the standard environment, 
you cannot define an endpoint that would target a specific instance 
<https://cloud.google.com/appengine/docs/standard/java/how-requests-are-routed#targeted_routing>.
 
This means that sequential Pub/Sub messages sent to a service with several 
instances but a single push endpoint could be received by different 
instances of the same service, so no single instance could reliably 
keep track of the full sequence.
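
As a rough illustration of what any instance behind that single endpoint 
would see, here is a minimal sketch that decodes the JSON body Pub/Sub POSTs 
to a push endpoint. The function name decode_push_body and the "seq" 
attribute are assumptions made for this example; the publisher would have to 
set such an attribute itself.

```python
import base64
import json


def decode_push_body(body: bytes) -> dict:
    """Decode the JSON envelope that Pub/Sub POSTs to a push endpoint."""
    envelope = json.loads(body)
    message = envelope["message"]
    # The message payload arrives base64-encoded in the "data" field.
    payload = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "message_id": message.get("messageId"),
        # "seq" below is a hypothetical attribute set by the publisher.
        "attributes": message.get("attributes", {}),
        "payload": payload,
    }
```

Since the handler cannot know which instance served the previous message, 
any ordering information has to travel with the message (or live in shared 
storage) rather than in instance memory.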

 

To alleviate this issue, you might want to have a single instance from a 
dedicated service receive and process the messages in sequence to update 
the microservice's database. This should not be an issue unless you intend 
for every single instance to hold a local copy of the data.
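
One way that single instance could process messages in sequence is sketched 
below. It assumes the publisher attaches a gapless, increasing sequence 
number to every message; the class name InOrderApplier and the apply 
callback are hypothetical. Messages that arrive early are buffered, and 
duplicates (which are possible with at-least-once delivery) are dropped.

```python
from typing import Callable, Dict


class InOrderApplier:
    """Applies operations strictly in sequence-number order."""

    def __init__(self, apply: Callable[[str], None], first_seq: int = 1) -> None:
        self._apply = apply
        self._next_seq = first_seq
        self._pending: Dict[int, str] = {}

    def receive(self, seq: int, operation: str) -> None:
        # Ignore anything already applied or already buffered (redelivery).
        if seq < self._next_seq or seq in self._pending:
            return
        self._pending[seq] = operation
        # Apply every buffered operation that is now contiguous.
        while self._next_seq in self._pending:
            self._apply(self._pending.pop(self._next_seq))
            self._next_seq += 1


applied = []
applier = InOrderApplier(applied.append)
for seq, op in [(2, "UPDATE b"), (1, "INSERT a"), (1, "INSERT a"), (3, "DELETE c")]:
    applier.receive(seq, op)
```

This sketch keeps its state in memory, so a real service would need to 
persist the next expected sequence number (for example alongside the 
replicated table) to survive an instance restart.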

 

If you do opt to replicate the database’s state using messages and the 
order of the messages matters, then Task Queues would be limited to running 
one task at a time, as there is no mechanism for ensuring that concurrent 
tasks are executed in a certain order. That being said, if you simply intend 
for the message to kick off a synchronization job between two databases, 
then it could work just fine.

 

If it’s suitable to the type of data your new service needs to access, you 
might also want to simply replicate the data to Cloud Datastore 
<https://cloud.google.com/datastore/> from the same source that would 
update Cloud SQL. You could make use of transactions 
<https://cloud.google.com/datastore/docs/concepts/transactions> to make 
sure that you only commit changes to Datastore if they also went through on 
Cloud SQL. This would be equivalent to your “caching the data” solution.
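
The commit-only-if-Cloud-SQL-succeeded idea could look roughly like the 
sketch below. StagedStore is a hypothetical in-memory stand-in for a 
transactional store and is not the Datastore client API; sql_write is a 
hypothetical callable that performs the Cloud SQL write and raises on 
failure.

```python
from typing import Callable, Dict


class StagedStore:
    """In-memory stand-in for a transactional store (not the Datastore API)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self._staged: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._staged[key] = value  # staged only, not yet visible

    def commit(self) -> None:
        self.data.update(self._staged)
        self._staged.clear()

    def rollback(self) -> None:
        self._staged.clear()


def replicate(store: StagedStore,
              sql_write: Callable[[str, str], None],
              key: str, value: str) -> None:
    """Stage the copy, write to SQL, and commit the copy only on success."""
    store.put(key, value)
    try:
        sql_write(key, value)
    except Exception:
        store.rollback()  # the SQL write failed, so discard the staged copy
        raise
    store.commit()
```

Note that this is not atomic across the two systems: if the final commit 
fails after the SQL write has succeeded, the copies diverge and would need 
to be reconciled separately.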


On Tuesday, May 30, 2017 at 6:11:46 PM UTC-4, Roxana Ioana Roman wrote:
>
> Thank you very much for your answers! 
> Yes, the database is on Cloud SQL and at the moment both services share 
> it. Why I want to split this database is firstly because it serves several 
> hundred queries per second, so a bit under stress and secondly in a 
> micro-services architecture (what I want to achieve at some point..), 
> separate services should have their own private data, which should only be 
> accessed through APIs, so each with their own database.  Therefore, since 
> the events sent between services won't be complex, PubSub would better suit 
> my need? There will be a large number of subscribers, since I will have 
> several instances of each service running at the same time ( so many 
> publishers and many subscribers I would say).
>
> On Tuesday, 30 May 2017 23:24:43 UTC+3, Yannick (Cloud Platform Support) 
> wrote:
>>
>> Hey Roxana, adding onto Alexey’s excellent answer: I understand that the 
>> database currently shared by your services is on Cloud SQL and that both of 
>> your services can currently access it independently. Could you please 
>> expand on your need for either restricting access to a single service or 
>> replicating the data across services and why the current situation isn’t 
>> desirable? This should help determine which solution is most appropriate.
>>  
>> You also asked a couple of other questions which I will attempt to answer 
>> here:
>> 1)  Can Cloud SQL Read Replicas be configured to ignore tables?
>>      At this time Cloud SQL Read Replicas cannot be configured to ignore 
>> tables, though there is a MySQL flag that can be used when configuring 
>> external replicas.
>>  
>> 2)  If I decide to synchronise the data between both databases, should I 
>> use PubSub or Task Queues?
>>      It really comes down to the details of your specific use case. Task 
>> Queues is more closely integrated with App Engine and while it's perfectly 
>> suitable for sending messages between services it is designed for executing 
>> complex long-running tasks. Cloud Pub/Sub on the other hand is a networked 
>> messaging service designed for broadcasting publications to a large number 
>> of subscribers. You could also communicate between your services by using 
>> UrlFetch but then you’d have to handle the retry mechanism yourself.
>>  
>> 3)  Can PubSub be modified to maintain order, somehow insert ordering 
>> information in the message payload to contain previous messages sent?
>>      The Cloud Pub/Sub documentation has an article that explains a few 
>> ways to handle message ordering. It is possible, but strict message 
>> ordering comes at the expense of performance and throughput.
>>  
>> 4)  How do I handle retries in PubSub?
>>      Cloud Pub/Sub offers an "at-least-once delivery" guarantee for each 
>> message you publish to each subscriber and will automatically retry 
>> delivery until the message is acknowledged by the subscriber.
>>
>> On Tuesday, May 30, 2017 at 2:34:41 PM UTC-4, Alexey wrote:
>>>
>>> Roxana,
>>>
>>> It may be useful to look closer at the nature of your data.  You 
>>> outlined 2 basic strategies: 
>>>
>>> 1. Replicate the data between 2 services, having one replica of that 
>>> data be the one written to, which you would presumably consider the source 
>>> of truth.  You have 2 variations on this theme, where you either would rely on 
>>> a database replication or use some messaging system to do the job.
>>> 2. Allow one service to call the other service for the data it needs.  
>>> Your suggestion to cache the data is good, but I would propose that you 
>>> cache the data on the side that's serving HTTP responses.  Let the 
>>> dependent service always call the source service.  This will greatly 
>>> simplify your cache management strategy and you'd be able to create other 
>>> similar consumers for this system without maintaining different cache 
>>> policies.
>>>
>>> Each of these 2 ideas is workable and each has its trade-offs.  I would 
>>> say having a single replica of the data is arguably simpler and allows you 
>>> to totally bypass the question of DB syncing with the downside of a 
>>> service-to-service dependency.  I would say option 2 is an easier first 
>>> step and option 1 is a good thing to do if you find you have a performance 
>>> problem and you want to make the data queries with their required joins 
>>> faster.
>>>
>>> Depending on the nature of your data and your queries, there may be a 
>>> third option for you to consider.  Something in between these 2 
>>> strategies.  Conflict-free Replicated Data Types 
>>> (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) 
>>> provide some new options in situations where it becomes possible to pass 
>>> messages across a distributed system in order to keep data eventually 
>>> consistent without relying on message order being preserved.  It requires 
>>> special care in designing your data model, but can be a good alternative to 
>>> service-to-service calls and tight DB coupling.  Hope this helps,
>>>
>>> Alexey
>>>
>>> On Tuesday, May 30, 2017 at 9:44:38 AM UTC-4, Roxana Ioana Roman wrote:
>>>>
>>>> Moving to a micro-service architecture, I have separated an AppEngine 
>>>> module into two services (the big one and a smaller one, still part of the 
>>>> same AppEngine project).
>>>>   
>>>> Next, I want to separate the database into two, so allowing the small 
>>>> micro-service to have only the needed tables in its own db.  The problem 
>>>> is that one of the tables is needed by both micro-services. The smaller service 
>>>> only reads data from this table (has multiple select queries which join on 
>>>> data from this table)
>>>>
>>>> Option 1:
>>>> I can leave this table in the bigger service which uses it the most and 
>>>> make the smaller service make an http request to get the data that it 
>>>> needs, cache this data and be notified when this data changes by the owner 
>>>> service of that table. Then, to refresh its cache, the smaller service 
>>>> makes an http request again.
>>>>
>>>> In order to communicate from service A to service B, I was thinking 
>>>> between using Google PubSub or Task Queue. And I am not sure which one to 
>>>> use here. In this case, receiving the message in order is not important 
>>>> since the message will be generic "table_state_changed, query for new 
>>>> data", so both could be used..
>>>>
>>>> Option 2:
>>>> I can duplicate the table in both databases (this will allow the 
>>>> service to have all the necessary data closer).  When the bigger service 
>>>> modifies data in the table, it will notify the smaller service to perform 
>>>> the same modification on its version of the table. In this case, the order 
>>>> of the messages is important, since it specifies exact crud operations to 
>>>> perform on the table. 
>>>>
>>>> Can PubSub be modified to maintain order, somehow insert ordering 
>>>> information in the message payload to contain previous messages sent? 
>>>> Retries are also important, we don't want to end up having 
>>>> inconsistencies between the two tables.
>>>>
>>>> Option 3:
>>>> Is there a way to create Read Replicas with ignored tables in AppEngine 
>>>> (so that I include only the tables I need in one of the services and leave 
>>>> the other one with the entire db as it is currently) and set a specific 
>>>> service to only use that replica? This does not sound like a good idea; 
>>>> however, it leaves CloudSql the burden of maintaining the same data in both 
>>>> versions of the table.
>>>>
>>>> Which option is better in your opinion and most importantly which one 
>>>> is better suited for this case? PubSub or TaskQueues. 
>>>> Also, this is only the first step to separate the monolith, there will 
>>>> be other services in the future, that would probably encounter the same 
>>>> problem.
>>>>
>>>
