What about using BigQuery? Has anybody tried it for this specific purpose?
Inserting data and exporting a whole table are both free at this stage.

By the way, I've tried a couple of strategies involving entity groups. I 
started storing the timestamp as the key.
That improved things a little, in the sense that my entities were smaller, 
so I could loop over them faster.
Cost-wise it hasn't been too bad, but my sets don't exceed 1M rows each.
If you can live with BigQuery's append-only restriction, I would definitely 
try it.

Emanuele
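If you do try BigQuery, the append-only flow might look like this hypothetical sketch. The dataset/table names and the `make_row` helper are invented for illustration; the actual insert call needs the google-cloud-bigquery client and credentials, so it is shown commented out:

```python
# Hypothetical sketch: building append-only rows for a BigQuery points table.
# Table name and schema are assumptions, not an established convention.
import datetime

def make_row(series_id, value, ts=None):
    """Build one append-only row for a time-series points table."""
    ts = ts or datetime.datetime.utcnow()
    return {"series_id": series_id,
            "value": value,
            "ts": ts.isoformat()}

rows = [make_row("cpu.load", 0.42)]

# With credentials configured, the streaming insert itself would look like:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   errors = client.insert_rows_json("myproject.metrics.points", rows)
#   assert not errors  # insert_rows_json returns a list of row errors
```

Since the table is append-only, summaries are then plain `GROUP BY` queries over it rather than read-modify-write cycles.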

On Monday, 2 February 2015 07:11:26 UTC+13, Rafael Sanches wrote:
>
> By the way, this isn't a datastore-specific problem. Even on MySQL, you 
> don't want to be querying millions of rows to draw a simple summary. 
>
> On Sun, Feb 1, 2015 at 10:09 AM, Rafael <[email protected]> wrote:
>
>> To solve that problem you can make DataPoint a temporary table only. 
>>
>> That way, every 5 minutes a cron can download all the DataPoint entities 
>> and delete them after summarizing their content in another table. 
>>
>> You can summarize into a 5-minute table, then the average from that table 
>> goes into the 30-minute summary, the average of that one goes into the 
>> 2-hour one, and so on. 
>>
>> On Wed, Aug 14, 2013 at 5:51 AM, Martin Trummer <[email protected]> wrote:
>>
>>> Okay, so you have two entity types, "TimeSeriesIndex" and "DataPoint".
>>> But don't you have the same problem with the DataPoint entity?
>>> All your data ends up in DataPoint entities - or does your cron job 
>>> delete the DataPoints after generating the TimeSeriesIndex?
>>>
>>>
>>> On Wednesday, 14 August 2013 01:20:15 UTC+2, Rafael Sanches wrote:
>>>>
>>>> I implemented this with these components: 
>>>>
>>>> - TimeSeriesIndex - separate rows for hour, day, week, month, year, 
>>>> etc. You can squeeze a lot of data into 1 MB :)
>>>> - DataPoint - unprocessed data points; thousands of rows per minute. 
>>>> - a cron that processes the DataPoints into the indexes
>>>> - the UI reads only the TimeSeriesIndex, which contains the timestamps 
>>>> and the data points. 
>>>>
>>>> thanks
>>>> rafa
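The "1 MB" remark refers to the datastore's per-entity size limit. A library-free sketch of how many points fit if you pack them into a single blob property of an index row (the 16-byte struct layout is an assumption for illustration, not the actual format used above):

```python
# Sketch: pack (timestamp, value) points into one compact blob so a single
# TimeSeriesIndex entity can hold tens of thousands of points under ~1 MB.
import struct

POINT = struct.Struct("<qd")  # int64 timestamp + float64 value = 16 bytes

def pack_points(points):
    return b"".join(POINT.pack(ts, v) for ts, v in points)

def unpack_points(blob):
    return [POINT.unpack_from(blob, off) for off in range(0, len(blob), POINT.size)]

blob = pack_points([(1376400000, 1.5), (1376400060, 2.5)])
assert len(blob) == 32   # 16 bytes per point -> ~65k points per 1 MB entity
assert unpack_points(blob) == [(1376400000, 1.5), (1376400060, 2.5)]
```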
>>>>
>>>>
>>>> On Tue, Aug 13, 2013 at 1:42 PM, Jay <[email protected]> wrote:
>>>>
>>>>> In my opinion, your biggest takeaway from this should be to avoid 
>>>>> having a mega entity group, and you do this simply by *not* giving all 
>>>>> the entities in question the same parent - or, more pointedly, any 
>>>>> parent at all. Unless there is a really strong case for putting many 
>>>>> thousands of entities in the same entity group, I just wouldn't do it. 
>>>>> You can now have transactions across entity groups, so if you need a 
>>>>> transaction with a few entities you are OK. 
>>>>> If you need to relate the entities, do that by some other means 
>>>>> instead of a parent entity. For example, you could use an 
>>>>> ndb.KeyProperty, or possibly just an encoded string, or something 
>>>>> along those lines. 
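This suggestion can be sketched without the App Engine SDK; the plain-string `series_key` attribute below stands in for an ndb.KeyProperty (class and names are illustrative):

```python
# Sketch: relate entities by a reference property instead of a parent key,
# so DataPoints do not share one giant entity group.
class DataPoint(object):
    def __init__(self, series_key, ts, value):
        self.series_key = series_key  # reference, not a parent -> no shared group
        self.ts = ts
        self.value = value

points = [DataPoint("series:cpu", 0, 1.0), DataPoint("series:mem", 0, 2.0)]
# Analogue of a datastore query filtering on the reference property:
cpu = [p for p in points if p.series_key == "series:cpu"]
```

In ndb terms the filter would be an indexed property query (`DataPoint.query(DataPoint.series_key == key)`), which scales without the write-throughput limits of a single entity group.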
>>>>>
>>>>>
>>>>> On Tuesday, August 13, 2013 7:59:52 AM UTC-5, Martin Trummer wrote:
>>>>>>
>>>>>> I'm a newbie to the App Engine datastore and would like to know how 
>>>>>> best to design this use case:
>>>>>> some time-series may hold huge amounts of data, e.g. terabytes for a 
>>>>>> single time-series.
>>>>>> The transactions doc 
>>>>>> <https://developers.google.com/appengine/docs/java/datastore/transactions>
>>>>>> says about entity groups:
>>>>>>
>>>>>>    - *"Every entity belongs to an entity group, a set of one or more 
>>>>>>    entities that can be manipulated in a single transaction."*
>>>>>>    - *"every entity with a given root entity as an ancestor is in 
>>>>>>    the same entity group. All entities in a group are stored in the same 
>>>>>>    Datastore node."* 
>>>>>>
>>>>>> So does that mean that all the terabytes of data for the huge 
>>>>>> time-series would end up *on one computer* somewhere in the 
>>>>>> App Engine network?
>>>>>> If so: 
>>>>>>
>>>>>>    - that's not a good idea, right?
>>>>>>    - how can I avoid it? Should I split the data into sections (e.g. 
>>>>>>    per month) where each section has its own kind/entity group?
>>>>>>    
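The per-month idea from the question could be sketched as a key-naming scheme, so each month's points hang off their own small root entity (function and names are hypothetical):

```python
# Sketch: derive a per-month root key name so each month's points land in
# their own entity group instead of one giant group.
import datetime

def section_key(series_id, ts):
    """Key name for the entity group holding this point, sharded by month."""
    month = datetime.datetime.utcfromtimestamp(ts).strftime("%Y-%m")
    return "%s:%s" % (series_id, month)

assert section_key("temps", 1376364015) == "temps:2013-08"
```

As the replies above note, though, the simpler route is usually to skip parents entirely and relate points to their series by an indexed property.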

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/982a4b0d-fdba-4457-935c-ef14c5e00670%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
