Or maybe it blocks on different result sets, just not on getting the next
fetch block? Hmm. Sounds like a tough problem.

On Mon, Feb 14, 2011 at 2:09 PM, Stephen Johnson <[email protected]> wrote:

> Are you using .asList (which I think blocks as you describe)? I thought
> asIterable and asIterator weren't supposed to block (if you're using Java).
>
>
> On Mon, Feb 14, 2011 at 12:38 PM, Edward Hartwell Goose <
> [email protected]> wrote:
>
>> Hi Calvin & Stephen,
>>
>> Thanks for the ideas.
>>
>> Calvin:
>> We can't do the filtering in memory. We potentially have a car (the car
>> analogy isn't so good...) making a journey every 3 seconds, and we could
>> have up to 2,000 cars.
>>
>> We need to be able to look back up to 2 months, so it could be up to
>> 1.8 billion rows in this table.
>>
>> Stephen:
>> That's an interesting idea. However, the asynchronous API actually
>> fires the requests one after another; it just doesn't block. (Or at
>> least, that's my experience.)
>>
>> So, at the moment we fire off 1 query (which Google turns into 2) for
>> each site. And although the method call returns instantly, it still
>> takes ~5 seconds in total with basic test data. If each call takes
>> 12ms, we still have to wait 24 seconds for 2,000 sites.
>>
>> So, the first call starts at time 0, the second call starts at 0+12,
>> the third at 0+12+12... etc. With 2,000 sites, this works out to about
>> 24 seconds. Once you add the overheads and the time to get the list of
>> Cars in the first place, it's too long.
>>
>> If we could start even 100 queries at the same time, at time 0, that'd
>> be superb. We thought we could do it with multithreading, but that's
>> not allowed on App Engine.
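
To make the timing argument above concrete, here is a rough back-of-the-envelope sketch in Python. It only models the arithmetic from the message (12 ms per call, 2,000 sites); the function names and the idea of perfectly concurrent batches of 100 are assumptions for illustration, not App Engine APIs, and real overheads would add to both figures.

```python
import math

def sequential_seconds(num_queries, per_call_ms):
    """Wall time if each query starts only after the previous one finishes."""
    return num_queries * per_call_ms / 1000.0

def batched_seconds(num_queries, per_call_ms, batch_size):
    """Idealized wall time if batch_size queries run concurrently per round.

    Assumes every query in a round takes exactly per_call_ms and that
    rounds run back to back with no extra overhead.
    """
    rounds = math.ceil(num_queries / batch_size)
    return rounds * per_call_ms / 1000.0

print(sequential_seconds(2000, 12))    # 24.0
print(batched_seconds(2000, 12, 100))  # 0.24
```

Under these assumptions, starting 100 queries at once would cut the 24 seconds to roughly a quarter of a second.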
>>
>> Finally - I've also posted this on StackOverflow -
>>
>> http://stackoverflow.com/questions/4993744/selecting-distinct-entities-across-a-large-google-app-engine-table/4994494#4994494
>>
>> I'll try and keep both updated.
>>
>> Any more thoughts welcome!
>> Ed
>>
>> On Feb 14, 6:47 pm, Calvin <[email protected]> wrote:
>> > Can you do filtering in memory?
>> >
>> > This query would give you all of the journeys for a list of cars within
>> > the date range:
>> > import datetime
>> >
>> > carlist = ['123','333','543','753','963','1236']
>> > start_date = datetime.datetime(2011, 1, 30)
>> > end_date = datetime.datetime(2011, 2, 10)
>> >
>> > journeys = (Journey.all().filter('start >', start_date)
>> >     .filter('start <', end_date).filter('car IN', carlist)
>> >     .order('-start').fetch(100))
>> > len(journeys)
>> > 43 # <- since it's less than 100 I know I've gotten them all
>> >
>> > Then, since the list is sorted newest-first, I know the first entry per
>> > car is the most recent journey:
>> >
>> > results = {}
>> > for journey in journeys:
>> > ...   if journey.car in results:
>> > ...     continue
>> > ...   results[journey.car] = journey
>> >
>> > len(results)
>> > 6
>> >
>> > for result in results.values():
>> > ...   print("%s : %s" % (result.car, result.start))
>> > 753 : 2011-02-09 12:38:48.887976
>> > 1236 : 2011-02-06 13:59:35.221003
>> > 963 : 2011-02-08 14:03:54.587609
>> > 333 : 2011-02-09 10:40:09.466700
>> > 543 : 2011-02-09 15:28:53.197123
>> > 123 : 2011-02-09 14:09:02.680870
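
The "first entry per car" step above can be tried outside App Engine with a small self-contained sketch. The `Journey` namedtuple here is a hypothetical stand-in for the datastore model, and the sample rows are made up for illustration; the only thing carried over from the message is the logic of walking a newest-first list and keeping the first journey seen for each car.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the datastore model used in the thread.
Journey = namedtuple("Journey", ["car", "start"])

def latest_per_car(journeys):
    """Given journeys sorted newest-first, keep the first one seen per car."""
    results = {}
    for journey in journeys:
        if journey.car not in results:
            results[journey.car] = journey
    return results

# Made-up sample data, already sorted by start time, newest first.
journeys = [
    Journey("123", datetime(2011, 2, 9, 14, 9)),
    Journey("333", datetime(2011, 2, 9, 10, 40)),
    Journey("123", datetime(2011, 2, 8, 9, 0)),
    Journey("333", datetime(2011, 2, 1, 8, 30)),
]

latest = latest_per_car(journeys)
for car, journey in latest.items():
    print("%s : %s" % (car, journey.start))
```

This only works if the input really is sorted newest-first and the fetch returned every matching row, which is the same caveat as the `fetch(100)` check in the message.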
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
>>
>>
>
