Are you using .asList (which I think blocks like you describe), but I
thought asIterable or asIterator wasn't suppose to. (if you're using Java).

On Mon, Feb 14, 2011 at 12:38 PM, Edward Hartwell Goose
<[email protected]>wrote:

> Hi Calvin & Stephen,
>
> Thanks for the ideas.
>
> Calvin:
> We can't do the filtering in memory. We potentially have a car making
> a journey (the car analogy isn't so good...) making a journey every 3
> seconds, and we could have up to 2,000 cars.
>
> We need to be able to look back up to 2 months, so it could be up to
> 1.8 billion rows in this table.
>
> Stephen:
> That's an interesting idea. However the Asynchronous api actually
> fires the requests synchronously, it just doesn't block. (Or at least,
> that's my experience).
>
> So, at the moment we fire off 1 query (which Google turns into 2) for
> each site. And although the method call returns instantly, it still
> takes ~5 seconds in total with basic test data. If each call takes
> 12ms, we still have to wait 24 seconds for 2,000 sites.
>
> So, the first call starts at time 0, the second call starts at 0+12,
> the third at 0+12+12... etc. With 2,000 sites, this works out about 24
> seconds. Once you've added in the overheads and getting the list of
> Cars in the first place, it's too long.
>
> If we could start even 100 queries at the same time of time 0, that'd
> be superb. We thought we could do it with multithreading, but that's
> not allowed on App Engine.
>
> Finally - I've also posted this on StackOverflow -
>
> http://stackoverflow.com/questions/4993744/selecting-distinct-entities-across-a-large-google-app-engine-table/4994494#4994494
>
> I'll try and keep both updated.
>
> Any more thoughts welcome!
> Ed
>
> On Feb 14, 6:47 pm, Calvin <[email protected]> wrote:
> > Can you do filtering in memory?
> >
> > This query would give you all of the journeys for a list of cars within
> the
> > date range:
> > carlist = ['123','333','543','753','963','1236']
> > start_date = datetime.datetime(2011, 1, 30)
> > end_date = datetime(2011, 2, 10)
> >
> > journeys = Journey.all().filter('start >', start_date).filter('start <',
> > end_date).filter('car IN', carlist).order('-start').fetch(100)
> > len(journeys)
> > 43 # <- since it's less than 100 I know I've gotten them all
> >
> > then since the list is sorted I know the first entry per car is the most
> > recent journey:
> >
> > results = {}
> > for journey in journeys:
> > ...   if journey.car in results:
> > ...     continue
> > ...   results[journey.car] = journey
> >
> > len(results)
> > 6
> >
> > for result in results.values():
> > ...   print("%s : %s" % (result.car, result.start))
> > 753 : 2011-02-09 12:38:48.887976
> > 1236 : 2011-02-06 13:59:35.221003
> > 963 : 2011-02-08 14:03:54.587609
> > 333 : 2011-02-09 10:40:09.466700
> > 543 : 2011-02-09 15:28:53.197123
> > 123 : 2011-02-09 14:09:02.680870
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

Reply via email to