Are you using .asList? I think that one blocks the way you describe, but I thought asIterable or asIterator wasn't supposed to (if you're using Java).
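
To put rough numbers on the serialization Ed describes below, here is a back-of-the-envelope sketch in plain Python. The function names and the simple "batches of overlapping calls" model are my own assumptions for illustration and are not part of any App Engine API; the only inputs taken from the thread are the 12 ms per call and the 2,000 sites.

```python
import math


def sequential_seconds(num_queries, latency_ms):
    """Total wait if each query starts only after the previous one finishes."""
    return num_queries * latency_ms / 1000.0


def batched_seconds(num_queries, latency_ms, in_flight):
    """Total wait if up to `in_flight` queries overlap (hypothetical model).

    Assumes every query in a batch takes the same time and that batches
    run one after another, which ignores all real-world overhead.
    """
    batches = math.ceil(num_queries / in_flight)
    return batches * latency_ms / 1000.0


# Figures from the thread: 2,000 sites at 12 ms per call.
print(sequential_seconds(2000, 12))
# Same workload if 100 calls could really start at time 0 together.
print(batched_seconds(2000, 12, 100))
```

Under those assumptions the sequential case reproduces the 24 seconds Ed mentions, and 100 overlapping calls would bring it down to a fraction of a second, which is why it matters whether the iterator variants really avoid blocking.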

On Mon, Feb 14, 2011 at 12:38 PM, Edward Hartwell Goose <[email protected]> wrote:
> Hi Calvin & Stephen,
>
> Thanks for the ideas.
>
> Calvin:
> We can't do the filtering in memory. We potentially have a car (the car
> analogy isn't so good...) making a journey every 3 seconds, and we
> could have up to 2,000 cars.
>
> We need to be able to look back up to 2 months, so it could be up to
> 1.8 billion rows in this table.
>
> Stephen:
> That's an interesting idea. However, the asynchronous API actually
> fires the requests synchronously; it just doesn't block. (Or at least,
> that's my experience.)
>
> So, at the moment we fire off 1 query (which Google turns into 2) for
> each site. And although the method call returns instantly, it still
> takes ~5 seconds in total with basic test data. If each call takes
> 12ms, we still have to wait 24 seconds for 2,000 sites.
>
> So, the first call starts at time 0, the second call starts at 0+12,
> the third at 0+12+12... etc. With 2,000 sites, this works out at about
> 24 seconds. Once you've added in the overheads and getting the list of
> Cars in the first place, it's too long.
>
> If we could start even 100 queries at the same time, at time 0, that'd
> be superb. We thought we could do it with multithreading, but that's
> not allowed on App Engine.
>
> Finally - I've also posted this on StackOverflow -
>
> http://stackoverflow.com/questions/4993744/selecting-distinct-entities-across-a-large-google-app-engine-table/4994494#4994494
>
> I'll try and keep both updated.
>
> Any more thoughts welcome!
> Ed
>
> On Feb 14, 6:47 pm, Calvin <[email protected]> wrote:
> > Can you do filtering in memory?
> >
> > This query would give you all of the journeys for a list of cars within the
> > date range:
> >
> > >>> carlist = ['123','333','543','753','963','1236']
> > >>> start_date = datetime.datetime(2011, 1, 30)
> > >>> end_date = datetime.datetime(2011, 2, 10)
> > >>> journeys = Journey.all().filter('start >', start_date).filter('start <',
> > ...     end_date).filter('car IN', carlist).order('-start').fetch(100)
> > >>> len(journeys)
> > 43 # <- since it's less than 100 I know I've gotten them all
> >
> > then since the list is sorted I know the first entry per car is the most
> > recent journey:
> >
> > >>> results = {}
> > >>> for journey in journeys:
> > ...     if journey.car in results:
> > ...         continue
> > ...     results[journey.car] = journey
> > ...
> > >>> len(results)
> > 6
> > >>> for result in results.values():
> > ...     print("%s : %s" % (result.car, result.start))
> > ...
> > 753 : 2011-02-09 12:38:48.887976
> > 1236 : 2011-02-06 13:59:35.221003
> > 963 : 2011-02-08 14:03:54.587609
> > 333 : 2011-02-09 10:40:09.466700
> > 543 : 2011-02-09 15:28:53.197123
> > 123 : 2011-02-09 14:09:02.680870
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
