Or maybe it blocks on different result sets, just not on getting the next fetch block? Hmm. Sounds like a tough problem.
On Mon, Feb 14, 2011 at 2:09 PM, Stephen Johnson <[email protected]> wrote:
> Are you using .asList (which I think blocks like you describe)? I
> thought asIterable or asIterator wasn't supposed to (if you're using Java).
>
>
> On Mon, Feb 14, 2011 at 12:38 PM, Edward Hartwell Goose <
> [email protected]> wrote:
>
>> Hi Calvin & Stephen,
>>
>> Thanks for the ideas.
>>
>> Calvin:
>> We can't do the filtering in memory. We potentially have a car
>> (the car analogy isn't so good...) making a journey every 3
>> seconds, and we could have up to 2,000 cars.
>>
>> We need to be able to look back up to 2 months, so it could be up to
>> 1.8 billion rows in this table.
>>
>> Stephen:
>> That's an interesting idea. However, the asynchronous API actually
>> fires the requests one after another; it just doesn't block. (Or at least,
>> that's my experience.)
>>
>> So, at the moment we fire off 1 query (which Google turns into 2) for
>> each site. And although the method call returns instantly, it still
>> takes ~5 seconds in total with basic test data. If each call takes
>> 12ms, we still have to wait 24 seconds for 2,000 sites.
>>
>> So, the first call starts at time 0, the second call starts at 0+12,
>> the third at 0+12+12... etc. With 2,000 sites, this works out at about 24
>> seconds. Once you've added in the overheads and getting the list of
>> Cars in the first place, it's too long.
>>
>> If we could start even 100 queries at time 0, that'd
>> be superb. We thought we could do it with multithreading, but that's
>> not allowed on App Engine.
>>
>> Finally - I've also posted this on StackOverflow -
>>
>> http://stackoverflow.com/questions/4993744/selecting-distinct-entities-across-a-large-google-app-engine-table/4994494#4994494
>>
>> I'll try and keep both updated.
>>
>> Any more thoughts welcome!
>> Ed
>>
>> On Feb 14, 6:47 pm, Calvin <[email protected]> wrote:
>> > Can you do filtering in memory?
>> >
>> > This query would give you all of the journeys for a list of cars within the
>> > date range:
>> > carlist = ['123','333','543','753','963','1236']
>> > start_date = datetime.datetime(2011, 1, 30)
>> > end_date = datetime.datetime(2011, 2, 10)
>> >
>> > journeys = Journey.all().filter('start >', start_date).filter('start <',
>> > end_date).filter('car IN', carlist).order('-start').fetch(100)
>> > len(journeys)
>> > 43 # <- since it's less than 100 I know I've gotten them all
>> >
>> > then since the list is sorted I know the first entry per car is the most
>> > recent journey:
>> >
>> > results = {}
>> > for journey in journeys:
>> > ...     if journey.car in results:
>> > ...         continue
>> > ...     results[journey.car] = journey
>> >
>> > len(results)
>> > 6
>> >
>> > for result in results.values():
>> > ...     print("%s : %s" % (result.car, result.start))
>> > 753 : 2011-02-09 12:38:48.887976
>> > 1236 : 2011-02-06 13:59:35.221003
>> > 963 : 2011-02-08 14:03:54.587609
>> > 333 : 2011-02-09 10:40:09.466700
>> > 543 : 2011-02-09 15:28:53.197123
>> > 123 : 2011-02-09 14:09:02.680870
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
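
For anyone who wants to try the in-memory step from Calvin's message without a datastore, here is a rough standalone sketch. It uses plain (car, start) tuples in place of Journey entities; the `latest_per_car` name and the sample rows are made up for illustration, and it assumes the rows arrive sorted newest first, the way `order('-start')` would return them.

```python
import datetime


def latest_per_car(journeys):
    """Return {car: start}, keeping only the first row seen for each car.

    Assumes `journeys` yields (car, start) pairs already sorted by start
    time, newest first, so the first row per car is its latest journey.
    """
    results = {}
    for car, start in journeys:
        if car in results:
            continue  # a newer row for this car was already kept
        results[car] = start
    return results


# Hypothetical sample data, newest first (not taken from the thread).
rows = [
    ("543", datetime.datetime(2011, 2, 9, 15, 28)),
    ("123", datetime.datetime(2011, 2, 9, 14, 9)),
    ("543", datetime.datetime(2011, 2, 8, 11, 0)),
    ("123", datetime.datetime(2011, 2, 7, 9, 30)),
]

latest = latest_per_car(rows)
for car, start in sorted(latest.items()):
    print("%s : %s" % (car, start))
```

This only works while the fetched batch fits in memory and covers every car, which Ed says is not the case for the full table, so treat it as an illustration of the idea rather than a fix for the 2,000-car case.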
