Robert, thanks a lot for your suggestions! Increasing the bucket size made a huge difference. I need to study the theoretical part... and find the optimal bucket size for 50/sec.
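For the theoretical part, one way to reason about rate and bucket size is the textbook token bucket. The sketch below is only a rough model under that assumption (the real task queue scheduler may behave differently), and `simulate_token_bucket` and the burst of 80 tasks are illustrative, not taken from any library:

```python
def simulate_token_bucket(rate, bucket_size, arrivals, step=1.0):
    """Simulate a textbook token bucket, one time step at a time.

    rate        -- tokens added per second (assumed queue rate)
    bucket_size -- maximum tokens that can accumulate (burst capacity)
    arrivals    -- number of tasks enqueued at the start of each step
    Returns the backlog (tasks still waiting) after each step.
    """
    tokens = float(bucket_size)  # the bucket starts full
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog += arriving
        # Each dispatched task consumes one whole token.
        dispatched = min(backlog, int(tokens))
        backlog -= dispatched
        tokens -= dispatched
        # Refill for the next step, capped at the bucket size.
        tokens = min(bucket_size, tokens + rate * step)
        history.append(backlog)
    return history


burst = [80, 0, 0]  # 80 tasks arrive at once, then nothing
print(simulate_token_bucket(35, 40, burst))  # [40, 5, 0]
print(simulate_token_bucket(50, 80, burst))  # [0, 0, 0]
```

In this model the bucket size decides how much of a burst goes out immediately, while the rate decides how quickly the remaining backlog drains.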
Yep, I use creatework directly without fanout. I will try inserting the 'work' models within my original data transaction and compare the performance.

On Nov 9, 3:14 am, Robert Kluin <[email protected]> wrote:
> Hey Dmitry,
>   I am working on getting some decent documentation about when you
> might want to use fanout versus directly using creatework. And, about
> usage in general. If I am dealing with one or two aggregations I
> usually use creatework directly. You can only insert five
> transactional tasks in one database transaction, so with four you
> could directly use creatework, eliminating a fanout task.
>
> As far as rates go, I have been using a rate of 35/s and bucket size
> of 40. However, I also get periodic queue backups. I think the max
> rate / sec is currently 50, but I thought there was an announcement it
> was getting increased (maybe I am just remembering the increase to
> 50/sec announcement though). You might want to bump your rate up to
> 50/sec. I always use a dedicated queue for creatework and aggregation
> tasks. In one of my apps I use multiple queues to get a bit higher
> throughput.
>
> I generally prefer to use creatework tasks; they cleanly handle any
> failures that occur and keep my primary processing running as fast as
> possible. However, when I first started using this type of
> aggregation technique I created the 'work' models and attempted to
> insert the aggregator task (non-transactionally!) within my primary
> transaction. If your primary processing is within tasks, and your
> tasks are fast enough, give it a shot. Converting CreateWorkHandler
> to something you can use directly should not be a big deal.
>
> Robert
>
> On Mon, Nov 8, 2010 at 18:14, Dmitry <[email protected]> wrote:
> > Hi Robert,
> >
> > What queue configuration do you use for your system?
> > I ran into another problem. I usually process several feeds in parallel
> > and can insert up to 20-30 new items to the database.
> > With 4 aggregators that is >80 create_work tasks at one moment. So after a
> > minute I can have up to 1000 tasks in the queue... so I have up to a 5
> > minute delay in processing.
> >
> > It seems that for initial aggregation I should insert the create work
> > models outside of tasks.
> > I messed up again :)
> >
> > On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
> >> Dmitry,
> >>   I finally got the time to make these changes. Let me know if that
> >> works for your use-case.
> >>
> >> I really appreciate all of your suggestions and help with this.
> >>
> >> Robert
> >>
> >> 2010/11/3 Dmitry <[email protected]>:
> >> > Oops, I read the expression in the wrong direction. This will definitely work!
> >> >
> >> > On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
> >> >> Dmitry,
> >> >>   Right, I know those will cause problems. So what about my suggested
> >> >> solution of using:
> >> >>
> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >> >>       task_name = sha1_hash(task_name)
> >> >>
> >> >> That should correctly handle your use cases, since the full name will
> >> >> be hashed.
> >> >>
> >> >> Are there issues with that solution I am not seeing?
> >> >>
> >> >> Robert
> >> >>
> >> >> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
> >> >> > Robert,
> >> >> >
> >> >> > You will get into trouble with these aggregations:
> >> >> >
> >> >> > urls:
> >> >> > http://правительство.рф/search/?phrase=налог&section=gov_events ->
> >> >> > httpsearchphrase
> >> >> > http://правительство.рф/search/?phrase=президент&section=gov_events ->
> >> >> > httpsearchphrase
> >> >> >
> >> >> > or usernames:
> >> >> > мститель2000 -> 2000
> >> >> > тест2000 -> 2000
> >> >> >
> >> >> > but anyway in most cases your approach will work well :) You can leave
> >> >> > it up to the user (add some kind of flag "use_hash").
> >> >> >
> >> >> > or we can try to URL-encode the strings:
> >> >> > urllib.quote(task_name.encode('utf-8'))
> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
> >> >> >
> >> >> > but this is no better than a hash :-D
> >> >> >
> >> >> > thanks
> >> >> >
> >> >> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
> >> >> >> Hey Dmitry,
> >> >> >>   I am sure the "fix" in that commit is _not_ a good idea. Originally
> >> >> >> I stuck it in because I use entity keys as the task-name, and sometimes
> >> >> >> they contain characters not allowed in task-names. I actually
> >> >> >> debated for several days about pushing that update out; finally I
> >> >> >> decided to push and hope someone would notice and offer their
> >> >> >> thoughts.
> >> >> >>
> >> >> >>   I like your idea a lot. But, for many aggregations I like to use
> >> >> >> entity keys; it makes it possible for me to visually see what a task
> >> >> >> is doing. What do you think about something like the following
> >> >> >> approach:
> >> >> >>
> >> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >> >> >>       task_name = sha1_hash(task_name)
> >> >> >>
> >> >> >> That should allow 'valid' names to remain as-is, but it will safely
> >> >> >> encode non-valid task-names. Do you think that is an acceptable
> >> >> >> method?
> >> >> >>
> >> >> >> Thanks a lot for your feedback.
> >> >> >>
> >> >> >> Robert
> >> >> >>
> >> >> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]>
> >> >> >> wrote:
> >> >> >>> Hi Robert,
> >> >> >>>
> >> >> >>> Regarding your latest commit:
> >> >> >>>
> >> >> >>> # TODO: find a better solution for cleaning up the name.
> >> >> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
> >> >> >>>
> >> >> >>> I don't think this is a good idea :) For example, I have unicode
> >> >> >>> characters in the aggregation value. In this case the regexp will
> >> >> >>> return nothing.
> >> >> >>> I use a sha1 hash now... but there's also a small possibility of
> >> >> >>> collision
> >> >> >>>
> >> >> >>> sha1_hash(self.agg_name)
> >> >> >>>
> >> >> >>> def utf8encoded(data):
> >> >> >>>   if data is None:
> >> >> >>>     return None
> >> >> >>>   if isinstance(data, unicode):
> >> >> >>>     return unicode(data).encode('utf-8')
> >> >> >>>   else:
> >> >> >>>     return data
> >> >> >>>
> >> >> >>> def sha1_hash(value):
> >> >> >>>   return hashlib.sha1(utf8encoded(value)).hexdigest()
> >> >> >>>
> >> >> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
> >> >> >>>> Hi Dmitry,
> >> >> >>>>   Glad to hear it was helpful! Not sure when you checked it out
> >> >> >>>> last, but I made a number of good (I think) improvements in the
> >> >> >>>> last couple of days, such as continuations to allow splitting
> >> >> >>>> large groups of work up.
> >> >> >>>>
> >> >> >>>> Robert
> >> >> >>>>
> >> >> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <[email protected]>
> >> >> >>>> wrote:
> >> >> >>>>> Robert,
> >> >> >>>>>
> >> >> >>>>> Your grouping_with_date_rollup.py example was extremely helpful.
> >> >> >>>>> Thanks a lot again! :)
> >> >> >>>>>
> >> >> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> wrote:
> >> >> >>>>>> Hey Carles,
> >> >> >>>>>>   Glad it seems helpful. I am hoping to get time today to push
> >> >> >>>>>> out some revisions and sample code.
> >> >> >>>>>>
> >> >> >>>>>> Robert
> >> >> >>>>>>
> >> >> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez
> >> >> >>>>>> <[email protected]> wrote:
> >> >> >>>>>>> Robert, I took a brief look at your code and it seems very
> >> >> >>>>>>> cool. Exactly what I was looking for for my report generation
> >> >> >>>>>>> and such.
> >> >> >>>>>>> I'm looking forward to more examples, but it seems a very
> >> >> >>>>>>> valuable addition to our toolbox.
> >> >> >>>>>>> Thanks a lot!
> >> >> >>>>>>>
> >> >> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez
> >> >> >>>>>>> <[email protected]> wrote:
> >> >> >>>>>>>>
> >> >> >>>>>>>> Neat!
> >> >> >>>>>>>> I'm going to look at this code, hopefully I'll understand
> >> >> >>>>>>>> something :)
> >> >> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin
> >> >> >>>>>>>> <[email protected]> wrote:
> >> >> >>>>>>>>> Hey Dmitry,
> >> >> >>>>>>>>>   In case it might help, I pushed some code to bitbucket. At the
> >> >> >>>>>>>>> moment I would (personally) say the code is not too pretty, but it
> >> >> >>>>>>>>> works well. :)
> >> >> >>>>>>>>>     http://bitbucket.org/thebobert/slagg
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>   Sorry it does not really have good documentation at the moment, but
> >> >> >>>>>>>>> I think the basic example I threw together will give you a good idea
> >> >> >>>>>>>>> of how to use it. I need to do another cleanup pass over the API to
> >> >> >>>>>>>>> make a few more refinements.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>   I pulled this code out of one of my apps, and tried to quickly
> >> >> >>>>>>>>> refactor it to be a bit more generic. We are currently using
> >> >> >>>>>>>>> basically the same code in three apps to do some really complex
> >> >> >>>>>>>>> calculations. As soon as I get time I will get an example up showing
> >> >> >>>>>>>>> how to use it for neat stuff, like overall, yearly, monthly, and daily
> >> >> >>>>>>>>> aggregates across multiple values (like total dollars and quantity).
> >> >> >>>>>>>>> The cool thing is that you can do all of those aggregations across
> >> >> >>>>>>>>> various groupings, like customer, company, contact, and sales-person,
> >> >> >>>>>>>>> at once. I'll get that code pushed out in the next few days.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>   Would love to get some feedback on it.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Robert
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry
> >> >> >>>>>>>>> <[email protected]> wrote:
> >> >> >>>>>>>>>> Ben, thanks for your code! I'm trying to understand all this stuff
> >> >> >>>>>>>>>> too...
> >> >> >>>>>>>>>> Robert, any success with your "library"? Maybe you've already done
> >> >> >>>>>>>>>> all the stuff we are trying to implement...
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> p.s. where is Brett S. :) I would like to hear his comments on this
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> On Sep 21, 1:49 pm, Ben <[email protected]> wrote:
> >> >> >>>>>>>>>>> Thanks for your insights. I would love feedback on this implementation
> >> >> >>>>>>>>>>> (Brett S. suggested we send in our code for this):
> >> >> >>>>>>>>>>> http://pastebin.com/3pUhFdk8
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> This implementation is for just one materialized view row at a time
> >> >> >>>>>>>>>>> (e.g. a simple counter, no presence markers). Hopefully putting an ETA
> >> >> >>>>>>>>>>> on the transactional task will relieve the write pressure, since
> >> >> >>>>>>>>>>> usually it should be an old update with an out-of-date sequence number
> >> >> >>>>>>>>>>> and be
> >
> > ...
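
The task-name discussion in this thread can be sketched roughly as follows. This is a minimal illustration in Python 3 (the code quoted in the thread is Python 2), not the library's actual implementation. `strip_invalid` mirrors the `re.sub` line from the commit under discussion, `safe_task_name` is a hypothetical name for the "hash only if invalid" idea, and the sample names are made up. `re.fullmatch` is used instead of `^...$` because `$` would also match just before a trailing newline.

```python
import hashlib
import re

# Same character class as in the thread; fullmatch() requires the whole
# string to consist of allowed characters.
VALID_TASK_NAME = re.compile(r"[a-zA-Z0-9-]+")


def sha1_hash(value):
    """Return the hex SHA-1 digest of a str (encoded as UTF-8) or bytes."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def strip_invalid(task_name):
    """Approach from the commit: silently drop disallowed characters."""
    return re.sub(r"[^a-zA-Z0-9-]", "", task_name)[:500]


def safe_task_name(task_name):
    """Keep already-valid names readable; hash everything else in full."""
    if VALID_TASK_NAME.fullmatch(task_name):
        return task_name
    return sha1_hash(task_name)


# Stripping collapses two different names into the same task name...
print(strip_invalid("тест2000"), strip_invalid("мститель2000"))  # 2000 2000
# ...while hashing the full name keeps them distinct.
print(safe_task_name("тест2000") != safe_task_name("мститель2000"))  # True
# Names that are already valid stay human-readable.
print(safe_task_name("agg-key-42"))  # agg-key-42
```

The hex digest contains only `0-9` and `a-f`, so the hashed name always passes the same validity check.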
