Hey Dmitry,

I am working on putting together some decent documentation about when you
might want to use fanout versus using create_work directly, and about usage
in general. If I am dealing with one or two aggregations, I usually use
create_work directly. You can only insert five transactional tasks in one
datastore transaction, so with up to four aggregations you can insert the
create_work tasks directly and eliminate the fanout task.
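For instance, the direct version looks roughly like this (a minimal sketch,
assuming the Python runtime; the handler URL, queue name, and parameter
names are placeholders, not slagg's actual API):

    # Enqueue create_work tasks transactionally alongside the primary
    # write. App Engine caps transactional tasks at five per datastore
    # transaction, so this only works for a handful of aggregations;
    # beyond that, insert a single fanout task instead.
    from google.appengine.api import taskqueue
    from google.appengine.ext import db

    def save_item_with_aggregates(item, aggregation_names):
        assert len(aggregation_names) <= 5  # transactional task limit

        def txn():
            item.put()  # the primary write
            for agg_name in aggregation_names:
                taskqueue.add(
                    url='/tasks/create_work',        # hypothetical handler
                    params={'key': str(item.key()),
                            'aggregation': agg_name},
                    queue_name='aggregation-queue',  # dedicated queue
                    transactional=True)              # commits with the txn

        db.run_in_transaction(txn)

Note that transactional tasks cannot be named, so any de-duping by task
name has to happen later, when the aggregation task itself gets inserted.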
As far as rates go, I have been using a rate of 35/s and a bucket size of
40. However, I also get periodic queue backups. I think the max rate per
second is currently 50, but I thought there was an announcement that it was
getting increased (maybe I am just remembering the announcement of the
increase to 50/s, though). You might want to bump your rate up to 50/s. I
always use a dedicated queue for create_work and aggregation tasks. In one
of my apps I use multiple queues to get a bit higher throughput.

I generally prefer to use create_work tasks; they cleanly handle any
failures that occur and keep my primary processing running as fast as
possible. However, when I first started using this type of aggregation
technique, I created the 'work' models and attempted to insert the
aggregator task (non-transactionally!) within my primary transaction. If
your primary processing is within tasks, and your tasks are fast enough,
give it a shot. Converting CreateWorkHandler to something you can use
directly should not be a big deal.

Robert
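A queue.yaml along those lines might look like the following (the queue
names are placeholders; tune the numbers for your own app):

    queue:
    - name: aggregation-queue      # dedicated queue for create_work tasks
      rate: 35/s
      bucket_size: 40
    - name: aggregation-queue-2    # optional extra queue for more throughput
      rate: 35/s
      bucket_size: 40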
On Mon, Nov 8, 2010 at 18:14, Dmitry <[email protected]> wrote:
> Hi Robert,
>
> What queue configuration do you use for your system?
> I ran into another problem. I usually process several feeds in parallel
> and can insert up to 20-30 new items into the database. With 4
> aggregators that is >80 create_work tasks at once. So after a minute I
> can have up to 1000 tasks in the queue... so I have up to 5 minutes of
> delay in processing.
>
> It seems that for the initial aggregation I should insert the create-work
> models directly, not in tasks.
> I messed up again :)
>
> On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
>> Dmitry,
>> I finally got the time to make these changes. Let me know if that
>> works for your use-case.
>>
>> I really appreciate all of your suggestions and help with this.
>>
>> Robert
>>
>> 2010/11/3 Dmitry <[email protected]>:
>>> oops, I read the expression in the wrong direction. This will
>>> definitely work!
>>>
>>> On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
>>>> Dmitry,
>>>> Right, I know those will cause problems. So what about my suggested
>>>> solution of using:
>>>>
>>>>     if not re.match("^[a-zA-Z0-9-]+$", task_name):
>>>>         task_name = sha1_hash(task_name)
>>>>
>>>> That should correctly handle your use cases, since the full name
>>>> will be hashed.
>>>>
>>>> Are there issues with that solution I am not seeing?
>>>>
>>>> Robert
>>>>
>>>> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
>>>>> Robert,
>>>>>
>>>>> You will get into trouble with these aggregations:
>>>>>
>>>>> urls:
>>>>> http://правительство.рф/search/?phrase=налог&section=gov_events -> httpsearchphrase
>>>>> http://правительство.рф/search/?phrase=президент&section=gov_events -> httpsearchphrase
>>>>>
>>>>> or usernames:
>>>>> мститель2000 -> 2000
>>>>> тест2000 -> 2000
>>>>>
>>>>> but anyway, in most cases your approach will work well :) You can
>>>>> leave it up to the user (add some kind of flag, "use_hash").
>>>>>
>>>>> or we can try to url-encode strings:
>>>>> urllib.quote(task_name.encode('utf-8'))
>>>>> http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
>>>>> http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>>>>>
>>>>> but this is not better than the hash :-D
>>>>>
>>>>> thanks
>>>>>
>>>>> On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
>>>>>> Hey Dmitry,
>>>>>> I am sure the "fix" in that commit is _not_ a good idea. Originally
>>>>>> I stuck it in because I use entity keys as the task-name, and
>>>>>> sometimes they contain characters not allowed in task-names. I
>>>>>> actually debated for several days about pushing that update out;
>>>>>> finally I decided to push and hope someone would notice and offer
>>>>>> their thoughts.
>>>>>>
>>>>>> I like your idea a lot. But for many aggregations I like to use
>>>>>> entity keys; it makes it possible for me to visually see what a
>>>>>> task is doing. What do you think about something like the
>>>>>> following approach:
>>>>>>
>>>>>>     if not re.match("^[a-zA-Z0-9-]+$", task_name):
>>>>>>         task_name = sha1_hash(task_name)
>>>>>>
>>>>>> That should allow 'valid' names to remain as-is, but it will
>>>>>> safely encode non-valid task-names. Do you think that is an
>>>>>> acceptable method?
>>>>>>
>>>>>> Thanks a lot for your feedback.
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> wrote:
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> Regarding your latest commit:
>>>>>>>
>>>>>>>     # TODO: find a better solution for cleaning up the name.
>>>>>>>     task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>>>>>>>
>>>>>>> I don't think this is a good idea :) For example, I have unicode
>>>>>>> characters in the aggregation value. In this case the regexp will
>>>>>>> return nothing.
>>>>>>> I use a sha1 hash now... but there is also a small possibility of
>>>>>>> collision.
>>>>>>>
>>>>>>>     sha1_hash(self.agg_name)
>>>>>>>
>>>>>>>     import hashlib
>>>>>>>
>>>>>>>     def utf8encoded(data):
>>>>>>>         if data is None:
>>>>>>>             return None
>>>>>>>         if isinstance(data, unicode):
>>>>>>>             return unicode(data).encode('utf-8')
>>>>>>>         else:
>>>>>>>             return data
>>>>>>>
>>>>>>>     def sha1_hash(value):
>>>>>>>         return hashlib.sha1(utf8encoded(value)).hexdigest()
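Putting the two snippets from the exchange above together, a complete
version of that naming rule might look like this (Python 2, as in the
original snippets, with the missing imports added; safe_task_name is just
an illustrative name):

    import hashlib
    import re

    def utf8encoded(data):
        if data is None:
            return None
        if isinstance(data, unicode):
            return data.encode('utf-8')
        return data

    def sha1_hash(value):
        return hashlib.sha1(utf8encoded(value)).hexdigest()

    def safe_task_name(task_name):
        # Task names are limited to letters, digits, hyphens, and
        # underscores, at most 500 characters; the thread's regex is
        # stricter than that, which is safe. Anything else gets hashed,
        # so both of Dmitry's colliding examples map to distinct names.
        if len(task_name) > 500 or not re.match('^[a-zA-Z0-9-]+$', task_name):
            return sha1_hash(task_name)
        return task_name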
>>>>>>>
>>>>>>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
>>>>>>>> Hi Dmitry,
>>>>>>>> Glad to hear it was helpful! Not sure when you last checked it
>>>>>>>> out, but I made a number of good (I think) improvements in the
>>>>>>>> last couple of days, such as continuations to allow splitting
>>>>>>>> large groups of work up.
>>>>>>>>
>>>>>>>> Robert
>>>>>>>>
>>>>>>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <[email protected]> wrote:
>>>>>>>>> Robert,
>>>>>>>>>
>>>>>>>>> Your grouping_with_date_rollup.py example was extremely helpful.
>>>>>>>>> Thanks a lot again! :)
>>>>>>>>>
>>>>>>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> wrote:
>>>>>>>>>> Hey Carles,
>>>>>>>>>> Glad it seems helpful. I am hoping to get time today to push
>>>>>>>>>> out some revisions and sample code.
>>>>>>>>>>
>>>>>>>>>> Robert
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <[email protected]> wrote:
>>>>>>>>>>> Robert, I took a brief look at your code and it seems very
>>>>>>>>>>> cool. Exactly what I was looking for, for my report
>>>>>>>>>>> generation and such.
>>>>>>>>>>> I'm looking forward to more examples, but it seems a very
>>>>>>>>>>> valuable addition to our toolbox.
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <[email protected]> wrote:
>>>>>>>>>>>> Neat! I'm going to look at this code; hopefully I'll
>>>>>>>>>>>> understand something :)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, October 13, 2010, Robert Kluin <[email protected]> wrote:
>>>>>>>>>>>>> Hey Dmitry,
>>>>>>>>>>>>> In case it might help, I pushed some code to bitbucket. At
>>>>>>>>>>>>> the moment I would (personally) say the code is not too
>>>>>>>>>>>>> pretty, but it works well. :)
>>>>>>>>>>>>>     http://bitbucket.org/thebobert/slagg
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry it does not really have good documentation at the
>>>>>>>>>>>>> moment, but I think the basic example I threw together will
>>>>>>>>>>>>> give you a good idea of how to use it. I need to do another
>>>>>>>>>>>>> cleanup pass over the API to make a few more refinements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I pulled this code out of one of my apps, and tried to
>>>>>>>>>>>>> quickly refactor it to be a bit more generic. We are
>>>>>>>>>>>>> currently using basically the same code in three apps to do
>>>>>>>>>>>>> some really complex calculations. As soon as I get time I
>>>>>>>>>>>>> will get an example up showing how to use it for neat
>>>>>>>>>>>>> stuff, like overall, yearly, monthly, and daily aggregates
>>>>>>>>>>>>> across multiple values (like total dollars and quantity).
>>>>>>>>>>>>> The cool thing is that you can do all of those aggregations
>>>>>>>>>>>>> across various groupings, like customer, company, contact,
>>>>>>>>>>>>> and sales-person, at once. I'll get that code pushed out in
>>>>>>>>>>>>> the next few days.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would love to get some feedback on it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Robert
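To give a rough idea of the overall/yearly/monthly/daily rollups described
above, here is a small illustration of building aggregation keys across
groupings (purely a sketch; slagg's real API differs):

    # Expand one source value into aggregation keys at several date
    # granularities, for each grouping. Each key would identify one
    # 'work' batch / materialized-view row.
    from datetime import date

    def rollup_keys(when, groupings):
        # when: a date; groupings: e.g. {'customer': 'c42', 'company': 'acme'}
        periods = ['overall',
                   'year-%04d' % when.year,
                   'month-%04d%02d' % (when.year, when.month),
                   'day-%04d%02d%02d' % (when.year, when.month, when.day)]
        return ['%s:%s:%s' % (group, value, period)
                for group, value in sorted(groupings.items())
                for period in periods]

    # rollup_keys(date(2010, 11, 8), {'customer': 'c42'}) ->
    # ['customer:c42:overall', 'customer:c42:year-2010',
    #  'customer:c42:month-201011', 'customer:c42:day-20101108']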
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry <[email protected]> wrote:
>>>>>>>>>>>>>> Ben, thanks for your code! I'm trying to understand all
>>>>>>>>>>>>>> this stuff too...
>>>>>>>>>>>>>> Robert, any success with your "library"? Maybe you've
>>>>>>>>>>>>>> already done all the stuff we are trying to implement...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> p.s. where is Brett S.? :) I would like to hear his
>>>>>>>>>>>>>> comments on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 21, 1:49 pm, Ben <[email protected]> wrote:
>>>>>>>>>>>>>>> Thanks for your insights. I would love feedback on this
>>>>>>>>>>>>>>> implementation (Brett S. suggested we send in our code
>>>>>>>>>>>>>>> for this): http://pastebin.com/3pUhFdk8
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This implementation is for just one materialized-view row
>>>>>>>>>>>>>>> at a time (e.g. a simple counter, no presence markers).
>>>>>>>>>>>>>>> Hopefully putting an ETA on the transactional task will
>>>>>>>>>>>>>>> relieve the write pressure, since usually it should be an
>>>>>>>>>>>>>>> old update with an out-of-date sequence number and be
>>>>>>>>>>>>>>> discarded (the update having already been completed in
>>>>>>>>>>>>>>> batch by the fork-join-queue).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd love to generalize this to do more than one
>>>>>>>>>>>>>>> materialized-view row, but thought I'd get feedback
>>>>>>>>>>>>>>> first.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Ben
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sep 17, 7:30 am, Robert Kluin <[email protected]> wrote:
>>>>>>>>>>>>>>>> Responses inline.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 16, 2010 at 17:32, Ben <[email protected]> wrote:
>>>>>>>>>>>>>>>>> I have a question about Brett Slatkin's talk at I/O
>>>>>>>>>>>>>>>>> 2010 on data pipelines. The question is about slide #67
>>>>>>>>>>>>>>>>> of his pdf, corresponding to minute 51:30 of his talk:
>>>>>>>>>>>>>>>>> http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am wondering what is supposed to happen in the
>>>>>>>>>>>>>>>>> transactional task (bullet point 2c). Would these
>>>>>>>>>>>>>>>>> updates to the materialized view cause you to write too
>>>>>>>>>>>>>>>>> frequently to the entity group containing the
>>>>>>>>>>>>>>>>> materialized view?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think there are really two different approaches you
>>>>>>>>>>>>>>>> can use to insert your work models.
>>>>>>>>>>>>>>>> 1) The work models get added to the original entity's
>>>>>>>>>>>>>>>> group. So, inside of the original transaction you do not
>>>>>>>>>>>>>>>> write to the entity group containing the materialized
>>>>>>>>>>>>>>>> view -- so no contention on it. Commit the transaction
>>>>>>>>>>>>>>>> and proceed to step 3.
>>>>>>>>>>>>>>>> 2) You kick off a transactional task to insert the work
>>>>>>>>>>>>>>>> model, or fan out more tasks to create work models :).
>>>>>>>>>>>>>>>> Then you proceed to step 3.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You can use method 1 if you have only a few aggregates.
>>>>>>>>>>>>>>>> If you have more aggregates, use the second method. I
>>>>>>>>>>>>>>>> have a "library" I am almost ready to open source that
>>>>>>>>>>>>>>>> makes method 2 really easy, so you can have lots of
>>>>>>>>>>>>>>>> aggregates. I'll post to this group when I release it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And a related question: what happens if there is a
>>>>>>>>>>>>>>>>> failure just after the transaction in bullet #2, but
>>>>>>>>>>>>>>>>> right before the named task gets inserted in bullet #3?
>>>>>>>>>>>>>>>>> In my current implementation I just left out the
>>>>>>>>>>>>>>>>> ...
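For completeness, a minimal sketch of method 1 from the exchange above
(the model, handler, queue, and task names are made up): the work model is
written in the original entity's group, and the named task in step 3
de-dupes concurrent inserts.

    from google.appengine.api import taskqueue
    from google.appengine.ext import db

    class Work(db.Model):
        aggregation = db.StringProperty()
        delta = db.IntegerProperty()

    def record_sale(sale, delta):
        def txn():
            sale.put()
            # Child of the sale -> same entity group, so the transaction
            # never touches the materialized view's group (no contention).
            Work(parent=sale, aggregation='daily-total', delta=delta).put()
        db.run_in_transaction(txn)

        # Step 3: insert a named task to batch-apply the pending work.
        # If we crash before this line, the work entity still exists and
        # the next insert (or a sweeper cron) schedules the batch.
        try:
            taskqueue.add(name='agg-daily-total-20101108',  # e.g. per batch window
                          url='/tasks/aggregate',
                          queue_name='aggregation-queue')
        except (taskqueue.TaskAlreadyExistsError,
                taskqueue.TombstonedTaskError):
            pass  # the batch is already scheduled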
