Hi Robert,

What queue configuration do you use for your system?

I've run into another problem. I usually process several feeds in parallel, and each run can insert 20-30 new items into the database. With 4 aggregators that is >80 create_work tasks at one moment, so after a minute I can have up to 1000 tasks in the queue... which gives me up to a 5-minute delay in processing.
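For reference, here is roughly the queue.yaml I am experimenting with to drain the backlog faster. The queue name and the rate/bucket_size numbers are just my guesses, not anything from your code:

  queue:
  - name: create-work
    rate: 20/s        # default queues run at 5/s
    bucket_size: 40   # absorb bursts from parallel feeds

At the default 5/s, 1000 queued tasks take over three minutes to drain even before execution time, which matches the delay I am seeing.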
It seems that for the initial aggregation I should insert the work models directly, rather than through create_work tasks. I messed up again :)

On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
> Dmitry,
>   I finally got the time to make these changes. Let me know if that
> works for your use-case.
>
> I really appreciate all of your suggestions and help with this.
>
> Robert
>
> 2010/11/3 Dmitry <[email protected]>:
> > Oops, I read the expression in the wrong direction. This will definitely work!
> >
> > On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
> >> Dmitry,
> >>   Right, I know those will cause problems. So what about my suggested
> >> solution of using:
> >>
> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >>       task_name = sha1_hash(task_name)
> >>
> >> That should correctly handle your use cases, since the full name will be
> >> hashed.
> >>
> >> Are there issues with that solution I am not seeing?
> >>
> >> Robert
> >>
> >> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
> >> > Robert,
> >> >
> >> > You will get into trouble with these aggregations:
> >> >
> >> > urls:
> >> > http://правительство.рф/search/?phrase=налог&section=gov_events -> httpsearchphrase
> >> > http://правительство.рф/search/?phrase=президент&section=gov_events -> httpsearchphrase
> >> >
> >> > or usernames:
> >> > мститель2000 -> 2000
> >> > тест2000 -> 2000
> >> >
> >> > But anyway, in most cases your approach will work well :) You can leave
> >> > it up to the user (add some kind of "use_hash" flag).
> >> >
> >> > Or we can try to URL-encode the strings:
> >> > urllib.quote(task_name.encode('utf-8'))
> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
> >> >
> >> > but this is no better than a hash :-D
> >> >
> >> > thanks
> >> >
> >> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
> >> >> Hey Dmitry,
> >> >>   I am sure the "fix" in that commit is _not_ a good idea. Originally
> >> >> I stuck it in because I use entity keys as the task-name, and sometimes
> >> >> they contain characters not allowed in task-names. I actually debated
> >> >> for several days about pushing that update out; finally I decided to
> >> >> push and hope someone would notice and offer their thoughts.
> >> >>
> >> >>   I like your idea a lot. But for many aggregations I like to use
> >> >> entity keys; that makes it possible for me to see visually what a task
> >> >> is doing. What do you think about something like the following
> >> >> approach:
> >> >>
> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >> >>       task_name = sha1_hash(task_name)
> >> >>
> >> >> That should allow 'valid' names to remain as-is, but it will safely
> >> >> encode non-valid task-names. Do you think that is an acceptable
> >> >> method?
> >> >>
> >> >> Thanks a lot for your feedback.
> >> >>
> >> >> Robert
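(Putting your two snippets together, the whole helper looks roughly like this on my side. It is an untested sketch: safe_task_name is just my name for it, and I inlined the sha1_hash from my earlier message below.)

  import hashlib
  import re

  def sha1_hash(value):
      # Encode unicode to UTF-8 before hashing so non-ASCII values work.
      if isinstance(value, unicode):
          value = value.encode('utf-8')
      return hashlib.sha1(value).hexdigest()

  def safe_task_name(task_name):
      # Keep readable names (entity keys etc.) as-is; hash anything with
      # characters that are not allowed in task names.
      if not re.match("^[a-zA-Z0-9-]+$", task_name):
          task_name = sha1_hash(task_name)
      return task_name[:500]  # task names are limited to 500 characters

With this, u'мститель2000' and u'тест2000' get two different hashed names instead of both collapsing to '2000'.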
> >> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> wrote:
> >> >>> Hi Robert,
> >> >>>
> >> >>> Regarding your latest commit:
> >> >>>
> >> >>> # TODO: find a better solution for cleaning up the name.
> >> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
> >> >>>
> >> >>> I don't think this is a good idea :) For example, I have unicode
> >> >>> characters in the aggregation value; in that case the regexp returns
> >> >>> an empty string.
> >> >>> I use a sha1 hash now... but there's also a small possibility of
> >> >>> collision:
> >> >>>
> >> >>> sha1_hash(self.agg_name)
> >> >>>
> >> >>> import hashlib
> >> >>>
> >> >>> def utf8encoded(data):
> >> >>>     if data is None:
> >> >>>         return None
> >> >>>     if isinstance(data, unicode):
> >> >>>         return data.encode('utf-8')
> >> >>>     return data
> >> >>>
> >> >>> def sha1_hash(value):
> >> >>>     return hashlib.sha1(utf8encoded(value)).hexdigest()
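(To put a number on that "small possibility of collision": a back-of-the-envelope birthday bound, my own arithmetic and nothing from the library, says it is negligible.)

  # Rough upper bound on the chance of any sha1 collision among n names.
  n = 10 ** 9                       # a billion distinct aggregation values
  p = n * (n - 1) / 2.0 / 2 ** 160  # birthday bound for a 160-bit hash
  print p                           # ~3.4e-31, effectively zero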
> >> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
> >> >>>> Hi Dmitry,
> >> >>>>   Glad to hear it was helpful! Not sure when you checked it out last,
> >> >>>> but I made a number of good (I think) improvements in the last couple
> >> >>>> of days, such as continuations to allow splitting up large groups of
> >> >>>> work.
> >> >>>>
> >> >>>> Robert
> >> >>>>
> >> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <[email protected]> wrote:
> >> >>>>> Robert,
> >> >>>>>
> >> >>>>> Your grouping_with_date_rollup.py example was extremely helpful.
> >> >>>>> Thanks a lot again! :)
> >> >>>>>
> >> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> wrote:
> >> >>>>>> Hey Carles,
> >> >>>>>>   Glad it seems helpful. I am hoping to get time today to push out
> >> >>>>>> some revisions and sample code.
> >> >>>>>>
> >> >>>>>> Robert
> >> >>>>>>
> >> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <[email protected]> wrote:
> >> >>>>>>> Robert, I took a brief look at your code and it seems very cool.
> >> >>>>>>> Exactly what I was looking for, for my report generation and such.
> >> >>>>>>> I'm looking forward to more examples; it seems a very valuable
> >> >>>>>>> addition to our toolbox.
> >> >>>>>>> Thanks a lot!
> >> >>>>>>>
> >> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <[email protected]> wrote:
> >> >>>>>>>> Neat! I'm going to look at this code, hopefully I'll understand
> >> >>>>>>>> something :)
> >> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin <[email protected]> wrote:
> >> >>>>>>>>> Hey Dmitry,
> >> >>>>>>>>>   In case it might help, I pushed some code to bitbucket. At the
> >> >>>>>>>>> moment I would (personally) say the code is not too pretty, but it
> >> >>>>>>>>> works well. :)
> >> >>>>>>>>>   http://bitbucket.org/thebobert/slagg
> >> >>>>>>>>>
> >> >>>>>>>>>   Sorry it does not really have good documentation at the moment,
> >> >>>>>>>>> but I think the basic example I threw together will give you a good
> >> >>>>>>>>> idea of how to use it. I need to do another cleanup pass over the
> >> >>>>>>>>> API to make a few more refinements.
> >> >>>>>>>>>
> >> >>>>>>>>>   I pulled this code out of one of my apps and tried to quickly
> >> >>>>>>>>> refactor it to be a bit more generic. We are currently using
> >> >>>>>>>>> basically the same code in three apps to do some really complex
> >> >>>>>>>>> calculations. As soon as I get time I will get an example up showing
> >> >>>>>>>>> how to use it for neat stuff, like overall, yearly, monthly, and
> >> >>>>>>>>> daily aggregates across multiple values (like total dollars and
> >> >>>>>>>>> quantity). The cool thing is that you can do all of those
> >> >>>>>>>>> aggregations across various groupings, like customer, company,
> >> >>>>>>>>> contact, and sales-person, at once. I'll get that code pushed out
> >> >>>>>>>>> in the next few days.
> >> >>>>>>>>>
> >> >>>>>>>>>   Would love to get some feedback on it.
> >> >>>>>>>>>
> >> >>>>>>>>> Robert
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry <[email protected]> wrote:
> >> >>>>>>>>>> Ben, thanks for your code! I'm trying to understand all this
> >> >>>>>>>>>> stuff too...
> >> >>>>>>>>>> Robert, any success with your "library"? Maybe you've already
> >> >>>>>>>>>> done all the stuff we are trying to implement...
> >> >>>>>>>>>>
> >> >>>>>>>>>> p.s. Where is Brett S.? :) I would like to hear his comments on this.
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Sep 21, 1:49 pm, Ben <[email protected]> wrote:
> >> >>>>>>>>>>> Thanks for your insights. I would love feedback on this
> >> >>>>>>>>>>> implementation (Brett S. suggested we send in our code for this):
> >> >>>>>>>>>>> http://pastebin.com/3pUhFdk8
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> This implementation is for just one materialized-view row at a
> >> >>>>>>>>>>> time (e.g. a simple counter, no presence markers). Hopefully
> >> >>>>>>>>>>> putting an ETA on the transactional task will relieve the write
> >> >>>>>>>>>>> pressure, since usually it should be an old update with an
> >> >>>>>>>>>>> out-of-date sequence number and be discarded (the update having
> >> >>>>>>>>>>> already been completed in batch by the fork-join-queue).
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I'd love to generalize this to do more than one materialized-view
> >> >>>>>>>>>>> row, but thought I'd get feedback first.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Thanks,
> >> >>>>>>>>>>> Ben
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Sep 17, 7:30 am, Robert Kluin <[email protected]> wrote:
> >> >>>>>>>>>>>> Responses inline.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On Thu, Sep 16, 2010 at 17:32, Ben <[email protected]> wrote:
> >> >>>>>>>>>>>>> I have a question about Brett Slatkin's talk at I/O 2010 on
> >> >>>>>>>>>>>>> data pipelines. The question is about slide #67 of his pdf,
> >> >>>>>>>>>>>>> corresponding to minute 51:30 of his talk:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I am wondering what is supposed to happen in the transactional
> >> >>>>>>>>>>>>> task (bullet point 2c). Would these updates to the materialized
> >> >>>>>>>>>>>>> view cause you to write too frequently to the entity group
> >> >>>>>>>>>>>>> containing the materialized view?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I think there are really two different approaches you can use
> >> >>>>>>>>>>>> to insert your work models:
> >> >>>>>>>>>>>> 1) The work models get added to the original entity's group.
> >> >>>>>>>>>>>> So, inside the original transaction you do not write to the
> >> >>>>>>>>>>>> entity group containing the materialized view -- so no
> >> >>>>>>>>>>>> contention on it. Commit the transaction and proceed to step 3.
> >> >>>>>>>>>>>> 2) You kick off a transactional task to insert the work model,
> >> >>>>>>>>>>>> or fan out more tasks to create work models :). Then you
> >> >>>>>>>>>>>> proceed to step 3.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> You can use method 1 if you have only a few aggregates. If you
> >> >>>>>>>>>>>> have more aggregates, use the second method. I have a "library"
> >> >>>>>>>>>>>> I am almost ready to open source that makes method 2 really
> >> >>>>>>>>>>>> easy, so you can have lots of aggregates. I'll post to this
> >> >>>>>>>>>>>> group when I release it.
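(This is how I understood the two methods when I tried them: a rough sketch with made-up model and handler names, not code from slagg.)

  from google.appengine.ext import db
  from google.appengine.api import taskqueue  # api.labs.taskqueue on older SDKs

  class WorkModel(db.Model):
      # One unit of pending aggregation work.
      delta = db.IntegerProperty()

  def insert_work_method_1(entity, delta):
      # Method 1: the work model lives in the *original* entity's group,
      # so the materialized view's entity group sees no write here.
      def txn():
          entity.put()
          WorkModel(parent=entity, delta=delta).put()
      db.run_in_transaction(txn)

  def insert_work_method_2(entity, delta):
      # Method 2: transactionally enqueue a task whose handler creates
      # the work model(s), so one commit can feed many aggregates.
      def txn():
          entity.put()
          taskqueue.add(url='/tasks/create_work',
                        params={'key': str(entity.key()), 'delta': delta},
                        transactional=True)
      db.run_in_transaction(txn)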
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> And a related question: what happens if there is a failure
> >> >>>>>>>>>>>>> just after the transaction in bullet #2, but right before the
> >> >>>>>>>>>>>>> named task gets inserted in bullet #3? In my current
> >> >>>>>>>>>>>>> implementation I just left out the
>
> ...
