Maybe because there were active tasks during version deployment... Trying to investigate this.

On Nov 19, 17:53, Dmitry <[email protected]> wrote:
> Robert,
>
> finally I moved to tasks... Without tasks I get a lot of missed items
> in AggWork table.
>
> But even with tasks, it sometimes gets stuck:
>
> Traceback (most recent call last):
>   File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 513, in __call__
>     handler.post(*groups)
>   File "/base/data/home/apps/app/1.346324028841294499/slagg/__init__.py", line 584, in post
>     self.batch = lock.get_write_lock()
>   File "/base/data/home/apps/app/1.346324028841294499/slagg/__init__.py", line 144, in get_write_lock
>     raise WriteLockFailedError
> WriteLockFailedError
>
> And cannot process some items in AggWork table
>
> On Nov 10, 05:43, Robert Kluin <[email protected]> wrote:
>
> > Dmitry,
> > Glad to hear the bucket size helped!
>
> > Please let me know how it goes. If you have good results, maybe we
> > can find a clean way to facilitate directly doing the work done by
> > create work.
>
> > Robert
>
> > On Tue, Nov 9, 2010 at 18:11, Dmitry <[email protected]> wrote:
> > > Robert, thanks a lot for your suggestions!
> > > Increasing bucket size made a huge difference. Need to study
> > > theoretical part... and find the optimal bucket size for 50/sec.
>
> > > yep, I use creatework directly without fanout. I will try to insert
> > > 'work' models within my original data transaction and compare the
> > > performance.
>
> > > On Nov 9, 3:14 am, Robert Kluin <[email protected]> wrote:
> > >> Hey Dmitry,
> > >> I am working on getting some decent documentation about when you
> > >> might want to use fanout versus directly using creatework. And, about
> > >> usage in general. If I am dealing with one or two aggregations I
> > >> usually use creatework directly. You can only insert five
> > >> transactional tasks in one database transaction, so with four you
> > >> could directly use creatework eliminating a fanout task.
>
> > >> As far as rates go, I have been using a rate of 35/s and bucket size
> > >> of 40. However, I also get periodic queue backups. I think the max
> > >> rate / sec is currently 50, but I thought there was an announcement it
> > >> was getting increased (maybe I am just remembering the increase to
> > >> 50/sec announcement though). You might want to bump your rate up to
> > >> 50/sec. I always use a dedicated queue for creatework and aggregation
> > >> tasks. In one of my apps I use multiple queues to get a bit higher
> > >> throughput.
>
> > >> I generally prefer to use creatework tasks; they cleanly handle any
> > >> failures that occur and keep my primary processing running as fast as
> > >> possible. However, when I first started using this type of
> > >> aggregation technique I created the 'work' models and attempted to
> > >> insert the aggregator task (non-transactionally!) within my primary
> > >> transaction. If your primary processing is within tasks, and your
> > >> tasks are fast enough, give it a shot. Converting CreateWorkHandler
> > >> to something you can use directly should not be a big deal.
>
> > >> Robert
>
> > >> On Mon, Nov 8, 2010 at 18:14, Dmitry <[email protected]> wrote:
> > >> > Hi Robert,
>
> > >> > What queue configuration do you use for your system?
> > >> > I came to another problem. I usually process several feeds in parallel
> > >> > and can insert up to 20-30 new items to the database. With 4
> > >> > aggregators it's >80 create_work tasks in one moment. So after a
> > >> > minute I can have up to 1000 tasks in queue... so I have up to 5
> > >> > minutes delay in processing.
>
> > >> > It seems that for initial aggregation I should insert create work
> > >> > models not in tasks.
> > >> > I messed up again:)
>
> > >> > On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
> > >> >> Dmitry,
> > >> >> I finally got the time to make these changes. Let me know if that
> > >> >> works for your use-case.
>
> > >> >> I really appreciate all of your suggestions and help with this.
>
> > >> >> Robert
>
> > >> >> 2010/11/3 Dmitry <[email protected]>:
>
> > >> >> > oops I read expression in wrong direction. This will definitely
> > >> >> > work!
>
> > >> >> > On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
> > >> >> >> Dmitry,
> > >> >> >> Right, I know those will cause problems. So what about my
> > >> >> >> suggested solution of using:
>
> > >> >> >> if not re.match("^[a-zA-Z0-9-]+$", task_name):
> > >> >> >>     task_name = sha1_hash(task_name)
>
> > >> >> >> That should correctly handle your use cases, since the full name
> > >> >> >> will be hashed.
>
> > >> >> >> Are there issues with that solution I am not seeing?
>
> > >> >> >> Robert
>
> > >> >> >> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
>
> > >> >> >> > Robert,
>
> > >> >> >> > You will get into trouble with these aggregations:
>
> > >> >> >> > urls:
> > >> >> >> > http://правительство.рф/search/?phrase=налог&section=gov_events
> > >> >> >> > ->
> > >> >> >> > httpsearchphrase
> > >> >> >> > http://правительство.рф/search/?phrase=президент&section=gov_events
> > >> >> >> > ->
> > >> >> >> > httpsearchphrase
>
> > >> >> >> > or usernames:
> > >> >> >> > мститель2000 -> 2000
> > >> >> >> > тест2000 -> 2000
>
> > >> >> >> > but anyway in most cases your approach will work well:) You can
> > >> >> >> > leave it up to the user (add some kind of flag "use_hash").
>
> > >> >> >> > or we can try to url encode strings:
> > >> >> >> > urllib.quote(task_name.encode('utf-8'))
> > >> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
> > >> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>
> > >> >> >> > but this is not better than a hash :-D
>
> > >> >> >> > thanks
>
> > >> >> >> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
> > >> >> >> >> Hey Dmitry,
> > >> >> >> >> I am sure the "fix" in that commit is _not_ a good idea.
> > >> >> >> >> Originally I stuck it in because I use entity keys as the
> > >> >> >> >> task-name, and sometimes they contain characters not allowed
> > >> >> >> >> in task-names. I actually debated for several days about
> > >> >> >> >> pushing that update out; finally I decided to push and hope
> > >> >> >> >> someone would notice and offer their thoughts.
>
> > >> >> >> >> I like your idea a lot. But, for many aggregations I like to
> > >> >> >> >> use entity keys; it makes it possible for me to visually see
> > >> >> >> >> what a task is doing. What do you think about something like
> > >> >> >> >> the following approach:
>
> > >> >> >> >> if not re.match("^[a-zA-Z0-9-]+$", task_name):
> > >> >> >> >>     task_name = sha1_hash(task_name)
>
> > >> >> >> >> That should allow 'valid' names to remain as-is, but it will
> > >> >> >> >> safely encode non-valid task-names. Do you think that is an
> > >> >> >> >> acceptable method?
>
> > >> >> >> >> Thanks a lot for your feedback.
>
> > >> >> >> >> Robert
>
> > >> >> >> >> On Tue, Nov 2, 2010 at 07:15, Dmitry
> > >> >> >> >> <[email protected]> wrote:
> > >> >> >> >>> Hi Robert,
>
> > >> >> >> >>> Regarding your latest commit:
>
> > >> >> >> >>> # TODO: find a better solution for cleaning up the name.
> > >> >> >> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>
> > >> >> >> >>> Don't think this is a good idea:) For example I have unicode
> > >> >> >> >>> characters in aggregation value. In this case regexp will
> > >> >> >> >>> return nothing.
> > >> >> >> >>> I use sha1 hash now... but there's also a little possibility of
> > >> >> >> >>> collision
>
> > >> >> >> >>> sha1_hash(self.agg_name)
>
> > >> >> >> >>> def utf8encoded(data):
> > >> >> >> >>>   if data is None:
> > >> >> >> >>>     return None
> > >> >> >> >>>   if isinstance(data, unicode):
> > >> >> >> >>>     return unicode(data).encode('utf-8')
> > >> >> >> >>>   else:
> > >> >> >> >>>     return data
>
> > >> >> >> >>> def sha1_hash(value):
> > >> >> >> >>>   return hashlib.sha1(utf8encoded(value)).hexdigest()
>
> > >> >> >> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]>
> > >> >> >> >>> wrote:
> > >> >> >> >>>> Hi Dmitry,
> > >> >> >> >>>> Glad to hear it was helpful! Not sure when you checked it
> > >> >> >> >>>> out last, but I made a number of good (I think) improvements
> > >> >> >> >>>> in the last couple days, such as continuations to allow
> > >> >> >> >>>> splitting large groups of work up.
>
> > >> >> >> >>>> Robert
>
> > >> >> >> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry
> > >> >> >> >>>> <[email protected]> wrote:
> > >> >> >> >>>>> Robert,
>
> > >> >> >> >>>>> Your grouping_with_date_rollup.py example was extremely
> > >> >> >> >>>>> helpful. Thanks a lot again! :)
>
> > >> >> >> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]>
> > >> >> >> >>>>> wrote:
> > >> >> >> >>>>>> Hey Carles,
> > >> >> >> >>>>>> Glad it seems helpful. I am hoping to get time today to
> > >> >> >> >>>>>> push out some revisions and sample code.
>
> > >> >> >> >>>>>> Robert
>
> > >> >> >> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez
> > >> >> >> >>>>>> <[email protected]> wrote:
> > >> >> >> >>>>>>> Robert, I took a brief look at your code and it seems
> > >> >> >> >>>>>>> very cool. Exactly what I was looking for for my report
> > >> >> >> >>>>>>> generation and such.
> > >> >> >> >>>>>>> I'm looking forward to more examples, but it seems a very
> > >> >> >> >>>>>>> valuable addition to our toolbox.
> > >> >> >> >>>>>>> Thanks a lot!
>
> > >> >> >> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez
> > >> >> >> >>>>>>> <[email protected]> wrote:
>
> > >> >> >> >>>>>>>> Neat! I'm going to see this code, hopefully I'll
> > >> >> >> >>>>>>>> understand something :)
> > >> >> >> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin
> > >> >> >> >>>>>>>> <[email protected]>
> > >> >> >> >>>>>>>> wrote:
> > >> >> >> >>>>>>>>> Hey Dmitry,
> > >> >> >> >>>>>>>>> In case it might help, I pushed some code to
> > >> >> >> >>>>>>>>> bitbucket. At the moment I would (personally) say
> > >> >> >> >>>>>>>>> the code is not too pretty, but it
> > ...

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
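
The name-cleaning rule agreed on in the thread (keep a task name that already matches ^[a-zA-Z0-9-]+$, hash anything else) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from Robert's repository: safe_task_name and MAX_TASK_NAME_LENGTH are hypothetical names, the 500-character cap is borrowed from the [:500] slice in the quoted commit, and the sketch targets Python 3 (the snippets quoted in the thread are Python 2).

```python
import hashlib
import re

# Same character class as the regex quoted in the thread.
_VALID_TASK_NAME = re.compile(r"[a-zA-Z0-9-]+")

# Assumption: length cap taken from the [:500] slice in the quoted commit.
MAX_TASK_NAME_LENGTH = 500


def sha1_hash(value):
    """Return the hex SHA-1 digest of value, encoding text as UTF-8 first."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def safe_task_name(task_name):
    """Keep names that are already valid and short enough; hash the rest.

    Hypothetical helper: fullmatch is used instead of match with "$" so
    that a name ending in a newline is not accepted as valid.
    """
    if (_VALID_TASK_NAME.fullmatch(task_name)
            and len(task_name) <= MAX_TASK_NAME_LENGTH):
        return task_name
    # Hash the full original name so distinct inputs stay distinct
    # (up to the small SHA-1 collision risk mentioned in the thread).
    return sha1_hash(task_name)
```

With this sketch a key-like name such as "agg-123" stays readable, while the two usernames from the thread, which both collapsed to "2000" under the re.sub approach, map to two different 40-character hex digests.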
