Hi Dmitry,
  It sounds like there may be some contention while trying to get a
write lock; increasing get_write_lock's tries parameter to 3 or 4
might help, although the entire task will simply retry if it fails.
It also sounds like get_read_lock needs to wait longer before
continuing to process; the maximum wait is controlled by the max_wait
parameter.  In your case it sounds like it might help to increase
both.  You might also want to experiment with CreateWork's delay
attribute to adjust the batch sizes.
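
  For example, roughly along these lines (just a sketch -- I am going
from memory on the exact keyword names, so double-check tries and
max_wait against the slagg source):

    # Give the write lock a few more attempts before it raises
    # WriteLockFailedError.
    self.batch = lock.get_write_lock(tries=4)

    # Let the read lock wait longer before the aggregator starts
    # processing the batch.
    work = lock.get_read_lock(max_wait=10)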

  In my applications and in live testing, I have not encountered these
issues.  Could you share some details about your use case with me?
Specifically, I am interested in how "big" (in terms of write time)
your initial writes are, and how many entities you're seeing batched
in the aggregations.  Are you using the CreateWorkHandler create_work
class method to insert the CreateWork tasks?  Feel free to contact me
off-group to discuss those details.
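
  In case it helps, by "using the create_work class method" I mean
calling it inside the same transaction as your primary write, roughly
like this (a sketch only -- the import path and the arguments to
create_work are placeholders, adapt them to your models):

    from google.appengine.ext import db
    from slagg import CreateWorkHandler  # adjust to your actual import

    def save_and_aggregate(entity):
        def txn():
            # The primary write and the CreateWork task insertion
            # happen in the same datastore transaction.
            entity.put()
            CreateWorkHandler.create_work(entity)  # placeholder args
        db.run_in_transaction(txn)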

  Are the items left in AggWork ones that were never processed, or
ones that were processed but not deleted?

  Very soon I will make some adjustments and include a
"cleanup/resume" handler that processes AggWork and catches any missed
items.  I'm hoping to get that implemented in the next couple of
weeks.  I am also hoping to get some good documentation online; if you
have thoughts about important areas to focus on, let me know.
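
  The rough idea for that handler is something like this (an untested
sketch -- the model import, the 'created' property, the URL, and the
queue name are all placeholders until it actually lands in slagg):

    from google.appengine.api.labs import taskqueue  # api.taskqueue in newer SDKs
    from google.appengine.ext import webapp
    from slagg import AggWork  # adjust to wherever the AggWork model lives

    class AggWorkCleanupHandler(webapp.RequestHandler):
        def get(self):
            # Grab a batch of AggWork entries that look stale and
            # re-enqueue an aggregation task for each one.
            for work in AggWork.all().order('created').fetch(100):
                taskqueue.add(url='/slagg/aggregate',
                              params={'work_key': str(work.key())},
                              queue_name='aggregation')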

  Deploying with active tasks should not cause any issues that I know
of.  However, I do sometimes see 500 errors while my default version
is switching over.  Can you tell whether your tasks are actually being
run or whether they are just silently being dropped (by the App Engine
backend)?


Robert





On Fri, Nov 19, 2010 at 09:53, Dmitry <[email protected]> wrote:
> Robert,
>
> Finally I moved to tasks... Without tasks I get a lot of missed items
> in the AggWork table.
>
> But even with tasks, it sometimes gets stuck:
>
> Traceback (most recent call last):
>  File "/base/python_runtime/python_lib/versions/1/google/appengine/
> ext/webapp/__init__.py", line 513, in __call__
>    handler.post(*groups)
>  File "/base/data/home/apps/app/1.346324028841294499/slagg/
> __init__.py", line 584, in post
>    self.batch = lock.get_write_lock()
>  File "/base/data/home/apps/app/1.346324028841294499/slagg/
> __init__.py", line 144, in get_write_lock
>    raise WriteLockFailedError
> WriteLockFailedError
>
> And it cannot process some items in the AggWork table.
>
> On Nov 10, 05:43, Robert Kluin <[email protected]> wrote:
>> Dmitry,
>>   Glad to hear the bucket size helped!
>>
>>   Please let me know how it goes.  If you have good results, maybe we
>> can find a clean way to facilitate directly doing the work done by
>> create work.
>>
>> Robert
>>
>> On Tue, Nov 9, 2010 at 18:11, Dmitry <[email protected]> wrote:
>> > Robert, thanks a lot for your suggestions!
>> > Increasing the bucket size made a huge difference. I need to study the
>> > theoretical part... and find the optimal bucket size for 50/sec.
>>
>> > yep, I use creatework directly without fanout. I will try to insert
>> > 'work' models within my original data transaction and compare the
>> > performance.
>>
>> > On Nov 9, 3:14 am, Robert Kluin <[email protected]> wrote:
>> >> Hey Dmitry,
>> >>    I am working on getting some decent documentation about when you
>> >> might want to use fanout versus directly using creatework.  And about
>> >> usage in general.  If I am dealing with one or two aggregations I
>> >> usually use creatework directly.  You can only insert five
>> >> transactional tasks in one database transaction, so with four you
>> >> could use creatework directly, eliminating a fanout task.
>>
>> >>   As far as rates go, I have been using a rate of 35/s and bucket size
>> >> of 40.  However, I also get periodic queue backups.  I think the max
>> >> rate / sec is currently 50, but I thought there was an announcement it
>> >> was getting increased (maybe I am just remembering the increase to
>> >> 50/sec announcement though).  You might want to bump your rate up to
>> >> 50/sec.  I always use a dedicated queue for creatework and aggregation
>> >> tasks.  In one of my apps I use multiple queues to get a bit higher
>> >> throughput.
>>
>> >>   I generally prefer to use creatework tasks; they cleanly handle any
>> >> failures that occur and keep my primary processing running as fast as
>> >> possible.  However, when I first started using this type of
>> >> aggregation technique I created the 'work' models and attempted to
>> >> insert the aggregator task (non-transactionally!) within my primary
>> >> transaction.  If your primary processing is within tasks, and your
>> >> tasks are fast enough, give it a shot.  Converting CreateWorkHandler
>> >> to something you can use directly should not be a big deal.
>>
>> >> Robert
>>
>> >> On Mon, Nov 8, 2010 at 18:14, Dmitry <[email protected]> wrote:
>> >> > Hi Robert,
>>
>> >> > What queue configuration do you use for your system?
>> >> > I ran into another problem. I usually process several feeds in parallel
>> >> > and can insert up to 20-30 new items into the database. With 4
>> >> > aggregators that's >80 create_work tasks at once. So after a
>> >> > minute I can have up to 1000 tasks in the queue... so I have up to a 5
>> >> > minute delay in processing.
>>
>> >> > It seems that for the initial aggregation I should insert the create
>> >> > work models directly, not in tasks.
>> >> > I messed up again:)
>>
>> >> > On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
>> >> >> Dmitry,
>> >> >>    I finally got the time to make these changes.  Let me know if that
>> >> >> works for your use-case.
>>
>> >> >>    I really appreciate all of your suggestions and help with this.
>>
>> >> >> Robert
>>
>> >> >> 2010/11/3 Dmitry <[email protected]>:
>>
>> >> >> > Oops, I read the expression in the wrong direction. This will definitely work!
>>
>> >> >> > On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
>> >> >> >> Dmitry,
>> >> >> >>   Right, I know those will cause problems.  So what about my
>> >> >> >> suggested solution of using:
>>
>> >> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>> >> >> >>       task_name = sha1_hash(task_name)
>>
>> >> >> >> That should correctly handle your use cases, since the full name 
>> >> >> >> will be hashed.
>>
>> >> >> >> Are there issues with that solution I am not seeing?
>>
>> >> >> >> Robert
>>
>> >> >> >> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
>>
>> >> >> >> > Robert,
>>
>> >> >> >> > You will get into trouble with these aggregations:
>>
>> >> >> >> > urls:
>> >> >> >> > http://правительство.рф/search/?phrase=налог&section=gov_events ->
>> >> >> >> > httpsearchphrase
>> >> >> >> > http://правительство.рф/search/?phrase=президент&section=gov_events ->
>> >> >> >> > httpsearchphrase
>>
>> >> >> >> > or usernames:
>> >> >> >> > мститель2000 -> 2000
>> >> >> >> > тест2000 -> 2000
>>
>> >> >> >> > but anyway in most cases your approach will work well:) You can 
>> >> >> >> > leave
>> >> >> >> > it up to the user (add some kind of flag "use_hash").
>>
>> >> >> >> > or we can try to url encode strings:
>> >> >> >> > urllib.quote(task_name.encode('utf-8'))
>> >> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
>> >> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>>
>> >> >> >> > but this is not better than a hash :-D
>>
>> >> >> >> > thanks
>>
>> >> >> >> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
>> >> >> >> >> Hey Dmitry,
>> >> >> >> >>   I am sure the "fix" in that commit is _not_ a good idea.  Originally
>> >> >> >> >> I stuck it in because I use entity keys as the task-name, and sometimes
>> >> >> >> >> they contain characters not allowed in task-names.  I actually
>> >> >> >> >> debated for several days about pushing that update out; finally I
>> >> >> >> >> decided to push and hope someone would notice and offer their thoughts.
>>
>> >> >> >> >>   I like your idea a lot.  But for many aggregations I like to use
>> >> >> >> >> entity keys; it makes it possible for me to visually see what a task
>> >> >> >> >> is doing.  What do you think about something like the following
>> >> >> >> >> approach:
>>
>> >> >> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>> >> >> >> >>       task_name = sha1_hash(task_name)
>>
>> >> >> >> >> That should allow 'valid' names to remain as-is, but it will safely
>> >> >> >> >> encode non-valid task-names.  Do you think that is an acceptable
>> >> >> >> >> method?
>>
>> >> >> >> >> Thanks a lot for your feedback.
>>
>> >> >> >> >> Robert
>>
>> >> >> >> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> 
>> >> >> >> >> wrote:
>> >> >> >> >>> Hi Robert,
>>
>> >> >> >> >>> Regarding your latest commit:
>>
>> >> >> >> >>> # TODO: find a better solution for cleaning up the name.
>> >> >> >> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>>
>> >> >> >> >>> I don't think this is a good idea :) For example, I have unicode
>> >> >> >> >>> characters in the aggregation value; in this case the regexp will
>> >> >> >> >>> return nothing.
>> >> >> >> >>> I use a sha1 hash now... but there's also a small possibility of
>> >> >> >> >>> collision.
>>
>> >> >> >> >>> sha1_hash(self.agg_name)
>>
>> >> >> >> >>> def utf8encoded(data):
>> >> >> >> >>>     if data is None:
>> >> >> >> >>>         return None
>> >> >> >> >>>     if isinstance(data, unicode):
>> >> >> >> >>>         return unicode(data).encode('utf-8')
>> >> >> >> >>>     else:
>> >> >> >> >>>         return data
>>
>> >> >> >> >>> import hashlib
>>
>> >> >> >> >>> def sha1_hash(value):
>> >> >> >> >>>     return hashlib.sha1(utf8encoded(value)).hexdigest()
>>
>> >> >> >> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
>> >> >> >> >>>> Hi Dmitry,
>> >> >> >> >>>>   Glad to hear it was helpful!  Not sure when you checked it
>> >> >> >> >>>> out last,
>> >> >> >> >>>> but I made a number of good (I think) improvements in the last 
>> >> >> >> >>>> couple
>> >> >> >> >>>> days, such as continuations to allow splitting large groups of 
>> >> >> >> >>>> work
>> >> >> >> >>>> up.
>>
>> >> >> >> >>>> Robert
>>
>> >> >> >> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry 
>> >> >> >> >>>> <[email protected]> wrote:
>> >> >> >> >>>>> Robert,
>>
>> >> >> >> >>>>> Your grouping_with_date_rollup.py example was extremely
>> >> >> >> >>>>> helpful. Thanks
>> >> >> >> >>>>> a lot again! :)
>>
>> >> >> >> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> 
>> >> >> >> >>>>> wrote:
>> >> >> >> >>>>>> Hey Carles,
>> >> >> >> >>>>>>   Glad it seems helpful.  I am hoping to get time today to
>> >> >> >> >>>>>> push out
>> >> >> >> >>>>>> some revisions and sample code.
>>
>> >> >> >> >>>>>> Robert
>>
>> >> >> >> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez 
>> >> >> >> >>>>>> <[email protected]> wrote:
>> >> >> >> >>>>>>> Robert, I took a brief look at your code and it seems very cool. Exactly
>> >> >> >> >>>>>>> what I was looking for for my report generation and such.
>> >> >> >> >>>>>>> I'm looking forward to more examples; it seems a very valuable addition
>> >> >> >> >>>>>>> to our toolbox.
>> >> >> >> >>>>>>> Thanks a lot!
>>
>> >> >> >> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez 
>> >> >> >> >>>>>>> <[email protected]> wrote:
>>
>> >> >> >> >>>>>>>> Neat! I'm going to see this code, hopefully I'll 
>> >> >> >> >>>>>>>> understand something :)
>> >> >> >> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin 
>> >> >> >> >>>>>>>> <[email protected]>
>> >> >> >> >>>>>>>> wrote:
>> >> >> >> >>>>>>>>> Hey Dmitry,
>> >> >> >> >>>>>>>>>    In case it might help, I pushed some code to bitbucket.  At the
>> >> >> >> >>>>>>>>> moment I would (personally) say the code is not too pretty, but it
>> >> >> >> >>>>>>>>> works well.  :)
>> >> >> >> >>>>>>>>>      http://bitbucket.org/thebobert/slagg
>>
>> >> >> >> >>>>>>>>>   Sorry it does not really have good documentation at the
>> >> >> >> >>>>>>>>> moment, but
>> >> >> >> >>>>>>>>> I think the basic example I threw together will give you 
>> >> >> >> >>>>>>>>> a good idea
>> >> >> >> >>>>>>>>> of how to use it.  I need to do another cleanup pass over
>> >> >> >> >>>>>>>>> the API to
>> >> >> >> >>>>>>>>> make a few more refinements.
>>
>> >> >> >> >>>>>>>>>    I pulled this code out of one of my apps, and tried to
>> >> >> >> >>>>>>>>> quickly
>> >> >> >> >>>>>>>>> refactor it to be a bit more generic.  We are currently
>> >> >> >> >>>>>>>>> using
>> >> >> >> >>>>>>>>> basically the same code in three apps to do some really 
>> >> >> >> >>>>>>>>> complex
>> >> >> >> >>>>>>>>> calculations.  As soon as I get time I will get an
>> >> >> >> >>>>>>>>> example up showing
>> >> >> >> >>>>>>>>> how to use it for neat stuff, like overall, yearly, 
>> >> >> >> >>>>>>>>> monthly, and daily
>> >> >> >> >>>>>>>>> aggregates across multiple values (like total dollars and 
>> >> >> >> >>>>>>>>> quantity).
>> >> >> >> >>>>>>>>> The cool thing is that you can do all of those 
>> >> >> >> >>>>>>>>> aggregations across
>> >> >> >> >>>>>>>>> various groupings, like customer, company, contact, and 
>> >> >> >> >>>>>>>>> sales-person,
>> >> >> >> >>>>>>>>> at once.  I'll get that code pushed out in the next few
>> >> >> >> >>>>>>>>> days.
>>
>> ...
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
