[google-appengine] Re: Fan-in with materialized views: A sketch

Dmitry Fri, 19 Nov 2010 06:53:35 -0800

Robert,

I finally moved to tasks... Without tasks I get a lot of missed items
in the AggWork table.


But even with tasks, it sometimes gets stuck:

Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/
ext/webapp/__init__.py", line 513, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/app/1.346324028841294499/slagg/
__init__.py", line 584, in post
    self.batch = lock.get_write_lock()
  File "/base/data/home/apps/app/1.346324028841294499/slagg/
__init__.py", line 144, in get_write_lock
    raise WriteLockFailedError
WriteLockFailedError

And it cannot process some items in the AggWork table.
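
One generic way to cope with a transient lock failure like the one in the traceback is to retry the acquisition a few times with backoff before letting the task fail (a failed task is retried by the queue anyway). This is a minimal sketch and not slagg's actual code: WriteLockFailedError is redefined locally as a stand-in, and acquire_with_retry and make_flaky_lock are hypothetical names used only for illustration.

```python
import random
import time


class WriteLockFailedError(Exception):
    """Local stand-in for the exception shown in the traceback."""


def acquire_with_retry(get_lock, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call get_lock() up to `attempts` times, backing off between tries."""
    for attempt in range(attempts):
        try:
            return get_lock()
        except WriteLockFailedError:
            if attempt == attempts - 1:
                raise  # give up and let the task fail so the queue retries it
            # Exponential backoff with jitter so retries do not run in lockstep.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))


def make_flaky_lock(failures):
    """Hypothetical lock that fails `failures` times, then returns a batch id."""
    state = {"left": failures}

    def get_lock():
        if state["left"] > 0:
            state["left"] -= 1
            raise WriteLockFailedError()
        return "batch-1"

    return get_lock


batch = acquire_with_retry(make_flaky_lock(2), sleep=lambda seconds: None)
print(batch)
```

Whether retrying inside the handler is better than simply failing and relying on the queue's own retries depends on how long the lock is normally held; the sketch only shows the mechanics.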

On Nov 10, 05:43, Robert Kluin <[email protected]> wrote:
> Dmitry,
>   Glad to hear the bucket size helped!
>
>   Please let me know how it goes.  If you have good results, maybe we
> can find a clean way to facilitate directly doing the work done by
> create work.
>
> Robert
>
> On Tue, Nov 9, 2010 at 18:11, Dmitry <[email protected]> wrote:
> > Robert, thanks a lot for your suggestions!
> > Increasing bucket size made a huge difference. Need to study
> > theoretical part... and find the optimal bucket size for 50/sec.
>
> > yep, I use creatework directly without fanout. I will try to insert
> > 'work' models within my original data transaction and compare the
> > performance.
>
> > On Nov 9, 3:14 am, Robert Kluin <[email protected]> wrote:
> >> Hey Dmitry,
> >>    I am working on getting some decent documentation about when you
> >> might want to use fanout versus directly using creatework.  And, about
> >> usage in general.  If I am dealing with one or two aggregations I
> >> usually use creatework directly.  You can only insert five
> >> transactional tasks in one database transaction, so with four you
> >> could directly use creatework eliminating a fanout task.
>
> >>   As far as rates go, I have been using a rate of 35/s and bucket size
> >> of 40.  However, I also get periodic queue backups.  I think the max
> >> rate / sec is currently 50, but I thought there was an announcement it
> >> was getting increased (maybe I am just remembering the increase to
> >> 50/sec announcement though).  You might want to bump your rate up to
> >> 50/sec.  I always use a dedicated queue for creatework and aggregation
> >> tasks.  In one of my apps I use multiple queues to get a bit higher
> >> throughput.
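
The effect of rate and bucket size discussed above can be estimated with simple token-bucket arithmetic. This is a rough sketch under stated assumptions: the bucket starts full, tasks finish instantly, and nothing else limits throughput. The function names are hypothetical, the bucket size of 5 is just a small value picked for comparison, and the burst of 80 and backlog of 1000 are the figures mentioned in this thread.

```python
def burst_drain_seconds(burst, rate, bucket_size):
    """Time to dispatch `burst` tasks that arrive at once at a full bucket."""
    immediate = min(burst, bucket_size)       # tokens already in the bucket
    return (burst - immediate) / float(rate)  # the rest wait for refills


def backlog_delay_seconds(backlog, rate):
    """Wait for the last task in a backlog, ignoring bursts and task latency."""
    return backlog / float(rate)


for rate, bucket in [(35, 5), (35, 40), (50, 40)]:
    print("rate=%d/s bucket=%d: 80-task burst drains in %.2fs, "
          "1000-task backlog in %.1fs"
          % (rate, bucket,
             burst_drain_seconds(80, rate, bucket),
             backlog_delay_seconds(1000, rate)))
```

A larger bucket only helps with bursts; a sustained backlog is governed by the rate, which is why a dedicated queue or several queues can matter more once the rate cap is reached.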
>
> >>   I generally prefer to use creatework tasks; they cleanly handle any
> >> failures that occur and keep my primary processing running as fast as
> >> possible.  However, when I first started using this type of
> >> aggregation technique I created the 'work' models and attempted to
> >> insert the aggregator task (non-transactionally!) within my primary
> >> transaction.  If your primary processing is within tasks, and your
> >> tasks are fast enough, give it a shot.  Converting CreateWorkHandler
> >> to something you can use directly should not be a big deal.
>
> >> Robert
>
> >> On Mon, Nov 8, 2010 at 18:14, Dmitry <[email protected]> wrote:
> >> > Hi Robert,
>
> >> > What queue configuration do you use for your system?
> >> > I came to another problem. I usually process several feeds in parallel
> >> > and can insert up to 20-30 new items to the database. With 4
> >> > aggregators it's >80 create_work tasks in one moment. So after a
> >> > minute I can have up to 1000 tasks in queue... so I have up to 5
> >> > minutes delay in processing.
>
> >> > It seems that for initial aggregation I should insert create work
> >> > models not in tasks.
> >> > I messed up again:)
>
> >> > On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
> >> >> Dmitry,
> >> >>    I finally got the time to make these changes.  Let me know if that
> >> >> works for your use-case.
>
> >> >>    I really appreciate all of your suggestions and help with this.
>
> >> >> Robert
>
> >> >> 2010/11/3 Dmitry <[email protected]>:
>
> >> >> > oops I read expression in wrong direction. This will definitely work!
>
> >> >> > On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
> >> >> >> Dmitry,
> >> >> >>   Right, I know those will cause problems. So what about my
> >> >> >> suggested solution of using:
>
> >> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >> >> >>       task_name = sha1_hash(task_name)
>
> >> >> >> That should correctly handle your use cases, since the full name 
> >> >> >> will be hashed.
>
> >> >> >> Are there issues with that solution I am not seeing?
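
The conditional-hash idea above can be sketched as a small self-contained function. This is a Python 3 illustration (the thread's code is Python 2), not the library's implementation: safe_task_name and VALID_TASK_NAME are hypothetical names, and sha1_hash is re-implemented here along the lines of the helper quoted further down the thread.

```python
import hashlib
import re

# Same pattern as in the thread. Note that "$" also matches just before a
# trailing newline; "\Z" would be stricter.
VALID_TASK_NAME = re.compile(r"^[a-zA-Z0-9-]+$")


def sha1_hash(value):
    """Hex SHA-1 digest of the UTF-8 encoding of `value`."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def safe_task_name(task_name):
    """Keep names that are already valid; hash the full name otherwise."""
    if not VALID_TASK_NAME.match(task_name):
        return sha1_hash(task_name)
    return task_name


for name in ["agg-123", "мститель2000", "тест2000"]:
    print(name, "->", safe_task_name(name))
```

Valid names stay readable, and because the whole original string is hashed, two names that differ only in disallowed characters still get different task names.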
>
> >> >> >> Robert
>
> >> >> >> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
>
> >> >> >> > Robert,
>
> >> >> >> > You will get into trouble with these aggregations:
>
> >> >> >> > urls:
> >> >> >> > http://правительство.рф/search/?phrase=налог&section=gov_events ->
> >> >> >> > httpsearchphrase
> >> >> >> > http://правительство.рф/search/?phrase=президент&section=gov_events
> >> >> >> >  ->
> >> >> >> > httpsearchphrase
>
> >> >> >> > or usernames:
> >> >> >> > мститель2000 -> 2000
> >> >> >> > тест2000 -> 2000
>
> >> >> >> > but anyway in most cases your approach will work well:) You can 
> >> >> >> > leave
> >> >> >> > it up to the user (add some kind of flag "use_hash").
>
> >> >> >> > or we can try to url encode strings:
> >> >> >> > urllib.quote(task_name.encode('utf-8'))
> >> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
> >> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>
> >> >> >> > but this is not better than hash :-D
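
The two alternatives compared above, stripping disallowed characters and percent-encoding, can be tried side by side. This is a Python 3 sketch (urllib.parse.quote replaces Python 2's urllib.quote); strip_invalid and quote_and_strip are hypothetical helper names, and the sample names are the usernames from this message.

```python
import re
from urllib.parse import quote


def strip_invalid(name):
    """The approach from the commit: drop every disallowed character."""
    return re.sub("[^a-zA-Z0-9-]", "", name)[:500]


def quote_and_strip(name):
    """Percent-encode the UTF-8 bytes, then drop the '%' signs."""
    return re.sub("[^a-zA-Z0-9-]", "", quote(name, safe=""))


for name in ["мститель2000", "тест2000"]:
    print(name, "->", strip_invalid(name), "|", quote_and_strip(name))
```

Stripping maps both usernames to the same string, while percent-encoding keeps them apart at the cost of much longer names. Dropping the '%' signs also makes the encoding ambiguous in principle: the plain ASCII name "D0BF" and the single letter "п" both come out as "D0BF".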
>
> >> >> >> > thanks
>
> >> >> >> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
> >> >> >> >> Hey Dmitry,
> >> >> >> >>   I am sure the "fix" in that commit is _not_ a good idea.
> >> >> >> >> Originally I stuck it in because I use entity keys as the
> >> >> >> >> task-name; sometimes they contain characters not allowed in
> >> >> >> >> task-names.  I actually debated for several days about pushing
> >> >> >> >> that update out; finally I decided to push and hope someone would
> >> >> >> >> notice and offer their thoughts.
>
> >> >> >> >>   I like your idea a lot.  But, for many aggregations I like to use
> >> >> >> >> entity keys; it makes it possible for me to visually see what a task
> >> >> >> >> is doing.  What do you think about something like the following
> >> >> >> >> approach:
>
> >> >> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >> >> >> >>       task_name = sha1_hash(task_name)
>
> >> >> >> >> That should allow 'valid' names to remain as-is, but it will safely
> >> >> >> >> encode non-valid task-names.  Do you think that is an acceptable
> >> >> >> >> method?
>
> >> >> >> >> Thanks a lot for your feedback.
>
> >> >> >> >> Robert
>
> >> >> >> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> 
> >> >> >> >> wrote:
> >> >> >> >>> Hi Robert,
>
> >> >> >> >>> Regarding your latest commit:
>
> >> >> >> >>> # TODO: find a better solution for cleaning up the name.
> >> >> >> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>
> >> >> >> >>> Don't think this is a good idea:) For example I have unicode
> >> >> >> >>> characters in aggregation value. In this case regexp will return
> >> >> >> >>> nothing.
> >> >> >> >>> I use sha1 hash now... but there's also a little possibility of
> >> >> >> >>> collision
>
> >> >> >> >>> sha1_hash(self.agg_name)
>
> >> >> >> >>> import hashlib
>
> >> >> >> >>> def utf8encoded(data):
> >> >> >> >>>     # Encode unicode input as UTF-8; pass str and None through.
> >> >> >> >>>     if data is None:
> >> >> >> >>>         return None
> >> >> >> >>>     if isinstance(data, unicode):
> >> >> >> >>>         return unicode(data).encode('utf-8')
> >> >> >> >>>     else:
> >> >> >> >>>         return data
>
> >> >> >> >>> def sha1_hash(value):
> >> >> >> >>>     # Hex SHA-1 digest of the UTF-8 encoded value.
> >> >> >> >>>     return hashlib.sha1(utf8encoded(value)).hexdigest()
>
> >> >> >> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
> >> >> >> >>>> Hi Dmitry,
> >> >> >> >>>>   Glad to hear it was helpful!  Not sure when you checked it out
> >> >> >> >>>> last, but I made a number of good (I think) improvements in the
> >> >> >> >>>> last couple days, such as continuations to allow splitting large
> >> >> >> >>>> groups of work up.
>
> >> >> >> >>>> Robert
>
> >> >> >> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry 
> >> >> >> >>>> <[email protected]> wrote:
> >> >> >> >>>>> Robert,
>
> >> >> >> >>>>> Your grouping_with_date_rollup.py example was extremely
> >> >> >> >>>>> helpful. Thanks a lot again! :)
>
> >> >> >> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> 
> >> >> >> >>>>> wrote:
> >> >> >> >>>>>> Hey Carles,
> >> >> >> >>>>>>   Glad it seems helpful.  I am hoping to get time today to
> >> >> >> >>>>>> push out some revisions and sample code.
>
> >> >> >> >>>>>> Robert
>
> >> >> >> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez 
> >> >> >> >>>>>> <[email protected]> wrote:
> >> >> >> >>>>>>> Robert, I took a brief look at your code and it seems very
> >> >> >> >>>>>>> cool. Exactly what I was looking for for my report generation
> >> >> >> >>>>>>> and such.
> >> >> >> >>>>>>> I'm looking forward to more examples, but it seems a very
> >> >> >> >>>>>>> valuable addition to our toolbox.
> >> >> >> >>>>>>> Thanks a lot!
>
> >> >> >> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez 
> >> >> >> >>>>>>> <[email protected]> wrote:
>
> >> >> >> >>>>>>>> Neat! I'm going to see this code, hopefully I'll understand 
> >> >> >> >>>>>>>> something :)
> >> >> >> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin 
> >> >> >> >>>>>>>> <[email protected]>
> >> >> >> >>>>>>>> wrote:
> >> >> >> >>>>>>>>> Hey Dmitry,
> >> >> >> >>>>>>>>>    In case it might help, I pushed some code to bitbucket.
> >> >> >> >>>>>>>>> At the moment I would (personally) say the code is not too
> >> >> >> >>>>>>>>> pretty, but it works well.  :)
> >> >> >> >>>>>>>>>      http://bitbucket.org/thebobert/slagg
>
> >> >> >> >>>>>>>>>   Sorry it does not really have good documentation at the
> >> >> >> >>>>>>>>> moment, but I think the basic example I threw together will
> >> >> >> >>>>>>>>> give you a good idea of how to use it.  I need to do another
> >> >> >> >>>>>>>>> cleanup pass over the API to make a few more refinements.
>
> >> >> >> >>>>>>>>>    I pulled this code out of one of my apps, and tried to
> >> >> >> >>>>>>>>> quickly refactor it to be a bit more generic.  We are
> >> >> >> >>>>>>>>> currently using basically the same code in three apps to do
> >> >> >> >>>>>>>>> some really complex calculations.  As soon as I get time I
> >> >> >> >>>>>>>>> will get an example up showing how to use it for neat stuff,
> >> >> >> >>>>>>>>> like overall, yearly, monthly, and daily aggregates across
> >> >> >> >>>>>>>>> multiple values (like total dollars and quantity).  The cool
> >> >> >> >>>>>>>>> thing is that you can do all of those aggregations across
> >> >> >> >>>>>>>>> various groupings, like customer, company, contact, and
> >> >> >> >>>>>>>>> sales-person, at once.  I'll get that code pushed out in the
> >> >> >> >>>>>>>>> next few days.
>
> ...
