Re: [google-appengine] Re: Fan-in with materialized views: A sketch

Robert Kluin Tue, 09 Nov 2010 18:43:13 -0800

Dmitry,
  Glad to hear the bucket size helped!

  Please let me know how it goes.  If you have good results, maybe we
can find a clean way to do the work done by creatework directly.



Robert




On Tue, Nov 9, 2010 at 18:11, Dmitry <[email protected]> wrote:
> Robert, thanks a lot for your suggestions!
> Increasing the bucket size made a huge difference. I need to study the
> theoretical part... and find the optimal bucket size for 50/sec.
>
> Yep, I use creatework directly without fanout. I will try to insert
> 'work' models within my original data transaction and compare the
> performance.
>
> On Nov 9, 3:14 am, Robert Kluin <[email protected]> wrote:
>> Hey Dmitry,
>>    I am working on getting some decent documentation about when you
>> might want to use fanout versus directly using creatework.  And, about
>> usage in general.  If I am dealing with one or two aggregations I
>> usually use creatework directly.  You can only insert five
>> transactional tasks in one database transaction, so with four you
>> could directly use creatework, eliminating a fanout task.
>>
>>   As far as rates go, I have been using a rate of 35/s and bucket size
>> of 40.  However, I also get periodic queue backups.  I think the max
>> rate / sec is currently 50, but I thought there was an announcement it
>> was getting increased (maybe I am just remembering the increase to
>> 50/sec announcement though).  You might want to bump your rate up to
>> 50/sec.  I always use a dedicated queue for creatework and aggregation
>> tasks.  In one of my apps I use multiple queues to get a bit higher
>> throughput.
>>
>>   I generally prefer to use creatework tasks; they cleanly handle any
>> failures that occur and keep my primary processing running as fast as
>> possible.  However, when I first started using this type of
>> aggregation technique I created the 'work' models and attempted to
>> insert the aggregator task (non-transactionally!) within my primary
>> transaction.  If your primary processing is within tasks, and your
>> tasks are fast enough, give it a shot.  Converting CreateWorkHandler
>> to something you can use directly should not be a big deal.
>>
>> Robert
>>
>> On Mon, Nov 8, 2010 at 18:14, Dmitry <[email protected]> wrote:
>> > Hi Robert,
>>
>> > What queue configuration do you use for your system?
>> > I ran into another problem. I usually process several feeds in parallel
>> > and can insert up to 20-30 new items into the database. With 4
>> > aggregators that's >80 create_work tasks at one moment. So after a
>> > minute I can have up to 1000 tasks in the queue... so I have up to a 5
>> > minute delay in processing.
>>
>> > It seems that for the initial aggregation I should insert the create work
>> > models directly, not in tasks.
>> > I messed up again:)
>>
>> > On Nov 5, 6:46 am, Robert Kluin <[email protected]> wrote:
>> >> Dmitry,
>> >>    I finally got the time to make these changes.  Let me know if that
>> >> works for your use-case.
>>
>> >>    I really appreciate all of your suggestions and help with this.
>>
>> >> Robert
>>
>> >> 2010/11/3 Dmitry <[email protected]>:
>>
>> >> > Oops, I read the expression the wrong way around. This will definitely work!
>>
>> >> > On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
>> >> >> Dmitry,
>> >> >>   Right, I know those will cause problems. So what about my suggested
>> >> >> solution of using:
>>
>> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>> >> >>       task_name = sha1_hash(task_name)
>>
>> >> >> That should correctly handle your use cases, since the full name will 
>> >> >> be hashed.
>>
>> >> >> Are there issues with that solution I am not seeing?
>>
>> >> >> Robert
>>
>> >> >> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
>>
>> >> >> > Robert,
>>
>> >> >> > You will get into trouble with these aggregations:
>>
>> >> >> > urls:
>> >> >> > http://правительство.рф/search/?phrase=налог&section=gov_events ->
>> >> >> > httpsearchphrase
>> >> >> > http://правительство.рф/search/?phrase=президент&section=gov_events ->
>> >> >> > httpsearchphrase
>>
>> >> >> > or usernames:
>> >> >> > мститель2000 -> 2000
>> >> >> > тест2000 -> 2000
>>
>> >> >> > but anyway in most cases your approach will work well:) You can leave
>> >> >> > it up to the user (add some kind of flag "use_hash").
>>
>> >> >> > or we can try to url encode strings:
>> >> >> > urllib.quote(task_name.encode('utf-8'))
>> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
>> >> >> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>>
>> >> >> > but this is not better than a hash :-D
>>
>> >> >> > thanks
>>
>> >> >> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
>> >> >> >> Hey Dmitry,
>> >> >> >>   I am sure the "fix" in that commit is _not_ a good idea.  Originally
>> >> >> >> I stuck it in because I use entity keys as the task-name, and sometimes
>> >> >> >> they contain characters not allowed in task-names.  I actually
>> >> >> >> debated for several days about pushing that update out; finally I
>> >> >> >> decided to push and hope someone would notice and offer their thoughts.
>>
>> >> >> >>   I like your idea a lot.  But, for many aggregations I like to use
>> >> >> >> entity keys; it makes it possible for me to visually see what a task
>> >> >> >> is doing.  What do you think about something like the following
>> >> >> >> approach:
>>
>> >> >> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>> >> >> >>       task_name = sha1_hash(task_name)
>>
>> >> >> >> That should allow 'valid' names to remain as-is, but it will safely
>> >> >> >> encode non-valid task-names.  Do you think that is an acceptable
>> >> >> >> method?
>>
>> >> >> >> Thanks a lot for your feedback.
>>
>> >> >> >> Robert
>>
>> >> >> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> 
>> >> >> >> wrote:
>> >> >> >>> Hi Robert,
>>
>> >> >> >>> Regarding your latest commit:
>>
>> >> >> >>> # TODO: find a better solution for cleaning up the name.
>> >> >> >>> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>>
>> >> >> >>> I don't think this is a good idea:) For example, I have unicode
>> >> >> >>> characters in the aggregation value. In this case the regexp will
>> >> >> >>> return nothing.
>> >> >> >>> I use a sha1 hash now... but there's also a small possibility of a
>> >> >> >>> collision
>>
>> >> >> >>> sha1_hash(self.agg_name)
>>
>> >> >> >>> import hashlib
>>
>> >> >> >>> def utf8encoded(data):
>> >> >> >>>     if data is None:
>> >> >> >>>         return None
>> >> >> >>>     if isinstance(data, unicode):
>> >> >> >>>         return unicode(data).encode('utf-8')
>> >> >> >>>     else:
>> >> >> >>>         return data
>>
>> >> >> >>> def sha1_hash(value):
>> >> >> >>>     return hashlib.sha1(utf8encoded(value)).hexdigest()
>>
>> >> >> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
>> >> >> >>>> Hi Dmitry,
>> >> >> >>>>   Glad to hear it was helpful!  Not sure when you checked it out last,
>> >> >> >>>> but I made a number of good (I think) improvements in the last couple
>> >> >> >>>> days, such as continuations to allow splitting large groups of work
>> >> >> >>>> up.
>>
>> >> >> >>>> Robert
>>
>> >> >> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <[email protected]> 
>> >> >> >>>> wrote:
>> >> >> >>>>> Robert,
>>
>> >> >> >>>>> Your grouping_with_date_rollup.py example was extremely helpful.
>> >> >> >>>>> Thanks a lot again! :)
>>
>> >> >> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> wrote:
>> >> >> >>>>>> Hey Carles,
>> >> >> >>>>>>   Glad it seems helpful.  I am hoping to get time today to push out
>> >> >> >>>>>> some revisions and sample code.
>>
>> >> >> >>>>>> Robert
>>
>> >> >> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez 
>> >> >> >>>>>> <[email protected]> wrote:
>> >> >> >>>>>>> Robert, I took a brief look at your code and it seems very cool.
>> >> >> >>>>>>> Exactly what I was looking for for my report generation and such.
>> >> >> >>>>>>> I'm looking forward to more examples, but it seems a very valuable
>> >> >> >>>>>>> addition to our toolbox.
>> >> >> >>>>>>> Thanks a lot!
>>
>> >> >> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez 
>> >> >> >>>>>>> <[email protected]> wrote:
>>
>> >> >> >>>>>>>> Neat! I'm going to look at this code; hopefully I'll understand
>> >> >> >>>>>>>> something :)
>> >> >> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin 
>> >> >> >>>>>>>> <[email protected]>
>> >> >> >>>>>>>> wrote:
>> >> >> >>>>>>>>> Hey Dmitry,
>> >> >> >>>>>>>>>     In case it might help, I pushed some code to bitbucket.  At the
>> >> >> >>>>>>>>> moment I would (personally) say the code is not too pretty, but it
>> >> >> >>>>>>>>> works well.  :)
>> >> >> >>>>>>>>>       http://bitbucket.org/thebobert/slagg
>>
>> >> >> >>>>>>>>>   Sorry it does not really have good documentation at the moment, but
>> >> >> >>>>>>>>> I think the basic example I threw together will give you a good idea
>> >> >> >>>>>>>>> of how to use it.  I need to do another cleanup pass over the API to
>> >> >> >>>>>>>>> make a few more refinements.
>>
>> >> >> >>>>>>>>>     I pulled this code out of one of my apps, and tried to quickly
>> >> >> >>>>>>>>> refactor it to be a bit more generic.  We are currently using
>> >> >> >>>>>>>>> basically the same code in three apps to do some really complex
>> >> >> >>>>>>>>> calculations.  As soon as I get time I will get an example up showing
>> >> >> >>>>>>>>> how to use it for neat stuff, like overall, yearly, monthly, and daily
>> >> >> >>>>>>>>> aggregates across multiple values (like total dollars and quantity).
>> >> >> >>>>>>>>> The cool thing is that you can do all of those aggregations across
>> >> >> >>>>>>>>> various groupings, like customer, company, contact, and sales-person,
>> >> >> >>>>>>>>> at once.  I'll get that code pushed out in the next few days.
>>
>> >> >> >>>>>>>>>   Would love to get some feedback on it.
>>
>> >> >> >>>>>>>>> Robert
>>
>> >> >> >>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry 
>> >> >> >>>>>>>>> <[email protected]> wrote:
>> >> >> >>>>>>>>>> Ben, thanks for your code! I'm trying to understand all this stuff
>> >> >> >>>>>>>>>> too...
>> >> >> >>>>>>>>>> Robert, any success with your "library"? Maybe you've already done
>> >> >> >>>>>>>>>> all the stuff we are trying to implement...
>>
>> >> >> >>>>>>>>>> p.s. where is Brett S.:) would like to hear his comments on this
>>
>> >> >> >>>>>>>>>> On Sep 21, 1:49 pm, Ben <[email protected]> wrote:
>> >> >> >>>>>>>>>>> Thanks for your insights. I would love feedback on this implementation
>> >> >> >>>>>>>>>>> (Brett S. suggested we send in our code for this):
>> >> >> >>>>>>>>>>> http://pastebin.com/3pUhFdk8
>>
>> >> >> >>>>>>>>>>> This implementation is for just one materialized view row at a time
>> >> >> >>>>>>>>>>> (e.g. a simple counter, no presence markers). Hopefully putting an ETA
>> >> >> >>>>>>>>>>> on the transactional task will relieve the write pressure, since
>> >> >> >>>>>>>>>>> usually it should be an old update with an out-of-date sequence number
>> >> >> >>>>>>>>>>> and be
>>
>> ...
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=en.
>
>
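
For reference, the task-name fallback discussed in the thread (keep a name
that already matches [a-zA-Z0-9-]+, otherwise replace it with its SHA-1 hex
digest) might look roughly like the sketch below. It is written for Python 3
rather than the Python 2 runtime used in the thread, and safe_task_name and
stripped_task_name are hypothetical names for illustration, not functions
from slagg. It uses re.fullmatch instead of re.match with "$", so a name with
a trailing newline is not treated as valid.

```python
import hashlib
import re

# Characters accepted by the pattern quoted in the thread.
VALID_TASK_NAME = re.compile(r"[a-zA-Z0-9-]+")


def sha1_hash(value):
    """Return the hex SHA-1 digest of a str (encoded as UTF-8) or bytes."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def safe_task_name(task_name):
    """Keep names that are already valid; hash the full name otherwise."""
    if VALID_TASK_NAME.fullmatch(task_name):
        return task_name
    return sha1_hash(task_name)


def stripped_task_name(task_name):
    """The earlier approach from the thread: drop disallowed characters."""
    return re.sub("[^a-zA-Z0-9-]", "", task_name)[:500]
```

With this, two names that differ only in their non-ASCII characters hash to
different 40-character hex names, whereas stripping reduces both to whatever
ASCII characters are left over.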

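
On the bucket size question: App Engine task queues are documented as using
a token bucket, where bucket_size caps the burst and rate is the refill
speed. A rough back-of-the-envelope model is sketched below. It assumes the
bucket starts full and all tasks arrive at once, and it ignores task run
time, instance limits and retries, so treat the numbers as a lower bound
only. seconds_to_drain is a hypothetical helper, not an App Engine API.

```python
def seconds_to_drain(burst, rate, bucket_size):
    """Estimate seconds until a burst of tasks has all been dispatched.

    Model assumptions: the bucket starts full, so up to `bucket_size` tasks
    start immediately; the rest wait for tokens refilled at `rate` per second.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    waiting = max(0, burst - bucket_size)
    return waiting / rate
```

In this model a burst of 80 tasks at 50/s takes 1.5 seconds to dispatch with
a bucket size of 5 and 0.8 seconds with a bucket size of 40. A larger bucket
only absorbs bursts; sustained arrivals above the rate still back up.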