Re: [google-appengine] Re: Fan-in with materialized views: A sketch

Robert Kluin Tue, 02 Nov 2010 21:13:44 -0700

Hey Dmitry,
  I am sure the "fix" in that commit is _not_ a good idea.  Originally
I stuck it in because I use entity keys as the task-name, and sometimes
they contain characters not allowed in task-names.  I actually
debated for several days about pushing that update out;  finally I
decided to push it and hope someone would notice and offer their thoughts.


  I like your idea a lot.  But for many aggregations I like to use
entity keys, because they let me see at a glance what a task is
doing.  What do you think about something like the following
approach:

  if not re.match("^[a-zA-Z0-9-]+$", task_name):
      task_name = sha1_hash(task_name)

That should allow 'valid' names to remain as-is, while safely
encoding invalid task-names.  Do you think that is an acceptable
method?
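
As a rough sketch of that idea (written for Python 3, where str is
already Unicode, rather than the Python 2 used elsewhere in this
thread), the check and the hash can be combined into one helper.  The
name safe_task_name is hypothetical; the character class is the one
from the check above and the 500-character cap comes from the earlier
commit, so both should be confirmed against the Task Queue
documentation before relying on them.

```python
import hashlib
import re

# Assumed rule: the character class from the check above, capped at the
# 500 characters used in the earlier commit. Adjust if the real rules differ.
VALID_TASK_NAME = re.compile(r"[a-zA-Z0-9-]{1,500}")


def sha1_hash(value):
    """Return the hex SHA-1 digest of a str (UTF-8 encoded) or bytes value."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def safe_task_name(task_name):
    """Keep names that are already valid; replace all others with a hash."""
    if VALID_TASK_NAME.fullmatch(task_name):
        return task_name
    return sha1_hash(task_name)


if __name__ == "__main__":
    print(safe_task_name("Customer-123"))  # already valid, kept readable
    print(safe_task_name("клиент/42"))     # invalid, becomes a hex digest
```

With this, a key such as "Customer-123" stays readable as the task
name, while a name containing Unicode or other disallowed characters
becomes a 40-character hex digest instead of being stripped down to an
empty or ambiguous string.  Using fullmatch rather than match with "$"
also rejects a name that ends in a newline, which "$" would accept.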

Thanks a lot for your feedback.


Robert




On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> wrote:
> Hi Robert,
>
> Regarding your latest commit:
>
> # TODO: find a better solution for cleaning up the name.
> task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>
> I don't think this is a good idea :) For example, I have unicode
> characters in the aggregation value. In that case the regexp can
> strip everything and return an empty string.
> I use a sha1 hash now... but there's also a small possibility of
> collision:
>
> sha1_hash(self.agg_name)
>
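> import hashlib
>
> def utf8encoded(data):
>  # Encode unicode to UTF-8 bytes; pass None and byte strings through.
>  if data is None:
>    return None
>  if isinstance(data, unicode):
>    return data.encode('utf-8')
>  else:
>    return data
>
> def sha1_hash(value):
>  # 40-character hex digest of the UTF-8 encoded value.
>  return hashlib.sha1(utf8encoded(value)).hexdigest()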
>
> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
>> Hi Dmitry,
>>   Glad to hear it was helpful!  Not sure when you checked it out last,
>> but I made a number of good (I think) improvements in the last couple
>> days, such as continuations to allow splitting large groups of work
>> up.
>>
>> Robert
>>
>> On Sun, Oct 24, 2010 at 07:57, Dmitry <[email protected]> wrote:
>> > Robert,
>>
>> > Your grouping_with_date_rollup.py example was extremely helpful. Thanks
>> > a lot again! :)
>>
>> > On Oct 14, 8:47 pm, Robert Kluin <[email protected]> wrote:
>> >> Hey Carles,
>> >>   Glad it seems helpful.  I am hoping to get time today to push out
>> >> some revisions and sample code.
>>
>> >> Robert
>>
>> >> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <[email protected]> wrote:
>> >> > Robert, I took a brief look at your code and it seems very cool.
>> >> > Exactly what I was looking for for my report generation and such.
>> >> > I'm looking forward to more examples, but it seems a very valuable
>> >> > addition to our toolbox.
>> >> > Thanks a lot!
>>
>> >> > On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <[email protected]> wrote:
>>
>> >> >> Neat! I'm going to look at this code; hopefully I'll understand
>> >> >> something :)
>> >> >> On Wednesday, October 13, 2010, Robert Kluin <[email protected]> wrote:
>> >> >> > Hey Dmitry,
>> >> >> >    In case it might help, I pushed some code to bitbucket.  At the
>> >> >> > moment I would (personally) say the code is not too pretty, but it
>> >> >> > works well.  :)
>> >> >> >       http://bitbucket.org/thebobert/slagg
>>
>> >> >> >   Sorry it does not really have good documentation at the moment, but
>> >> >> > I think the basic example I threw together will give you a good idea
>> >> >> > of how to use it.  I need to do another cleanup pass over the API to
>> >> >> > make a few more refinements.
>>
>> >> >> >    I pulled this code out of one of my apps, and tried to quickly
>> >> >> > refactor it to be a bit more generic.  We are currently using
>> >> >> > basically the same code in three apps to do some really complex
>> >> >> > calculations.  As soon as I get time I will get an example up showing
>> >> >> > how to use it for neat stuff, like overall, yearly, monthly, and
>> >> >> > daily aggregates across multiple values (like total dollars and
>> >> >> > quantity).
>> >> >> > The cool thing is that you can do all of those aggregations across
>> >> >> > various groupings, like customer, company, contact, and sales-person,
>> >> >> > at once.  I'll get that code pushed out in the next few days.
>>
>> >> >> >   Would love to get some feedback on it.
>>
>> >> >> > Robert
>>
>> >> >> > On Tue, Oct 12, 2010 at 17:26, Dmitry <[email protected]> wrote:
>> >> >> >> Ben, thanks for your code! I'm trying to understand all this stuff
>> >> >> >> too...
>> >> >> >> Robert, any success with your "library"? Maybe you've already done
>> >> >> >> all the stuff we are trying to implement...
>>
>> >> >> >> P.S. Where is Brett S.? :) I would like to hear his comments on this.
>>
>> >> >> >> On Sep 21, 1:49 pm, Ben <[email protected]> wrote:
>> >> >> >>> Thanks for your insights. I would love feedback on this
>> >> >> >>> implementation (Brett S. suggested we send in our code for this):
>> >> >> >>> http://pastebin.com/3pUhFdk8
>>
>> >> >> >>> This implementation is for just one materialized view row at a time
>> >> >> >>> (e.g. a simple counter, no presence markers). Hopefully putting an
>> >> >> >>> ETA on the transactional task will relieve the write pressure, since
>> >> >> >>> usually it should be an old update with an out-of-date sequence
>> >> >> >>> number and be discarded (the update having already been completed in
>> >> >> >>> batch by the fork-join-queue).
>>
>> >> >> >>> I'd love to generalize this to do more than one materialized view
>> >> >> >>> row but thought I'd get feedback first.
>>
>> >> >> >>> Thanks,
>> >> >> >>> Ben
>>
>> >> >> >>> On Sep 17, 7:30 am, Robert Kluin <[email protected]> wrote:
>>
>> >> >> >>> > Responses inline.
>>
>> >> >> >>> > On Thu, Sep 16, 2010 at 17:32, Ben <[email protected]> wrote:
>> >> >> >>> > > I have a question about Brett Slatkin's talk at I/O 2010 on
>> >> >> >>> > > data pipelines. The question is about slide #67 of his pdf,
>> >> >> >>> > > corresponding to minute 51:30 of his talk:
>>
>> >> >> >>> > > >http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
>>
>> >> >> >>> > > I am wondering what is supposed to happen in the transactional
>> >> >> >>> > > task (bullet point 2c). Would these updates to the materialized
>> >> >> >>> > > view cause you to write too frequently to the entity group
>> >> >> >>> > > containing the materialized view?
>>
>> >> >> >>> > I think there are really two different approaches you can use to
>> >> >> >>> > insert your work models.
>> >> >> >>> > 1)  The work models get added to the original entity's group.
>> >> >> >>> > So, inside of the original transaction you do not write to the
>> >> >> >>> > entity group containing the materialized view -- so no contention
>> >> >> >>> > on it.  Commit the transaction and proceed to step 3.
>> >> >> >>> > 2)  You kick off a transactional task to insert the work model,
>> >> >> >>> > or fan-out more tasks to create work models  :).   Then you
>> >> >> >>> > proceed to step 3.
>>
>> >> >> >>> > You can use method 1 if you have only a few aggregates.  If you
>> >> >> >>> > have more aggregates use the second method.  I have a "library" I
>> >> >> >>> > am almost ready to open source that makes method 2 really easy, so
>> >> >> >>> > you can have lots of aggregates.  I'll post to this group when I
>> >> >> >>> > release it.
>>
>> >> >> >>> > > And a related question: what happens if there is a failure just
>> >> >> >>> > > after the transaction in bullet #2, but right before the named
>> >> >> >>> > > task gets inserted in bullet #3? In my current implementation I
>> >> >> >>> > > just left out the transactional task (bullet point 2c), but I
>> >> >> >>> > > think that causes me to lose the eventual consistency.
>>
>> >> >> >>> > Failure between steps 2 and 3 just means _that_ particular update
>> >> >> >>> > will not try to kick off, i.e. insert, the fan-in (aggregation)
>> >> >> >>> > task.  But it might have already been inserted by the previous
>> >> >> >>> > update, or it may be inserted by the next update.  However, if
>> >> >> >>> > nothing else kicks off the fan-in task you will need some periodic
>> >> >> >>> > "cleanup" method to catch the update and kick off the fan-in
>> >> >> >>> > task.  Depending on exactly how you implemented step 2 you may not
>> >> >> >>> > need a transactional task.
>>
>> >> >> >>> > Robert
>>
>> >> >> >>> > > Thanks!
>>
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups "Google App Engine" group.
>> >> > To post to this group, send email to [email protected].
>> >> > To unsubscribe from this group, send email to
>> >> > [email protected].
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/google-appengine?hl=en.