Oops, I read the expression in the wrong direction. This will definitely work!

On Nov 3, 7:43 pm, Robert Kluin <[email protected]> wrote:
> Dmitry,
>   Right, I know those will cause problems. So what about my suggested
> solution of using:
>
>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
>     task_name = sha1_hash(task_name)
>
> That should correctly handle your use cases, since the full name will be
> hashed.
>
> Are there issues with that solution I am not seeing?
>
> Robert
>
> On Nov 3, 2010, at 3:52, Dmitry <[email protected]> wrote:
>
> > Robert,
>
> > You will get into trouble with these aggregations:
>
> > urls:
> > http://правительство.рф/search/?phrase=налог&section=gov_events ->
> > httpsearchphrase
> > http://правительство.рф/search/?phrase=президент&section=gov_events ->
> > httpsearchphrase
>
> > or usernames:
> > мститель2000 -> 2000
> > тест2000 -> 2000
>
> > but anyway, in most cases your approach will work well :) You can leave
> > it up to the user (add some kind of "use_hash" flag).
>
> > or we can try to URL-encode the strings:
> > urllib.quote(task_name.encode('utf-8'))
> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BDD0B0D0BBD0BED0B3
> > http3AD0BFD180D0B0D0B2D0B8D182D0B5D0BBD18CD181D182D0B2D0BED180D184search3Fphrase3DD0BFD180D0B5D0B7D0B8D0B4D0B5D0BDD182
>
> > but this is no better than a hash :-D
>
> > thanks
>
> > On Nov 3, 7:13 am, Robert Kluin <[email protected]> wrote:
> >> Hey Dmitry,
> >>   I am sure the "fix" in that commit is _not_ a good idea. Originally
> >> I stuck it in because I use entity keys as the task-name, and sometimes
> >> they contain characters not allowed in task-names. I actually
> >> debated for several days about pushing that update out; finally I
> >> decided to push and hope someone would notice and offer their thoughts.
>
> >> I like your idea a lot. But for many aggregations I like to use
> >> entity keys; it makes it possible for me to visually see what a task
> >> is doing.
> >> What do you think about something like the following approach:
>
> >>   if not re.match("^[a-zA-Z0-9-]+$", task_name):
> >>     task_name = sha1_hash(task_name)
>
> >> That should allow 'valid' names to remain as-is, but it will safely
> >> encode non-valid task-names. Do you think that is an acceptable
> >> method?
>
> >> Thanks a lot for your feedback.
>
> >> Robert
>
> >> On Tue, Nov 2, 2010 at 07:15, Dmitry <[email protected]> wrote:
> >>> Hi Robert,
>
> >>> Regarding your latest commit:
>
> >>>   # TODO: find a better solution for cleaning up the name.
> >>>   task_name = re.sub('[^a-zA-Z0-9-]', '', task_name)[:500]
>
> >>> I don't think this is a good idea :) For example, I have unicode
> >>> characters in the aggregation value. In this case the regexp will
> >>> return nothing.
> >>> I use a sha1 hash now... but there's also a small possibility of
> >>> collision
>
> >>>   sha1_hash(self.agg_name)
>
> >>>   def utf8encoded(data):
> >>>     if data is None:
> >>>       return None
> >>>     if isinstance(data, unicode):
> >>>       return unicode(data).encode('utf-8')
> >>>     else:
> >>>       return data
>
> >>>   def sha1_hash(value):
> >>>     return hashlib.sha1(utf8encoded(value)).hexdigest()
>
> >>> On Oct 24, 9:26 pm, Robert Kluin <[email protected]> wrote:
> >>>> Hi Dmitry,
> >>>>   Glad to hear it was helpful! Not sure when you checked it out last,
> >>>> but I made a number of good (I think) improvements in the last couple
> >>>> of days, such as continuations to allow splitting large groups of
> >>>> work up.
>
> >>>> Robert
>
> >>>> On Sun, Oct 24, 2010 at 07:57, Dmitry <[email protected]> wrote:
> >>>>> Robert,
>
> >>>>> Your grouping_with_date_rollup.py example was extremely helpful.
> >>>>> Thanks a lot again! :)
>
> >>>>> On Oct 14, 8:47 pm, Robert Kluin <[email protected]> wrote:
> >>>>>> Hey Carles,
> >>>>>>   Glad it seems helpful. I am hoping to get time today to push out
> >>>>>> some revisions and sample code.
>
> >>>>>> Robert
>
> >>>>>> On Thu, Oct 14, 2010 at 05:50, Carles Gonzalez <[email protected]>
> >>>>>> wrote:
> >>>>>>> Robert, I took a brief look at your code and it seems very cool.
> >>>>>>> Exactly what I was looking for, for my report generation and such.
> >>>>>>> I'm looking forward to more examples, but it seems a very valuable
> >>>>>>> addition to our toolbox.
> >>>>>>> Thanks a lot!
>
> >>>>>>> On Wed, Oct 13, 2010 at 9:20 PM, Carles Gonzalez <[email protected]>
> >>>>>>> wrote:
>
> >>>>>>>> Neat! I'm going to look at this code, hopefully I'll understand
> >>>>>>>> something :)
> >>>>>>>> On Wednesday, October 13, 2010, Robert Kluin <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>> Hey Dmitry,
> >>>>>>>>>   In case it might help, I pushed some code to bitbucket. At the
> >>>>>>>>> moment I would (personally) say the code is not too pretty, but
> >>>>>>>>> it works well. :)
> >>>>>>>>> http://bitbucket.org/thebobert/slagg
>
> >>>>>>>>>   Sorry it does not really have good documentation at the moment,
> >>>>>>>>> but I think the basic example I threw together will give you a
> >>>>>>>>> good idea of how to use it. I need to do another cleanup pass
> >>>>>>>>> over the API to make a few more refinements.
>
> >>>>>>>>>   I pulled this code out of one of my apps, and tried to quickly
> >>>>>>>>> refactor it to be a bit more generic. We are currently using
> >>>>>>>>> basically the same code in three apps to do some really complex
> >>>>>>>>> calculations. As soon as I get time I will get an example up
> >>>>>>>>> showing how to use it for neat stuff, like overall, yearly,
> >>>>>>>>> monthly, and daily aggregates across multiple values (like total
> >>>>>>>>> dollars and quantity). The cool thing is that you can do all of
> >>>>>>>>> those aggregations across various groupings, like customer,
> >>>>>>>>> company, contact, and sales-person, at once.
> >>>>>>>>> I'll get that code pushed out in the next few days.
>
> >>>>>>>>>   Would love to get some feedback on it.
>
> >>>>>>>>> Robert
>
> >>>>>>>>> On Tue, Oct 12, 2010 at 17:26, Dmitry <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>> Ben, thanks for your code! I'm trying to understand all this
> >>>>>>>>>> stuff too...
> >>>>>>>>>> Robert, any success with your "library"? Maybe you've already
> >>>>>>>>>> done all the stuff we are trying to implement...
>
> >>>>>>>>>> p.s. where is Brett S. :) I would like to hear his comments on
> >>>>>>>>>> this
>
> >>>>>>>>>> On Sep 21, 1:49 pm, Ben <[email protected]> wrote:
> >>>>>>>>>>> Thanks for your insights. I would love feedback on this
> >>>>>>>>>>> implementation (Brett S. suggested we send in our code for
> >>>>>>>>>>> this) http://pastebin.com/3pUhFdk8
>
> >>>>>>>>>>> This implementation is for just one materialized view row at
> >>>>>>>>>>> a time (e.g. a simple counter, no presence markers). Hopefully
> >>>>>>>>>>> putting an ETA on the transactional task will relieve the
> >>>>>>>>>>> write pressure, since usually it should be an old update with
> >>>>>>>>>>> an out-of-date sequence number and be discarded (the update
> >>>>>>>>>>> having already been completed in batch by the
> >>>>>>>>>>> fork-join-queue).
>
> >>>>>>>>>>> I'd love to generalize this to do more than one materialized
> >>>>>>>>>>> view row but thought I'd get feedback first.
>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Ben
>
> >>>>>>>>>>> On Sep 17, 7:30 am, Robert Kluin <[email protected]> wrote:
>
> >>>>>>>>>>>> Responses inline.
>
> >>>>>>>>>>>> On Thu, Sep 16, 2010 at 17:32, Ben <[email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>> I have a question about Brett Slatkin's talk at I/O 2010 on
> >>>>>>>>>>>>> data pipelines.
> >>>>>>>>>>>>> The question is about slide #67 of his pdf, corresponding
> >>>>>>>>>>>>> to minute 51:30 of his talk
>
> >>>>>>>>>>>>> http://code.google.com/events/io/2010/sessions/high-throughput-data-p...
>
> >>>>>>>>>>>>> I am wondering what is supposed to happen in the
> >>>>>>>>>>>>> transactional task (bullet point 2c). Would these updates
> >>>>>>>>>>>>> to the materialized view cause you to write too frequently
> >>>>>>>>>>>>> to the entity group containing the materialized view?
>
> >>>>>>>>>>>> I think there are really two different approaches you can use
> >>>>>>>>>>>> to insert your work models.
> >>>>>>>>>>>>   1) The work models get added to the original entity's
> >>>>>>>>>>>>      group. So, inside of the original transaction you do not
> >>>>>>>>>>>>      write to the entity group containing the materialized
> >>>>>>>>>>>>      view -- so no contention on it. Commit the transaction
> >>>>>>>>>>>>      and proceed to step 3.
> >>>>>>>>>>>>   2) You kick off a transactional task to insert the work
> >>>>>>>>>>>>      model, or fan-out more tasks to create work models :).
> >>>>>>>>>>>>      Then you proceed to step 3.
>
> >>>>>>>>>>>> You can use method 1 if you have only a few aggregates. If
> >>>>>>>>>>>> you have more aggregates use the second method. I have a
> >>>>>>>>>>>> "library" I am almost ready to open source that makes method
> >>>>>>>>>>>> 2 really easy, so you can have lots of aggregates. I'll post
> >>>>>>>>>>>> to this group when I release it.
>
> >>>>>>>>>>>>> And a related question, what happens if there is a failure
> >>>>>>>>>>>>> just after the transaction in bullet #2, but right before
> >>>>>>>>>>>>> the named task gets inserted in bullet #3.
> >>>>>>>>>>>>> In my current implementation I just left out the
> >>>>>>>>>>>>> transactional task (bullet point 2c) but I think that
> >>>>>>>>>>>>> causes me to lose the eventual consistency.
>
> >>>>>>>>>>>> Failure between steps 2 and 3 just means _that_ particular
> >>>>>>>>>>>> update will not try to kick off, i.e. insert, the fan-in
> >>>>>>>>>>>> (aggregation) task. But it might have already been inserted
> >>>>>>>>>>>> by the previous update, or the next update. However, if
> >>>>>>>>>>>> nothing else kicks off the fan-in task you will need some
> >>>>>>>>>>>> periodic "cleanup" method to catch the update and kick off
> >>>>>>>>>>>> the fan-in task. Depending on exactly how you implemented
> >>>>>>>>>>>> step 2 you may not need a transactional task.
>
> >>>>>>>>>>>> Robert
>
> >>>>>>>>>>>>> Thanks!
>
> >>>>>>> --
> >>>>>>> You received this message because you are subscribed to the Google
> >>>>>>> Groups "Google App Engine" group.
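For anyone skimming the thread, here is a minimal sketch of the "keep valid names, hash everything else" cleanup discussed in the newest messages above. It is not the code from the slagg repository. It is written for Python 3, whereas the snippets quoted in the thread target Python 2's `unicode` type. The names `safe_task_name`, `strip_invalid`, and `MAX_TASK_NAME_LENGTH` are made up for illustration, and the 500-character cutoff is only borrowed from the `[:500]` slice in the quoted commit. It also uses `re.fullmatch` instead of `re.match` with `$`, because in Python `$` also matches just before a trailing newline.

```python
import hashlib
import re

# Character set taken from the regex quoted in the thread.
VALID_TASK_NAME = re.compile(r"[a-zA-Z0-9-]+")
# Assumption: borrowed from the [:500] slice in the quoted commit.
MAX_TASK_NAME_LENGTH = 500


def sha1_hash(value):
    """Return the hex SHA-1 digest of a text or bytes value."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()


def safe_task_name(task_name):
    """Keep names that are already valid; hash the full name otherwise."""
    if (len(task_name) <= MAX_TASK_NAME_LENGTH
            and VALID_TASK_NAME.fullmatch(task_name)):
        return task_name
    return sha1_hash(task_name)


def strip_invalid(task_name):
    """The earlier approach from the quoted commit, kept for comparison."""
    return re.sub("[^a-zA-Z0-9-]", "", task_name)[:500]


if __name__ == "__main__":
    for name in ["agg-customer-42", "мститель2000", "тест2000"]:
        print(name, "->", strip_invalid(name), "|", safe_task_name(name))
```

With this, a readable name such as an entity key made of letters, digits, and hyphens stays visible in the task name, while the two usernames from Dmitry's example no longer collapse to the same stripped value. A SHA-1 collision is still theoretically possible, as noted in the thread.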
