On 21 November 2013 00:25, Rasjid Wilcox <[email protected]> wrote:
> I can't really comment a lot, since I've avoided django after looking at
> it a number of years ago and deciding that its ORM was too limited for my
> needs. You may need to drop down to raw sql, rather than relying on the
> django ORM. At the very least you will want to look at exactly what sql is
> being sent to the server. See
> https://docs.djangoproject.com/en/dev/faq/models/#how-can-i-see-the-raw-sql-queries-django-is-running
> for info on that, although I don't know if that will tell you when each
> transaction is starting and stopping.

You might find that the limitations you once hit with the Django ORM no
longer exist. One limitation I have noticed, though, is that the above
technique (and the other logging technique) for logging SQL doesn't log
session start-up / transaction statements :-( This is Django 1.5, however;
I can't really report a bug until I have tested it with Django 1.6, as a
lot has changed in the db layer.

So, I am restricted to using wireshark. Let's see:

Process #1:

    SET NAMES utf8
    set autocommit=0
    SET SQL_AUTO_IS_NULL = 0
    SELECT `machine`.`id`, `machine`.`password`, `machine`.`last_login`,
        `machine`.`name`, `machine`.`no_cpus`, `machine`.`no_nodes`,
        `machine`.`type`, `machine`.`category_id`, `machine`.`start_date`,
        `machine`.`end_date`, `machine`.`pbs_server_host`,
        `machine`.`mem_per_core`, `machine`.`scaling_factor`
        FROM `machine` WHERE `machine`.`id` = 1
    Ping
    INSERT INTO `cache_machinecache` (`date`, `start`, `end`, `cpu_hours`,
        `no_jobs`, `machine_id`) VALUES ('2013-11-22', '2003-12-25',
        '2003-12-25', '10.00', 10, 1)

[ 10 second sleep ]

Process #2:

    SET NAMES utf8
    set autocommit=0
    SET SQL_AUTO_IS_NULL = 0
    SELECT `machine`.`id`, `machine`.`password`, `machine`.`last_login`,
        `machine`.`name`, `machine`.`no_cpus`, `machine`.`no_nodes`,
        `machine`.`type`, `machine`.`category_id`, `machine`.`start_date`,
        `machine`.`end_date`, `machine`.`pbs_server_host`,
        `machine`.`mem_per_core`, `machine`.`scaling_factor`
        FROM `machine` WHERE `machine`.`id` = 1
    Ping
    INSERT INTO `cache_machinecache` (`date`, `start`, `end`, `cpu_hours`,
        `no_jobs`, `machine_id`) VALUES ('2013-11-22', '2003-12-25',
        '2003-12-25', '10.00', 10, 1)
    [ doesn't return a result yet ]

[ 10 second delay ]

Process #1:

    rollback

Process #2 gets its result, then:

    rollback

So I guess the magic line must be "set autocommit=0". I'm still not sure I
understand how this locks other sessions, however.

> However, looking at your code below, I'm guessing that the issue is you
> are wrapping the entire process up in a single transaction.

That is the recommended practice - enable the middleware that wraps every
web request in a transaction. It can be changed on a per-method basis,
however.

> I think you would be better off with two steps or at least two
> transactions:
>
> My original post suggested having a new and completely separate table
> just to do the guarding, and that may be the simpler way.

This is just a test to make sure I understand correctly what would happen.
I don't want a web request to hang around for ages waiting for the cache
to be filled - which in turn requires Celery.
So my latest attempt at the real code is:

    def gen_machine_cache(request, start, end):
        try:
            tc = TaskCache.objects.get(
                machine_category__isnull=True,
                date=datetime.date.today(), start=start, end=end)
            if tc.ready:
                return None
            result = Task.AsyncResult(tc.celery_task_id)
            if result.ready():
                result.forget()
                tc.ready = True
                tc.save()
                return None
        except TaskCache.DoesNotExist:
            result = tasks.gen_machine_cache.delay(start, end)
            TaskCache.objects.create(
                machine_category=None,
                date=datetime.date.today(), start=start, end=end,
                celery_task_id=result.task_id)
        return render_to_response(
            'usage/progress.html',
            {'task_id': result.task_id},
            context_instance=RequestContext(request))

As per your suggestion, I am thinking of changing it to something like:

    def gen_machine_cache(request, start, end):
        try:
            tc = TaskCache.objects.create(
                machine_category=None,
                date=datetime.date.today(), start=start, end=end,
                celery_task_id="")
            result = tasks.gen_machine_cache.delay(start, end)
            tc.celery_task_id = result.task_id
            tc.save()
        except IntegrityError:  # django.db.IntegrityError; it is not an attribute of the model
            tc = TaskCache.objects.get(
                machine_category=None,
                date=datetime.date.today(), start=start, end=end)
            if tc.ready:
                return None
            result = Task.AsyncResult(tc.celery_task_id)
            if result.ready():
                result.forget()
                tc.ready = True
                tc.save()
                return None
        return render_to_response(
            'usage/progress.html',
            {'task_id': result.task_id},
            context_instance=RequestContext(request))

The IntegrityError case doesn't cope if celery_task_id hasn't been
initialised yet; however, if my early tests are anything to go by, the
second call to create() will not return until the first mysql session is
complete, so celery_task_id should always be set at this point.

> With either approach, you want a transaction just around the insert, not
> the whole data generation process. If this is not easy to code using
> Django's ORM, then I would suggest going the separate 'guard table' route
> and using raw sql (just for that table).

I'm not sure this is required.
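The revised view is essentially an "insert first, fall back to get" guard,
relying on a unique constraint to ensure only one caller launches the
task. Here is a minimal, DB-agnostic sketch of just that pattern, assuming
a unique constraint on the cache key; sqlite3 stands in for MySQL, and
start_task is a hypothetical stand-in for tasks.gen_machine_cache.delay:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_cache ("
             "date TEXT, start TEXT, end TEXT, task_id TEXT, ready INTEGER, "
             "UNIQUE (date, start, end))")

launched = []  # record of tasks actually started

def start_task(start, end):
    # Hypothetical stand-in for tasks.gen_machine_cache.delay(start, end).
    task_id = "task-%d" % (len(launched) + 1)
    launched.append(task_id)
    return task_id

def get_or_launch(date, start, end):
    try:
        # The INSERT is the guard: only one caller can win the race.
        conn.execute("INSERT INTO task_cache VALUES (?, ?, ?, '', 0)",
                     (date, start, end))
        task_id = start_task(start, end)
        conn.execute("UPDATE task_cache SET task_id = ? "
                     "WHERE date = ? AND start = ? AND end = ?",
                     (task_id, date, start, end))
        conn.commit()
        return task_id
    except sqlite3.IntegrityError:
        # Duplicate key: someone else already launched the task, so
        # fall back to fetching the existing row.
        row = conn.execute("SELECT task_id FROM task_cache "
                           "WHERE date = ? AND start = ? AND end = ?",
                           (date, start, end)).fetchone()
        return row[0]

first = get_or_launch("2013-11-22", "2003-12-25", "2003-12-25")
second = get_or_launch("2013-11-22", "2003-12-25", "2003-12-25")
print(first, second, launched)  # both callers see the same task id
```

In Django the corresponding exception is django.db.IntegrityError, and
get_or_create() wraps a similar insert-or-fetch dance for you.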
Actually, like I said, this stuff has changed in Django 1.6, so I might be
better off testing with Django 1.6 now.

-- 
Brian May <[email protected]>
_______________________________________________
melbourne-pug mailing list
[email protected]
https://mail.python.org/mailman/listinfo/melbourne-pug
