Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426)

Zac Medico Thu, 13 Aug 2015 12:01:18 -0700

On 08/13/2015 07:29 AM, Brian Dolbec wrote:
> On Thu, 13 Aug 2015 01:34:36 -0700
> Zac Medico <zmed...@gentoo.org> wrote:
>> +    def _task_exit(self, task):
>> +            '''
>> +            Remove the task from the graph, in order to expose
>> +            more leaf nodes.
>> +            '''
>> +            self._running_tasks.discard(task)
>> +            returncode = task.returncode
>> +            if task.returncode == os.EX_OK:
>> +                    returncode, message, updatecache_flg =
>> task.result
>> +                    if message:
>> +                            self.msgs.append(message)
>> +            repo = task.kwargs['repo'].name
>> +            self._running_repos.remove(repo)
>> +            self.retvals.append((repo, returncode))
>> +            self._sync_graph.remove(repo)
>> +            self._update_leaf_nodes()
>> +            super(SyncScheduler, self)._task_exit(self)
>> +
> 
> 
> returncode in this function seems to be assigned too man times for no
> real reason.  Is there no other way to get message than to reassign
> returncode?  and it task.returncode is not os.EX_OK is not task.result
> still accessible?


No, it's not accessible, because this means that an unexpected exception
has occurred in the child process. In practice, this case should not
happen, but it could if there's a bug in any one of the sync modules.

This is why we have the odd looking returncode assignment, since we can
only access the real sync module returncode if it didn't throw an
unexpected exception.

> maybe there is a a message still?

There's no message available if the sync module raised an unexpected
exception. The AsyncFunction class will print a traceback in this case,
and exit with returncode 1.

> It sounds like
> message should be made available like returncode and forget about
> splitting task.result...  maybe it will be clearer when I fifnish
> reviewing the code. It is early in the morning ... still waking up

I've left the returncode and message handling as they were before the
patch. It's just that now they're passed back to the main process via a
pipe (handled by AsyncFunction), and we have to handle the case where
the child process might have raised an unexpected exception.

>> +    def _can_add_job(self):
>> +            '''
>> +            Returns False if there are no leaf nodes available.
>> +            '''
>> +            if not AsyncScheduler._can_add_job(self):
>> +                    return False
>> +            return bool(self._leaf_nodes) and not
>> self._terminated.is_set() +
>> +    def _keep_scheduling(self):
>> +            '''
>> +            Schedule as long as the graph is non-empty, and we
>> haven't
>> +            been terminated.
>> +            '''
>> +            return bool(self._sync_graph) and not
>> self._terminated.is_set()
> 
> 
> 
> These 2 functions look like barely different versions of the same
> thing.  doesn't  bool(self._leaf_nodes) and bool(self._sync_graph)
> really evaluate the same? if there are no leaf nodes, then sync graph
> should be false????

No, it's more complex than that, because this is a non-deterministic
scheduler like the one that we use for parallel builds. The number of
leaf nodes will be zero when a master repo such as 'gentoo' is syncing.
After the master finishes syncing, the master node is removed from the
graph, creating leaf nodes. Then the _update_leaf_nodes method is called
to update self._leaf_nodes.


>> +    def _sync_callback(self, proc):
>> +            """
>> +            This is called in the parent process, serially, for
>> each of the
>> +            sync jobs when they complete. Some cache backends
>> such as sqlite
>> +            may require that cache access be performed serially
>> in the
>> +            parent process like this.
>> +            """
>> +            repo = proc.kwargs['repo']
>> +            exitcode = proc.returncode
>> +            updatecache_flg = False
>> +            if proc.returncode == os.EX_OK:
>> +                    exitcode, message, updatecache_flg =
>> proc.result 
>> -    def _sync_callback(self, exitcode, updatecache_flg):
>>              if updatecache_flg and "metadata-transfer" not in
>> self.settings.features: updatecache_flg = False
>>  
>>              if updatecache_flg and \
>>                      os.path.exists(os.path.join(
>> -                    self.repo.location, 'metadata',
>> 'md5-cache')):
>> +                    repo.location, 'metadata', 'md5-cache')):
>>  
>>
> 
> 
> Again, this looks unnecessarily messy dealing with returncode/exitcode
> and updatecache_flag  assigning and reassigning...

The situation is analogous to the other one that you complained about.
The exitcode, message, and updatecache_flag variables are simply not
available if the child process has raised an unexpected exception. It
should not happen in practice, but it could if one of the sync modules
has a bug.
-- 
Thanks,
Zac

Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426)

Reply via email to