On Fri, 2010-11-12 at 11:30 +0100, Sebastian Egner wrote: > On 11.11.10 18:03, Jelmer Vernooij wrote: > > On Thu, 2010-11-11 at 16:49 +0100, Sebastian Egner wrote: > >> On 11.11.10 15:20, Jelmer Vernooij wrote: > >>> We'd need an alternative way to determine whether there's data available > >>> from a subprocess. I'm not sure what the best way to do that is on > >>> Windows. The subprocess module doesn't provide any hints about this. > >> Your are right, of course. I checked: The object proc.stdout will be a > >> Python 'file' object on Windows---and these don't come with a method of > >> polling, unless the entire file is switched into asynchronous mode. > > Switching the file to some sort of non-blocking mode should be fine, if > > that gives you a way to poll whether there is data available. > Unfortuntately, there is no commonly known way to do this in Python on > Windows. > http://stackoverflow.com/questions/375427/non-blocking-read-on-a-stream-in-python > (Note that the 'fcntl' module is only available on the Unix platforms.) > > >> However, from the source code I got the impression that it is sufficient > >> to detect EOF because can_read() only checks if some bytes are waiting > >> to be read, so the subsequent read() can still block on the second byte > >> of the stream. This means, we can probably get away with a simple > >> read-ahead mechanism: can_read() does a blocking read(1) and stores it, > >> and the next Procotol.read(n) returns the stored byte appended with > >> read(n-1). > > This is not a file but a stream. Not being able to read from it does not > > necessarily imply EOF. Attempting to read a single byte will mean > > blocking until the other side provides that byte, while we could be > > spending that time crunching data. The sYour patch adds econd byte could > > indeed block as > > well, but because of the way the protocol works this will not be the > > case (a packet is always at least 4 bytes).
> (2) I had traced the actual calls to SubprocessWrapper.{read,can_read}
> in a real example. It turned out that can_read() is called very
> infrequently. Most of the time, Dulwich simply calls read(n) without
> checking can_read() first---so the read(n) also blocks in the Unix
> version. (Or am I missing something?)
How often can_read() is called depends on the local and remote
repository and how much they are out of sync, it depends on the speed of
the remote server and on the latency.
> > Can you please attach diffs instead? They're much easier to read and
> > review.
> Attached.
Thanks.
> Please tell me what you intend to do with my suggestion, or what more I
> can do to help. I am in the process of selecting a Git toolchain for me
> and my Windows dev team. Hg/Hg-Git looked promising until I found out
> that it neither supports https transport nor git+ssh.
I agree that this is a genuine bug, and I'd like to see it fixed.
However, this patch is a hack, and I think it should be fixed properly.
There must be some way to poll whether there's data available, even if
it would just be by having a separate thread do the reading.
Alternatively, we could use paramiko instead of talking with a local ssh
process using subprocess.
There's a bug report here, somebody else ran into the same issue:
https://bugs.launchpad.net/dulwich/+bug/670035
Cheers,
Jelmer
signature.asc
Description: This is a digitally signed message part
_______________________________________________ Mailing list: https://launchpad.net/~dulwich-users Post to : [email protected] Unsubscribe : https://launchpad.net/~dulwich-users More help : https://help.launchpad.net/ListHelp

