"Farhan Khan" <email@example.com> writes: > I am trying to write an implementation of git clone over ssh and > am a little confused how to determine a server response has > ended. Specifically, after a client sends its requested 'want', > the server sends the pack content over. However, how does the > client know to stop reading data? If I run a simple read() of the > file descriptor: > > A. If I use reading blocking, the client will wait until new data is > available, potentially forever. > B. If I use non-blocking, the client might terminate reading for new data, > when in reality new data is in transit.
It's a TCP stream, so a blocking read will tell you when the other
side finishes talking to you and disconnects. Your read() will
signal EOF.

If you are paranoid and want to protect your reader against a
malicious writer, then you cannot trust anything the other side says
(including any "I have N megabytes of data" kind of length
information), so you'd need to set up a timeout to get yourself out
of a stuck read, but that is neither news nor rocket surgery ;-)

The "upload-pack" (the component that talks with your "fetch" and
"clone"), after negotiating what objects to include in the data
transfer with the program on your side, produces a pack data stream,
and is allowed to send additional "garbage" after that. The
receiving end, after finishing the negotiation, reads the pack data
stream (it carries the contents of exactly one packfile) and parses
it according to the packfile format so that it can find the end
(cf. Documentation/technical/pack-format.txt). After seeing the end
of the pack stream, anything that follows is "garbage" and is
generally passed through to the standard output.

There are two codepaths on the receiving end ("unpack-objects" and
"index-pack --stdin"). Most likely an initial "clone" would end up
following the latter, but for educational purposes, the
"unpack-objects" codepath may be easier to follow. These two
codepaths are morally equivalent at the higher conceptual levels.
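
To illustrate "parse the stream to find its end", here is a rough
sketch in Python. It is not what unpack-objects or index-pack do
internally; find_pack_end() is a made-up name, and the sketch assumes
a SHA-1 repository (20-byte object names and trailer), a version 2 or
3 pack, and that the bytes have already been read into memory with
any side-band framing removed. It walks the header and each object
entry, lets zlib report where each deflated body stops, and then
checks the trailing checksum.

```python
import hashlib
import struct
import zlib

OBJ_OFS_DELTA = 6
OBJ_REF_DELTA = 7
HASH_LEN = 20  # assumption: SHA-1 object names and pack trailer


def find_pack_end(buf):
    """Return the offset just past the pack trailer in buf (bytes).

    Hypothetical helper for illustration; raises ValueError when the
    data is not a complete, well-formed pack stream.
    """
    if len(buf) < 12 or buf[:4] != b"PACK":
        raise ValueError("not a pack stream")
    version, count = struct.unpack(">II", buf[4:12])
    if version not in (2, 3):
        raise ValueError("unsupported pack version")

    pos = 12
    try:
        for _ in range(count):
            # Entry header: type in bits 6-4 of the first byte, size
            # spread over continuation bytes (MSB set means "more").
            byte = buf[pos]
            pos += 1
            obj_type = (byte >> 4) & 7
            while byte & 0x80:
                byte = buf[pos]
                pos += 1

            if obj_type == OBJ_OFS_DELTA:
                # Skip the variable-length offset to the base object.
                byte = buf[pos]
                pos += 1
                while byte & 0x80:
                    byte = buf[pos]
                    pos += 1
            elif obj_type == OBJ_REF_DELTA:
                # Skip the name of the base object.
                pos += HASH_LEN

            # The deflated body carries no length prefix; zlib itself
            # tells us where it stops via eof / unused_data.
            inflater = zlib.decompressobj()
            inflater.decompress(buf[pos:])
            if not inflater.eof:
                raise ValueError("truncated object data")
            pos = len(buf) - len(inflater.unused_data)
    except IndexError:
        raise ValueError("truncated pack stream") from None

    end = pos + HASH_LEN
    if len(buf) < end:
        raise ValueError("truncated pack trailer")
    if hashlib.sha1(buf[:pos]).digest() != buf[pos:end]:
        raise ValueError("pack checksum mismatch")
    return end
```

Anything in buf from the returned offset onwards is the "garbage"
mentioned above. Re-slicing the buffer for every object is wasteful
for a large pack; a real reader would inflate incrementally as data
arrives, but the way the end is found would be the same.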