> On 25 Aug 2016, at 20:46, Stefan Beller <sbel...@google.com> wrote:
>> On Thu, Aug 25, 2016 at 4:07 AM, <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> packet_write_stream_with_flush_from_fd() and
>> packet_write_stream_with_flush_from_buf() write a stream of packets. All
>> content packets use the maximal packet size except for the last one.
>> After the last content packet a `flush` control packet is written.
>> packet_read_till_flush() reads arbitrary sized packets until it detects
>> a `flush` packet.
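For illustration, the framing these functions implement can be sketched in Python (the function names below are illustrative, not Git's C API; the 65520-byte maximum packet size and the `0000` flush packet follow Git's pkt-line convention):

```python
# Sketch of the streaming pkt-line framing described above.
# Each packet: 4 hex digits giving the total length (header included),
# then the payload; "0000" is the flush packet.
MAX_PACKET_CONTENT = 65520 - 4  # payload bytes per maximal packet

def write_packet_stream(data):
    """Write maximal-size content packets, then a flush packet."""
    out = []
    for i in range(0, len(data), MAX_PACKET_CONTENT):
        chunk = data[i:i + MAX_PACKET_CONTENT]
        out.append(b"%04x" % (len(chunk) + 4) + chunk)
    out.append(b"0000")  # flush packet terminates the stream
    return b"".join(out)

def read_till_flush(stream):
    """Read arbitrary-sized packets until a flush packet is seen."""
    data, pos = bytearray(), 0
    while True:
        length = int(stream[pos:pos + 4], 16)
        pos += 4
        if length == 0:          # flush packet
            return bytes(data)
        data += stream[pos:pos + length - 4]
        pos += length - 4
```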
> So the API provided by these read/write functions is intended
> to move huge chunks of data. And as it puts the data on the wire one
> packet after the other, without the possibility of intervening to e.g. send
> a side channel progress bar update, I would question the design of this.
> If I understand correctly this will be specifically used for large
> files locally,
> so e.g. a file of 5 GB (such as a virtual machine tracked in Git), would
> require about 80k packets.
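That back-of-the-envelope figure checks out, assuming pkt-line's 65520-byte maximum packet size (4 bytes of which are the length header):

```python
# Rough packet count for a 5 GiB file sent as maximal pkt-line packets.
MAX_CONTENT = 65520 - 4                  # 65516 payload bytes per packet
file_size = 5 * 1024**3                  # 5 GiB
packets = -(-file_size // MAX_CONTENT)   # ceiling division
print(packets)  # → 81946, i.e. about 80k packets
```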
Peff suggested this approach, arguing that the overhead is negligible.
> Instead of having many packets of max length and then a remainder,
> I would suggest inventing larger packets for this use case. Then we can
> just send one packet instead.
> Currently a packet consists of 4 bytes indicating the length in hex
> and then a payload of length-4 bytes. As the length is in hex,
> the characters in the first 4 bytes are [0-9a-f], so we can easily assign
> another meaning to the length field, e.g.:
> A packet starts with the overall length and then the payload.
> If the first character of the length is 'v', the length is encoded as a
> variable length quantity: the high bit of each byte indicates whether
> the next byte is still part of the length field. The length must not exceed
> LLONG_MAX (which allows a payload of 9223 petabytes, so
> enough for the foreseeable future).
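The proposed variable length quantity could look like the little-endian base-128 sketch below (illustrative only; this encoding was a proposal in the thread, not something Git's wire protocol actually uses):

```python
def vlq_encode(n):
    """Encode a non-negative integer; a set high bit means more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: length field continues
        else:
            out.append(byte)         # final byte of the length field
            return bytes(out)

def vlq_decode(data):
    """Decode a VLQ-encoded integer from the start of `data`."""
    n = 0
    for shift, byte in enumerate(data):
        n |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            break
    return n
```

With this scheme a 5 GiB payload length fits in five bytes, so a single packet header would replace tens of thousands of 4-byte hex headers.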
Eventually I would like to resurrect Joey's cleanFromFile/smudgeToFile idea;
then we would not need to transfer that much data over the pipes. However, I
wonder if the large number of packets would actually be a problem. Honestly, I
would prefer not to change Git's packet format in this already large series ;-)
>  A variable-length quantity (VLQ) is a universal code that uses
> an arbitrary number of bytes to represent an arbitrarily large integer.
> The neat thing about the packet system is that we can dedicate packets
> to different channels (such as the side channels), but the API provided
> here makes it impossible to add these side channels later,
> as it is a pure streaming API now. So let's remove the complication
> of having to send multiple packets and just go with one large packet.
I tried to design the protocol to be as flexible as possible for the future, with
version negotiation and a capabilities list. Therefore, I would think it should
be possible to implement these ideas later if they are required.
> I understand that my proposal would require writing code again,
> but it has also some long term advantages in the networking stack
> of Git: There are some worries that a capabilities line in fetch/push
> might overflow in the far future, when there are lots of capabilities.
> Also, a few days ago there was a proposal to add all symbolic refs
> to a capabilities line, which Peff shot down as "the packet may be
> too small".
> There is an incredible hack that allows transporting refs > 64kB IIRC.
> All these things could go away with the variable length encoded
> packets. But to make them go away in the future we would need
> to start with these variable length packets today. ;)
> Just food for thought.
Thanks for thinking it through so thoroughly! I understand your point of view
and I am curious what others think.