> On 25 Aug 2016, at 20:46, Stefan Beller <sbel...@google.com> wrote:
>> On Thu, Aug 25, 2016 at 4:07 AM, <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> packet_write_stream_with_flush_from_fd() and
>> packet_write_stream_with_flush_from_buf() write a stream of packets. All
>> content packets use the maximal packet size except for the last one.
>> After the last content packet a `flush` control packet is written.
>> packet_read_till_flush() reads arbitrary sized packets until it detects
>> a `flush` packet.
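For illustration, the framing these functions implement can be sketched in Python (the function names below are illustrative, not Git's C API; the 65520-byte maximum packet size and the `0000` flush packet follow Git's pkt-line convention):

```python
# Sketch of the streaming pkt-line framing described above.
# Each packet: 4 hex digits giving the total length (header included),
# then the payload; "0000" is the flush packet.
MAX_PACKET_CONTENT = 65520 - 4  # payload bytes per maximal packet

def write_packet_stream(data):
    """Write maximal-size content packets, then a flush packet."""
    out = []
    for i in range(0, len(data), MAX_PACKET_CONTENT):
        chunk = data[i:i + MAX_PACKET_CONTENT]
        out.append(b"%04x" % (len(chunk) + 4) + chunk)
    out.append(b"0000")  # flush packet terminates the stream
    return b"".join(out)

def read_till_flush(stream):
    """Read arbitrary-sized packets until a flush packet is seen."""
    data, pos = bytearray(), 0
    while True:
        length = int(stream[pos:pos + 4], 16)
        pos += 4
        if length == 0:          # flush packet
            return bytes(data)
        data += stream[pos:pos + length - 4]
        pos += length - 4
```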
> So the API provided by these read/write functions is intended
> to move huge chunks of data. And as it puts the data on the wire one
> packet after the other, without the possibility of intervening to e.g. send
> a side channel progress bar update, I would question the design of this.
> If I understand correctly this will be specifically used for large
> files locally,
> so e.g. a file of 5 GB (such as a virtual machine tracked in Git), would
> require about 80k packets.
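That back-of-the-envelope figure checks out, assuming pkt-line's 65520-byte maximum packet size (4 bytes of which are the length header):

```python
# Rough packet count for a 5 GiB file sent as maximal pkt-line packets.
MAX_CONTENT = 65520 - 4                  # 65516 payload bytes per packet
file_size = 5 * 1024**3                  # 5 GiB
packets = -(-file_size // MAX_CONTENT)   # ceiling division
print(packets)  # → 81946, i.e. about 80k packets
```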
Peff suggested this approach, arguing that the overhead is negligible.
> Instead of having many packets of max length and then a remainder,
> I would suggest inventing larger packets for this use case. Then we can
> just send one packet instead.
> Currently a packet consists of 4 bytes indicating the length in hex
> and then a payload of length-4 bytes. As the length is in hex,
> the characters in the first 4 bytes are [0-9a-f], so we can easily assign
> another meaning to the length field, e.g.:
> A packet starts with the overall length and then the payload.
> If the first character of the length is 'v', the length is encoded as a
> variable length quantity: the high bit of each byte indicates whether
> the next byte is still part of the length field. The length must not exceed
> LLONG_MAX (which allows a payload of 9223 petabytes, so
> enough for the foreseeable future).
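The proposed variable length quantity could look like the little-endian base-128 sketch below (illustrative only; this encoding was a proposal in the thread, not something Git's wire protocol actually uses):

```python
def vlq_encode(n):
    """Encode a non-negative integer; a set high bit means more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: length field continues
        else:
            out.append(byte)         # final byte of the length field
            return bytes(out)

def vlq_decode(data):
    """Decode a VLQ-encoded integer from the start of `data`."""
    n = 0
    for shift, byte in enumerate(data):
        n |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            break
    return n
```

With this scheme a 5 GiB payload length fits in five bytes, so a single packet header would replace tens of thousands of 4-byte hex headers.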
Eventually I would like to resurrect Joey's cleanFromFile/smudgeToFile idea;
then we would not need to transfer that much data over the pipes. However, I
wonder if the large number of packets would actually be a problem. Honestly, I
would prefer not to change Git's packet format in this already large series ;-)
>  A variable-length quantity (VLQ) is a universal code that uses
> an arbitrary number of bytes to represent an arbitrarily large integer.
> The neat thing about the packet system is that we can dedicate packets
> to different channels (such as the side channels), but the API provided
> here makes it impossible to add these side channels later,
> as it is a pure streaming API now. So let's remove the complication
> of having to send multiple packets and just go with one large packet.
I tried to design the protocol to be as flexible as possible for the future, with
version negotiation and a capabilities list. Therefore, I would think it should
be possible to implement these ideas later if they are required.
> I understand that my proposal would require writing code again,
> but it has also some long term advantages in the networking stack
> of Git: There are some worries that a capabilities line in fetch/push
> might overflow in the far future, when there are lots of capabilities.
> Also, a few days ago there was a proposal to add all symbolic refs
> to a capabilities line, which Peff shot down as "the packet may be
> too small".
> There is an incredible hack that allows transporting refs > 64kB IIRC.
> All these things could go away with the variable length encoded
> packets. But to make them go away in the future we would need
> to start with these variable length packets today. ;)
> Just food for thought.
Thanks for thinking it through so thoroughly! I understand your point of view
and I am curious what others think.