On 2016-07-28 at 15:29, Jeff King wrote:
> On Thu, Jul 28, 2016 at 09:16:18AM +0200, Lars Schneider wrote:
>
>> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
>> solution (gmane is down - otherwise I would have given you the links).
>
> FWIW, I think there are arguments for transmitting size + content
> (namely, that it is simpler); the downside is that it doesn't allow
> streaming.
And that it requires the filter to know the size of its output up
front (which, as I wrote, might be easy to compute from the size of
the input and data stored elsewhere, or might require generating the
whole output first).
I don't know how parallel Git is, but if it is parallel enough, and
other limits do not apply (a limited number of CPU cores, I/O limits),
then without streaming the new filter protocol might be slower, unless
startup time dominates (MS Windows?):
Current parallel:
| startup | processing 1 |
| startup | processing 2 |
| startup | processing 3 |
| startup | processing 4 |
Protocol v2:
| startup | processing 1 | processing 2 | processing 3 | processing 4 |
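To put rough numbers on the two diagrams, here is a back-of-the-envelope
sketch. It assumes that Git really does run per-file filters in parallel
(which I have not checked), that every file takes the same time to
process, and that the long-running filter handles files strictly one
after another. The function names are made up for illustration.

```python
import math


def per_file_wall_time(n_files, startup, processing, cores):
    """Wall-clock time when every file gets its own filter process.

    Assumption: processes run in waves of at most `cores` at a time,
    and each process pays the startup cost again.
    """
    waves = math.ceil(n_files / cores)
    return waves * (startup + processing)


def single_process_wall_time(n_files, startup, processing):
    """Wall-clock time for one long-running filter process.

    Assumption: startup is paid once and files are handled serially.
    """
    return startup + n_files * processing


if __name__ == "__main__":
    # (startup, processing, cores) in arbitrary time units, 4 files each.
    for startup, processing, cores in [(1, 2, 4), (1, 2, 1), (10, 1, 2)]:
        old = per_file_wall_time(4, startup, processing, cores)
        new = single_process_wall_time(4, startup, processing)
        print(f"startup={startup} processing={processing} cores={cores}: "
              f"per-file={old} single-process={new}")
```

With enough cores the per-file model wins; with few cores or an
expensive startup the single process wins.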
>
> So I think there are two viable alternatives:
>
> 1. Total size of data in ASCII decimal, newline, then that many bytes
> of content.
>
> 2. No size header, then a series of pkt-lines followed by a flush
> packet.
3. Optional size header[2][3], then a series of pkt-lines followed
by a flush packet[4].
[2] Git should always provide the size, because it is easy to do and,
I think, quite cheap (it is stored with the blob, stored in the index,
or a stat() on the file away). The filter can provide the size if it
is easy to calculate, or an approximation of the size / a size
hint[5], which helps to avoid reallocation.
[3] It is also a place where the filter can report error conditions
that are known before it starts processing a file.
[4] On one hand you need to catch cases where the real size is larger
or smaller than the size sent up front; on the other hand it might be
a place to send warnings and errors... unless we use the stderr of
the process (but then there is a risk of deadlock, I think).
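A minimal sketch of the reading side of alternative 3, to make the size
check concrete. It is not taken from any patch. It assumes the header is
a single newline-terminated line in one of the forms listed in [5], that
nothing follows a "fail" header, and that the content uses the usual
pkt-line framing (four hex digits giving the length including those four
bytes, "0000" as the flush packet). All function names are hypothetical.

```python
import io


def parse_size_header(line):
    """Parse a header line into (kind, size); size is None if absent."""
    text = line.decode("ascii").rstrip("\n")
    if text in ("unknown", "fail"):
        return text, None
    kind, digits = "exact", text
    if text.startswith("approx "):
        kind, digits = "approx", text[len("approx "):]
    if not digits.isdigit():
        raise ValueError(f"malformed size header: {text!r}")
    return kind, int(digits)


def read_pkt_lines(stream):
    """Concatenate pkt-line payloads until a flush packet is seen."""
    chunks = []
    while True:
        head = stream.read(4)
        if len(head) != 4:
            raise ValueError("stream ended before flush packet")
        length = int(head.decode("ascii"), 16)
        if length == 0:  # "0000" is the flush packet
            return b"".join(chunks)
        if length < 4:
            raise ValueError(f"invalid pkt-line length: {length}")
        payload = stream.read(length - 4)
        if len(payload) != length - 4:
            raise ValueError("truncated pkt-line")
        chunks.append(payload)


def read_filter_output(stream):
    """Read header plus content, checking an exact size after the flush."""
    kind, size = parse_size_header(stream.readline())
    if kind == "fail":
        # Assumption: a failing filter sends no content after the header.
        raise RuntimeError("filter reported failure before sending content")
    content = read_pkt_lines(stream)
    if kind == "exact" and len(content) != size:
        raise ValueError(f"announced {size} bytes, received {len(content)}")
    return content


if __name__ == "__main__":
    wire = io.BytesIO(b"5\n" + b"0009hello" + b"0000")
    print(read_filter_output(wire))
```

An "approx" or "unknown" header is only a hint here, so a mismatch is
not an error in those cases; only an exact size is verified.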
[5] I suggest
<size as ascii decimal>
"approx" SPC <size as ascii decimal>
"unknown"
"fail"
> And you should choose between the two based on whether it's more
> important to allow streaming, or more important to make the filter
> implementations simple[1].
>
> Any solution that is in between those (like sending a size header and
> then using pktlines anyway) is sacrificing simplicity but not getting
> the streaming benefits.
>
> -Peff
>
> [1] I haven't thought hard enough about it to have a real opinion. My
> gut says to go with the streaming, just because we've had to
> retrofit streaming in other areas when dealing with blobs, so I
> think we'll end up there eventually. So choosing a simpler protocol
> like (1) would probably mean eventually implementing a next-version
> protocol that does (2), and having to support both.
>
> PS Jakub asked for links, but gmane is down. Here are the relevant threads:
>
> http://public-inbox.org/git/[email protected]
>
>
> http://public-inbox.org/git/20160722154900.19477-1-larsxschneider%40gmail.com/t/#u
>