Re: [BUG] branch renamed to 'HEAD'

2017-02-26 Thread Luc Van Oostenryck
On Mon, Feb 27, 2017 at 11:43:46AM +0530, Karthik Nayak wrote:
> Hello,
> 
> Thanks for reporting, but I don't think it is a bug.
> 
> On Mon, Feb 27, 2017 at 10:22 AM, Luc Van Oostenryck
>  wrote:
> > Hi,
> >
> > I just discovered something that very much seems like a bug to me
> > while making an error in renaming a branch.
> > The scenario is the following:
> > - I have a branch named 'orig'
> > - I want to make some experimental changes on it:
> > $ git checkout -b temp orig
> > $ ... edit some files ...
> > $ ... make some tests & commits ...
> > - I'm happy with my changes, so I want my original
> >   branch to now point to the head of this temp branch,
> >   but did it wrongly:
> > $ git branch -m -f orig @
> 
> Here you are using the '-m' flag, which is used to rename a branch. So
> what you're essentially doing is:
> $ git branch -m -f orig HEAD
> Do note that this won't reset 'orig' to point to 'HEAD', rather this
> renames 'orig' to 'HEAD'.
> 
> What you actually want to do (to reset 'orig' to 'HEAD') is:
> $ git branch -f orig @
> This would make orig point to the current HEAD.

Sure. I said in the description that I made an error in the renaming.

What I consider a bug is that '@', which stands for HEAD,
was interpreted as a branch named 'HEAD', thus creating the
reference refs/heads/HEAD.
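
For completeness, a minimal reproduction in a scratch repo (standard
commands; the last line is what I had actually intended):

$ git init /tmp/repro && cd /tmp/repro
$ git commit --allow-empty -m init
$ git branch orig
$ git checkout -b temp orig
$ git branch -m -f orig @    # oops: renames 'orig' to the literal name 'HEAD'
$ ls .git/refs/heads         # shows 'HEAD' and 'temp', no 'orig' anymore
$ git branch -f orig @       # intended: force 'orig' to the current HEAD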

Luc


Re: [BUG] branch renamed to 'HEAD'

2017-02-26 Thread Karthik Nayak
Hello,

Thanks for reporting, but I don't think it is a bug.

On Mon, Feb 27, 2017 at 10:22 AM, Luc Van Oostenryck
 wrote:
> Hi,
>
> I just discovered something that very much seems like a bug to me
> while making an error in renaming a branch.
> The scenario is the following:
> - I have a branch named 'orig'
> - I want to make some experimental changes on it:
> $ git checkout -b temp orig
> $ ... edit some files ...
> $ ... make some tests & commits ...
> - I'm happy with my changes, so I want my original
>   branch to now point to the head of this temp branch,
>   but did it wrongly:
> $ git branch -m -f orig @

Here you are using the '-m' flag, which is used to rename a branch. So
what you're essentially doing is:
$ git branch -m -f orig HEAD
Do note that this won't reset 'orig' to point to 'HEAD', rather this
renames 'orig' to 'HEAD'.

What you actually want to do (to reset 'orig' to 'HEAD') is:
$ git branch -f orig @
This would make orig point to the current HEAD.

-- 
Regards,
Karthik Nayak


[BUG] branch renamed to 'HEAD'

2017-02-26 Thread Luc Van Oostenryck
Hi,

I just discovered something that very much seems like a bug to me
while making an error in renaming a branch.
The scenario is the following:
- I have a branch named 'orig'
- I want to make some experimental changes on it:
$ git checkout -b temp orig
$ ... edit some files ...
$ ... make some tests & commits ...
- I'm happy with my changes, so I want my original
  branch to now point to the head of this temp branch,
  but did it wrongly:
$ git branch -m -f orig @
- Now I discover that I no longer have a branch named 'orig'.
  That's fine, I made an error.
- I searched for what had happened and discovered the name my branch
  had been renamed to: 'HEAD'.
  In other words, I now have an entry .git/refs/heads/HEAD
  which points to where my original branch pointed.

In my opinion, it's a bug that '@' was expanded/resolved
into a branch named 'HEAD'.


Luc Van Oostenryck


Re: [PATCH] travis-ci: run scan-build every time

2017-02-26 Thread Samuel Lijin
On Sun, Feb 26, 2017 at 8:12 AM, Lars Schneider
 wrote:
>
>> On 26 Feb 2017, at 03:09, Samuel Lijin  wrote:
>>
>> On Sat, Feb 25, 2017 at 3:48 PM, Lars Schneider
>>  wrote:
>>>
 On 24 Feb 2017, at 18:29, Samuel Lijin  wrote:

 It's worth noting that there seems to be a weird issue with scan-build
 where it *will* generate a report for something locally, but won't do it
 on Travis. See [2] for an example where I have a C program with a
 very obvious memory leak but scan-build on Travis doesn't generate
 a report (despite complaining about it in stdout), even though it does
 on my local machine.

 [1] https://travis-ci.org/sxlijin/git/builds/204853233
 [2] https://travis-ci.org/sxlijin/travis-testing/jobs/205025319#L331-L342
>>>
>>> Scan-build stores the report in some temp folder. I assume you can't access
>>> this folder on TravisCI. Try the scan-build option "-o scan-build-results"
>>> to store the report in the local directory.
>>
>> That occurred to me, but I don't quite think that's the issue. I just
>> noticed that on the repo I use to test build matrices, jobs 1-8 don't
>> generate a report, but 9-14 and 19-20 do [1]. I don't think it's an
>> issue with write permissions (scan-build complains much more vocally
>> if that happens), but it doesn't seem to matter if the output dir is
>> in the tmpfs [2] or a local directory [3].
>>
>> [1] https://travis-ci.org/sxlijin/travis-testing/builds/205054253
>> [2] https://travis-ci.org/sxlijin/git/jobs/205028920#L1000
>> [3] https://travis-ci.org/sxlijin/git/jobs/205411705#L998
>
> Scan-build somehow replaces the compiler. My guess is that you
> tell scan-build to substitute clang but "make" is really using
> gcc or something?

Your hunch is spot-on. I took a look at the Makefile and lo and
behold, it overrides $CC [1]. Looking at the commit which introduced
it [2], I have to admit I'm somewhat surprised that scan-build works at
all...

[1] https://github.com/git/git/blob/master/Makefile#L454
[2] https://github.com/git/git/commit/6d62c983f7d91565a15e49955b3ed94ae7c73434
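
For what it's worth, a sketch of one possible workaround (an assumption
on my part, not necessarily what Lars's commit further down does):
scan-build advertises its analyzer wrapper via the CC environment
variable, and a command-line variable assignment beats the Makefile's
own CC assignment, so expanding $CC inside a subshell should let the
wrapper through:

  # hypothetical invocation; scan-build exports CC for the child shell
  scan-build -o scan-build-results sh -c 'make --jobs=2 CC="$CC"'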

> I reported something strange about the compilers
> on TravisCI some time ago but I can't find it anymore. I think I
> remember on OSX they always use clang even if you define gcc.
> Maybe it makes sense to reach out to TravisCI support in case
> this is a bug on their end?
>
> Based on your work I tried the following and it seems to work:
> https://travis-ci.org/larsxschneider/git/jobs/205507241
> https://github.com/larsxschneider/git/commit/faf4ecfdca1a732459c1f93c334928ee2826d490

That's promising!

> - Lars


Re: SHA1 collisions found

2017-02-26 Thread Jeff King
On Sun, Feb 26, 2017 at 10:38:35PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 26, 2017 at 8:11 PM, Linus Torvalds
>  wrote:
> > But yes, SHA3-256 looks like the sane choice. Performance of hashing
> > is important in the sense that it shouldn't _suck_, but is largely
> > secondary. All my profiles on real loads (well, *my* real loads) have
> > shown that zlib performance is actually much more important than SHA1.
> 
> What's the zlib vs. hash ratio on those profiles? If git is switching
> to another hashing function then, given the developments in faster
> compression algorithms (gzip vs. snappy vs. zstd vs. lz4) [1], we'll
> probably switch to another compression algorithm sooner rather than later.
> 
> Would compression still be the bottleneck by far with zstd? How about
> with lz4?
> 
> 1. https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/

zstd does help in normal operations that access lots of blobs. Here are
some timings:

  http://public-inbox.org/git/20161023080552.lma2v6zxmyaii...@sigill.intra.peff.net/

Compression is part of the on-the-wire packfile format, so it introduces
compatibility headaches. Unlike the hash, it _can_ be a local thing
negotiated between the two ends, and a server with zstd data could
convert on-the-fly to zlib. You just wouldn't want to do so on a server
because it's really expensive (or you double your cache footprint to
store both).

If there were a hash flag day, we _could_ make sure all post-flag-day
implementations have zstd, and just start using that (it transparently
handles old zlib data, too). I'm just hesitant to throw in the kitchen
sink and make the hash transition harder than it already is.

Hash performance doesn't matter much for normal read operations. If your
implementation is really _slow_ it does matter for a few operations
(notably index-pack receiving a large push or fetch). Some timings:

  http://public-inbox.org/git/20170223230621.43anex65ndoqb...@sigill.intra.peff.net/

If the new algorithm is faster than SHA-1, that might be measurable in
those operations, too, but obviously less dramatic, as hashing is just a
percentage of the total operation (so it can balloon the time if it's
slow, but optimizing it can only save so much).

I don't know if the per-hash setup cost of any of the new algorithms is
higher than SHA-1. We care as much about hashing lots of small content
as we do about sustained throughput of a single hash.
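
A rough way to eyeball that tradeoff, assuming the openssl CLI is handy
(git's own sha1 implementations will differ, of course):

  # throughput for block sizes from 16 bytes up to 8k; the small-block
  # columns approximate the per-hash setup cost
  openssl speed sha1 sha256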

-Peff


Re: SHA1 collisions found

2017-02-26 Thread Ævar Arnfjörð Bjarmason
On Sun, Feb 26, 2017 at 8:11 PM, Linus Torvalds
 wrote:
> But yes, SHA3-256 looks like the sane choice. Performance of hashing
> is important in the sense that it shouldn't _suck_, but is largely
> secondary. All my profiles on real loads (well, *my* real loads) have
> shown that zlib performance is actually much more important than SHA1.

What's the zlib vs. hash ratio on those profiles? If git is switching
to another hashing function then, given the developments in faster
compression algorithms (gzip vs. snappy vs. zstd vs. lz4) [1], we'll
probably switch to another compression algorithm sooner rather than later.

Would compression still be the bottleneck by far with zstd? How about with lz4?

1. https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/


Re: SHA1 collisions found

2017-02-26 Thread Jeff King
On Sun, Feb 26, 2017 at 07:57:19PM +0100, Thomas Braun wrote:

> While reading about the subject I came across [1]. The author reduced
> the hash size to 4 bits and then played around with git.
> 
> Diff taken from the posting (not my code)
> --- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
> +++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
> @@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
> blk_SHA1_Update(ctx, padlen, 8);
> 
> /* Output hash */
> -   for (i = 0; i < 5; i++)
> -   put_be32(hashout + i * 4, ctx->H[i]);
> +   for (i = 0; i < 1; i++)
> +   put_be32(hashout + i * 4, (ctx->H[i] & 0xf00));
> +   for (i = 1; i < 5; i++)
> +   put_be32(hashout + i * 4, 0);
>  }

Yeah, that is a lot more flexible for experimenting. Though I'd think
you'd probably want more than 4 bits just to avoid accidental
collisions. Something like 24 bits gives you some breathing space (you'd
expect a random collision after 4096 objects), but it's still easy to
do a preimage attack if you need to.
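
For anyone who wants to play along, a sketch of such an experiment with
a git binary built from a truncated-hash patch like the one above (with
4 bits there are only 16 possible object names, so duplicates appear
almost immediately):

  git init /tmp/collide && cd /tmp/collide
  for i in $(seq 1 100); do echo "$i" | git hash-object -w --stdin; done |
        sort | uniq -c | sort -rn   # occurrence count per object name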

-Peff


Re: [PATCH v2] send-email: only allow one address per body tag

2017-02-26 Thread Matthieu Moy
Junio C Hamano  writes:

> Matthieu Moy  writes:
>
>> Johan Hovold  writes:
>>
>>> --- a/git-send-email.perl
>>> +++ b/git-send-email.perl
>>> @@ -1563,7 +1563,7 @@ foreach my $t (@files) {
>>> # Now parse the message body
>>> while(<$fh>) {
>>> $message .=  $_;
>>> -   if (/^(Signed-off-by|Cc): (.*)$/i) {
>>> +   if (/^(Signed-off-by|Cc): ([^>]*>?)/i) {
>>
>> I think this is acceptable, but this doesn't work with trailers like
>>
>> Cc: "Some > Body" 
>>
>> A proper management of this kind of weird address should be doable by
>> reusing the regexp parsing "..." in parse_mailbox:
>>
>>  my $re_quote = qr/"(?:[^\"\\]|\\.)*"/;
>>
>> So the final regex would look like
>>
>> if (/^(Signed-off-by|Cc): (([^>]*|"(?:[^\"\\]|\\.)*")>?)/i) {
>>
>> I don't think that should block the patch inclusion, but it may be worth
>> considering.
>>
>> Anyway, thanks for the patch!
>
> Somehow this fell off the radar.  So your reviewed-by: and then
> we'll cook this in 'next' for a while?

OK.
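
For anyone poking at the corner cases, a quick way to see what a
candidate regex captures (the address is made up):

  perl -e '$_ = q{Cc: "Some > Body" <person@example.org>};
    print "captured: [$2]\n"
      if /^(Signed-off-by|Cc): (([^>]*|"(?:[^\"\\]|\\.)*")>?)/i'

It simply prints whatever the second capture grabs, which makes the
edge cases easy to eyeball.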

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: SHA1 collisions found

2017-02-26 Thread Linus Torvalds
On Sun, Feb 26, 2017 at 9:38 AM, brian m. carlson
 wrote:
>
> SHA-256:
>   Common, but cryptanalysis has advanced.  Preimage resistance (which is
>   even more important than collision resistance) has gotten to 52 of 64
>   rounds.  Pseudo-collision attacks are possible against 46 of 64
>   rounds.  Slowest option.
> SHA-3-256:
>   Less common, but has a wide security margin.  Cryptanalysis is
>   ongoing, but has not advanced much.  Somewhat to much faster than
>   SHA-256, unless you have SHA-256 hardware acceleration (which almost
>   nobody does).
> BLAKE2b-256:
>   Lower security margin, but extremely fast (faster than SHA-1 and even
>   MD5).
>
> My recommendation has been for SHA-3-256, because I think it provides
> the best tradeoff between security and performance.

I initially was leaning towards SHA256 because of hw acceleration, but
noticed that the Intel SHA NI instructions that they've been talking
about for so long don't seem to actually exist anywhere (maybe the Goldmont
Atoms?)

So SHA256 acceleration is mainly an ARM thing, and nobody develops on
ARM because there's effectively no hardware that is suitable for
developers. Even ARM people just use PCs (and they won't be Goldmont
Atoms).

Reduced-round SHA256 may have been broken, but on the other hand it's
been around for a lot longer too, so ...

But yes, SHA3-256 looks like the sane choice. Performance of hashing
is important in the sense that it shouldn't _suck_, but is largely
secondary. All my profiles on real loads (well, *my* real loads) have
shown that zlib performance is actually much more important than SHA1.

Anyway, I don't think we should make the hash choice based on pure
performance concerns - crypto strength first, assuming performance is
"not horrible". SHA3-256 does sound like the best choice.

And no, we should not make extensibility a primary concern. It is
likely that supporting two hashes will make it easier to support three
in the future, but I do not think those kinds of worries should even
be on the radar.

It's *much* more important that we don't waste memory and CPU cycles
on being overly "generic" than some theoretical "but but maybe in
another fifteen years.."

  Linus


Re: SHA1 collisions found

2017-02-26 Thread Thomas Braun
Am 25.02.2017 um 00:06 schrieb Jeff King:
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:
>
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, 
> unsigned long len)
>   memcpy(ctx->W, data, len);
>  }
>  
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> + 0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> + 0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> + 0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> + 0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
>  void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  {
>   static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], 
> blk_SHA_CTX *ctx)
>   /* Output hash */
>   for (i = 0; i < 5; i++)
>   put_be32(hashout + i * 4, ctx->H[i]);
> +
> + /* pretend "foo" and "bar" collide */
> + if (!memcmp(hashout, bar_sha1, 20))
> + memcpy(hashout, foo_sha1, 20);
>  }

While reading about the subject I came across [1]. The author reduced
the hash size to 4 bits and then played around with git.

Diff taken from the posting (not my code)
--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
blk_SHA1_Update(ctx, padlen, 8);

/* Output hash */
-   for (i = 0; i < 5; i++)
-   put_be32(hashout + i * 4, ctx->H[i]);
+   for (i = 0; i < 1; i++)
+   put_be32(hashout + i * 4, (ctx->H[i] & 0xf00));
+   for (i = 1; i < 5; i++)
+   put_be32(hashout + i * 4, 0);
 }

From a noob git-dev perspective, this sounds more flexible.

[1]: http://stackoverflow.com/a/34599081


Re: SHA1 collisions found

2017-02-26 Thread Junio C Hamano
Jeff King  writes:

> Trees are more difficult, as they don't have any such field. But a valid
> tree does need to start with a mode, so sticking some non-numeric flag
> at the front of the object would work (it breaks backwards
> compatibility, but that's kind of the point).

Just like the object header format does not inherently impose a
maximum length the system can handle on our objects or the number of
mode bits we can use in an entry in the tree object [*1*], the
format in which tags and commits refer to other objects does not
impose what hash is used for these references [*2*].  

The object name in the tree format is an oddball; being a binary
20-byte field without any other hint, it does limit us to sticking
to SHA-1.
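
(For readers following along: the layout is easy to inspect in any
repository, since each entry is "<mode> <name>\0" immediately followed
by the raw 20-byte object name:

  git cat-file tree HEAD^{tree} | xxd | head

and "git ls-tree HEAD" shows the same data in parsed form.)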

I think the helper functions in tree-walk.h, namely 

init_tree_desc();
tree_entry_extract();
update_tree_entry();

and the associated data structures can be updated to read a tree
object in a new format without affecting the readers too much.  By
having a "I am in a new format" byte at the beginning that cannot be
a valid first byte in the current tree format (non-octal is a good
thing to use here), init_tree_desc() can set things up in the desc
structure to expect that the data that will be read by
tree_entry_extract() and update_tree_entry() are formatted in a new
way, and by varying that "tree-format signature" byte, we can update
the format in the future.

So at the loose-object format level, we may not even need "tree2";
we can view this update in a way similar to the change we did when
we started supporting submodules/gitlinks.  Older Git would have
said "There is an object that is not tree or blob recorded" and
barfed but newer one takes such a tree just fine.  This "we are now
introducing a new hash, and a tree can either have objects all named
by SHA-1 or all new (non SHA-1) hash" update can be treated the same
way, methinks.

The normal flow to write tree objects is (supposed to be) all
contained in cache-tree.c.  As long as we can tell from "struct
object" which hash names the object (i.e. struct object_id may
become an enum and a union), we should be able to use it to convert
objects near the tip of the existing history to new hashes
incrementally. Ideally, the flag-day for one tip of a dag may be
just a matter of

git commit --allow-empty -m "object name hash update"

without anything else.  The commit by default would want to name
itself with the new hash, which requires it to get its tree named
with the new hash, which may read the old tree and associated blobs
all named with SHA-1, but write_index_as_tree() should be able to
(1) read the tree with its SHA-1 name to learn what is contained;
(2) read the contents of blobs with their SHA-1 names, and compute
their names with the new hash; and (3) write out a containing tree
object in the updated format and named with the new hash.  And that
would give us the tree object named with the new hash that the
command can write into the new commit object on its "tree" line.


[Footnote]

*1* These lengths and mode bits are spelled out in ASCII without any
fixed length limit for the number of the bytes in this ASCII
string that represents the length.  The current code may happen
to read them into unsigned long and unsigned int, which does
impose limit on the individual reader in the sense that if your
ulong is only 32-bit, you cannot have an object larger than 4GB.
But that is not an inherent limit in the format; you can lift it
by upgrading the reader.

*2* They are also spelled out in ASCII and there is no length limit.
Existing implementation may happen to assume that they are all
SHA-1, but the readers and the writers can be updated to allow
other hashes to be used in a way that does not break existing
code when we are only using SHA-1 by marking a reference that
uses new hash distinguishable from SHA-1 references.


[PATCH v2] convert: add "status=delayed" to filter process protocol

2017-02-26 Thread Lars Schneider
Some `clean` / `smudge` filters might require a significant amount of
time to process a single blob. During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol (introduced in edcc858) to accept the
status "delayed" as response to a filter request. Upon this response Git
continues with the checkout operation and asks the filter to process the
blob again after all other blobs have been processed.

Git has multiple code paths that check out a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.

Signed-off-by: Lars Schneider 
---

Hi,

in v1 Junio criticized the "convert.h" interface of this patch [1].
After talking to Peff I think I understand Junio's point and I would
like to get your feedback on the new approach here. Please ignore all
changes behind async_convert_to_working_tree() and async_filter_finish()
for now as I plan to change the implementation as soon as the interface
is in an acceptable state.

The new interface also addresses Torsten's feedback and leaves
convert_to_working_tree() as is [2].

I also use '>' for numeric comparisons in Perl as suggested by Eric [3].

Please note, I rebased the patch to v2.12 as v1 did not apply clean on
master anymore.

Thanks,
Lars

[1] http://public-inbox.org/git/xmqqa8b115ll@gitster.mtv.corp.google.com/
[2] http://public-inbox.org/git/20170108201415.GA3569@tb-raspi/
[3] http://public-inbox.org/git/20170108204517.GA13779@starla/


RFC: http://public-inbox.org/git/d10f7c47-14e8-465b-8b7a-a09a1b28a...@gmail.com/
 v1: http://public-inbox.org/git/20170108191736.47359-1-larsxschnei...@gmail.com/


Notes:
Base Ref: v2.12.0
Web-Diff: https://github.com/larsxschneider/git/commit/13d5b37021
Checkout: git fetch https://github.com/larsxschneider/git filter-process/delay-v2 && git checkout 13d5b37021

 Documentation/gitattributes.txt |  9 ++
 builtin/checkout.c  |  1 +
 cache.h |  1 +
 convert.c   | 68 +
 convert.h   | 13 
 entry.c | 29 +++---
 t/t0021-conversion.sh   | 53 
 t/t0021/rot13-filter.pl | 19 
 unpack-trees.c  |  1 +
 9 files changed, 176 insertions(+), 18 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e0b66c1220..f6bad8db40 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -473,6 +473,15 @@ packet:  git<   # empty content!
 packet:  git<   # empty list, keep "status=success" unchanged!
 

+If the request cannot be fulfilled within a reasonable amount of time
+then the filter can respond with a "delayed" status and a flush packet.
+Git will perform the same request at a later point in time, again. The
+filter can delay a response multiple times for a single request.
+
+packet:  git< status=delayed
+packet:  git< 
+
+
 In case the filter cannot or does not want to process the content,
 it is expected to respond with an "error" status.
 
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f174f50303..742e8742cd 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -369,6 +369,7 @@ static int checkout_paths(const struct checkout_opts *opts,
pos = skip_same_name(ce, pos) - 1;
}
}
+   errs |= checkout_delayed_entries();

if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
die(_("unable to write new index file"));
diff --git a/cache.h b/cache.h
index 61fc86e6d7..66dde99a79 100644
--- a/cache.h
+++ b/cache.h
@@ -1434,6 +1434,7 @@ struct checkout {

 #define TEMPORARY_FILENAME_LENGTH 25
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
+extern int checkout_delayed_entries(const struct checkout *state);

 struct cache_def {
struct strbuf path;
diff --git a/convert.c b/convert.c
index 4e17e45ed2..24d29f5c53 100644
--- a/convert.c
+++ b/convert.c
@@ -4,6 +4,7 @@
 #include "quote.h"
 #include "sigchain.h"
 #include "pkt-line.h"
+#include "list.h"

 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -38,6 +39,13 @@ struct text_stat {
unsigned printable, nonprintable;
 };

+static LIST_HEAD(delayed_item_queue_head);
+
+struct delayed_item {
+   void* item;
+   struct list_head node;
+};
+
static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
 {
unsigned long i;
@@ -672,7 +680,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 }

 static int 

Re: SHA1 collisions found

2017-02-26 Thread brian m. carlson
On Sun, Feb 26, 2017 at 12:18:34AM -0500, Jeff King wrote:
> On Sun, Feb 26, 2017 at 01:13:59AM +, Jason Cooper wrote:
> 
> > On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> > > I was thinking we would need mixed mode support for smoother
> > > transition, but it now seems to me that the approach to stratify the
> > > history into old and new is workable.
> > 
> > As someone looking to deploy (and having previously deployed) git in
> > unconventional roles, I'd like to add one caveat.  The flag day in the
> > history is great, but I'd like to be able to confirm the integrity of
> > the old history.
> > 
> > "Counter-hashing" the blobs is easy enough, but the trees, commits and
> > tags would need to have, iiuc, some sort of cross-reference.  As in my
> > previous example, "git tag -v v3.16" also checks the counter hash to
> > further verify the integrity of the history (yes, it *really* needs to
> > check all of the old hashes, but I'd like to make sure I can do step one
> > first).
> > 
> > Would there be opposition to counter-hashing the old commits at the flag
> > day?
> 
> I don't think a counter-hash needs to be embedded into the git objects
> themselves. If the "modern" repo format stores everything primarily as
> sha-256, say, it will probably need to maintain a (local) mapping table
> of sha1/sha256 equivalence. That table can be generated at any time from
> the object data (though I suspect we'll keep it up to date as objects
> enter the repository).

I really like this look-aside approach.  I think it makes it really easy
to just rewrite the history internally, but still be able to verify
signed commits and signed tags.  We could even synthesize the blobs and
trees from the new hash versions if we didn't want to store them.

This essentially avoids the need for handling competing hashes in the
same object (and controversy about multihash or other storage
facilities); just specify the new hash in the objects, and look up the
old one in the database if necessary.

This also will be the easiest approach to implement, IMHO.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: SHA1 collisions found

2017-02-26 Thread Junio C Hamano
Jeff King  writes:

> On Sat, Feb 25, 2017 at 11:35:27PM +0100, Lars Schneider wrote:
> ...
>> That's a good idea! I wonder if it would make sense to setup an 
>> additional job in TravisCI that patches every Git version with some hash 
>> collisions and then runs special tests.
>
> I think it would be interesting to see the results under various
> scenarios. I don't know that it would be all that interesting from an
> ongoing CI perspective.

I had the same thought.  

I view such a test as a very good validation measure while we are
finishing up the introduction of the new hash and the update to the
codepaths that need to handle both hashes.  But once that work is
concluded, I do not know if testing on an ongoing basis is all that
interesting.


git-sha-x: idea for stronger cryptographic verification while keeping SHA1 content addresses

2017-02-26 Thread Steffen Prohaska
Hi,

Related to shattered, the recent discussion, the past discussions, and
Linus's post, the idea below might be interesting.

I skimmed through the discussion but haven't read all the details.  I also
haven't been following the Git list during the last years, so it might very
well be that others have described similar ideas and the general approach has
been rejected for some reason that I'm not aware of.

git-sha-x illustrates a potential solution for stronger cryptographic content
verification in Git while keeping SHA1 content pointers.

git-sha-x is available at .

git-sha-x computes a hash tree similar to the SHA1-based Git history but using
a different hash function.  The hashes can be added to the commit message and
signed with GPG to confirm the tree and the entire history with a cryptographic
strength that no longer depends on SHA1 but on the chosen git-sha-x hash and
the GPG configuration.  See `git-sha-x --help`.  Examples:

```
git-sha-x commit HEAD
git-sha-x tree HEAD
git-sha-x --sha512 commit HEAD

git-sha-x amend && git-sha-x --sha512 amend && git commit -S --amend -C HEAD
```

git-sha-x is only a proof of concept to illustrate the idea.  I do not intend
to develop it further.

If a similar approach was chosen for Git, the hashes could be managed by Git
and somehow represented in its object store as supplementary information.  Git
could incrementally compute additional hashes while it constructs history and
verify them when transferring data.

The strength of bare SHA1 ids is obviously not increased.  The strength is only
increased if the additional hashes are communicated in a verifiable way, too.
GPG signatures are one way.  Another way could be to communicate them via
a secure channel and pass them to git fetch for verification.  Assuming such an
implementation, a fetch for a commit from this repo could look like:

```bash
git fetch origin \
    --sha256=8a3c72de658a4797e36bb29fc3bdc2a2863c04455a1b394ed9331f11f65ba802 \
    --sha512=729de81500ce4ad70716d7641a115bd0a67984acc4d674044b25850e36d940bf631f9f6aa88768743690545ac899888fb54f65840f84853f9a8811aeb9ca \
    ef2a4b7d216ab79630b9cd17e072a86e57f044fa
```

For practical purposes, supplementary hashes in the commit in combination with
GPG signatures should provide sufficient protection against attackers that try
to manipulate SHA1s.  For convenience, supplementary hashes could be stored in
the commit header, similar to `gpgsig`.  A hypothetical commit object could
look like:

```
tree 365c7e42fd004a1778c6d79c0437f970397a59b8
parent c2bfff12099b32425a3bcc4d0c7e6e6a169392d8
tree-sha256 2f588b9308b5203212d646fb56201608449cb4d83a5ffd6b7e6213d175a8077c
parent-sha256 090d9a3e69aa3369efac968abde859a6e42d05b631ece6d533765a35e998336c
tree-sha512 12ae91b23733d52fa2f42b8f0bb5aeaeb111335688f387614c3b108a8cb86fa0e2cd6d19bf050f8a9308f8c1e991080507c91df53e0fc4cace3f746ec89a789a
parent-sha512 d319889a40cf945d8c61dbe6d816e10badd49845c547df85ace4327676275eeb5ba2cd962712ddbb8f08f2db17dbc9eb46b59b5f7b7a7e05eab7df0ef89dec65
author Steffen Prohaska  1488122961 +0100
committer Steffen Prohaska  1488123452 +0100
gpgsig ...
```

GPG signatures would automatically cover the supplementary hashes.
Verification code paths would have to be added to compute the hashes from the
content to confirm that it has not been tampered with.

Since content verification would become independent from the content address,
the interpretation of the content address could be changed in the future.  The
size of 160 bits could be kept for simplicity.  But the meaning could be
changed.  For example, the first 160 bits of SHA256 could be used as the
content address.  The remaining bits could be stored in an object supplement.
Verification code paths would combine the content address with the additional
bits to verify the SHA256.  Content pointers would keep their size.  Only the
additional SHA256 bits would be stored and used for verification.

Steffen


Re: SHA1 collisions found

2017-02-26 Thread brian m. carlson
On Sun, Feb 26, 2017 at 12:16:07AM +, Jason Cooper wrote:
> Hi,
> 
> On Sat, Feb 25, 2017 at 01:31:32AM +0100, ankostis wrote:
> > That is why I believe that some HASH (e.g. SHA-3) must be the blessed one.
> > All git >= 3.x.x must support at least this one (for naming and
> > cross-referencing between objects).
> 
> I would stress caution here.  SHA3 has survived the NIST competition,
> but that's about it.  It has *not* received nearly as much scrutiny as
> SHA2.
> 
> SHA2 is a similar construction to SHA1 (Merkle–Damgård [1]) so it makes
> sense to be leery of it, but I would argue its seasoning merits serious
> consideration.
> 
> Ideally, bless SHA2-384 (minimum) as the next hash.  Five or so years
> down the road, if SHA3 is still in good standing, bless it as the next
> hash.

I don't think we want to be changing hashes that frequently.  Projects
frequently last longer than five years.  I think using a 256-bit hash is
the right choice because it fits on an 80-column screen in hex format.
384-bit hashes do not.  This matters because line wrapping makes
copy-paste hard, and user experience is important.

I've mentioned this on the list earlier, but here are the contenders in
my view:

SHA-256:
  Common, but cryptanalysis has advanced.  Preimage resistance (which is
  even more important than collision resistance) has gotten to 52 of 64
  rounds.  Pseudo-collision attacks are possible against 46 of 64
  rounds.  Slowest option.
SHA-3-256:
  Less common, but has a wide security margin.  Cryptanalysis is
  ongoing, but has not advanced much.  Somewhat to much faster than
  SHA-256, unless you have SHA-256 hardware acceleration (which almost
  nobody does).
BLAKE2b-256:
  Lower security margin, but extremely fast (faster than SHA-1 and even
  MD5).

My recommendation has been for SHA-3-256, because I think it provides
the best tradeoff between security and performance.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: SHA1 collisions found

2017-02-26 Thread brian m. carlson
On Sun, Feb 26, 2017 at 07:09:44AM +0900, Mike Hommey wrote:
> On Sat, Feb 25, 2017 at 02:26:56PM -0500, Jeff King wrote:
> > I looked at that earlier, because I think it's a reasonable idea for
> > future-proofing. The first byte is a "varint", but I couldn't find where
> > they defined that format.
> > 
> > The closest I could find is:
> > 
> >   https://github.com/multiformats/unsigned-varint
> > 
> > whose README says:
> > 
> >   This unsigned varint (VARiable INTeger) format is for the use in all
> >   the multiformats.
> > 
> > - We have not yet decided on a format yet. When we do, this readme
> >   will be updated.
> > 
> > - We have time. All multiformats are far from requiring this varint.
> > 
> > which is not exactly confidence inspiring. They also put the length at
> > the front of the hash. That's probably convenient if you're parsing an
> > unknown set of hashes, but I'm not sure it's helpful inside Git objects.
> > And there's an incentive to minimize header data at the front of a hash,
> > because every byte is one more byte that every single hash will collide
> > over, and people will have to type when passing hashes to "git show",
> > etc.

The multihash spec also says that it's not necessary to implement
varints until we have 127 hashes, and considering that will be in the
far future, I'm quite happy to punt that problem down the road to
someone else[0].

> > I'd almost rather use something _really_ verbose like
> > 
> >   sha256:1234abcd...
> > 
> > in all of the objects. And then when we get an unadorned hash from the
> > user, we guess it's sha256 (or whatever), and fallback to treating it as
> > a sha1.
> > 
> > Using a syntactically-obvious name like that also solves one other
> > problem: there are sha1 hashes whose first bytes will encode as a "this
> > is sha256" multihash, creating some ambiguity.
> 
> Indeed, multihash only really is interesting when *all* hashes use it.
> And obviously, git can't change the existing sha1s.

Well, that's why I said in new objects.  If we're going to default to a
new hash, we can store it inside the object format, but not actually
expose it to the user.

In other words, if we used SHA-256, a tree object would refer to the SHA-1
empty blob as 1114e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 and the
SHA-256 empty blob as
1220473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813,
but user-visible code would parse them as e69d... and 473a... (or as
sha1:e69d and 473a, or something).
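
Both names above, minus the two multihash prefix bytes, are easy to
double-check from a shell, since git hashes "<type> <size>\0<content>":

  printf 'blob 0\0' | sha1sum    # e69de29b..., the familiar empty blob
  printf 'blob 0\0' | sha256sum  # 473a0f4c..., its SHA-256 counterpart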

There's very little code which actually parses objects, so it's easy
enough to introduce a few new functions to read and write the prefixed
versions within the objects, and leave the rest to work in the same old
user-visible way (or in the way that you've proposed).

Note also that we need some way to distinguish objects in binary form,
since if we mix hashes, we need to be able to read data directly from
pack files and other locations where we serialize data that way.
Multihash would do that, even if we didn't expose that to the user.

[0] And for the record, I'm a maintenance programmer, and I dislike it
when people punt the problem down the road to someone else, because
that's usually me.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: Reference for quote "creating branch is not the issue, merging is", in context of Subversion/Git

2017-02-26 Thread Igor Djordjevic
Hello Michael,

On 26/02/2017 12:40, Michael Hüttermann wrote:
> Linus Torvalds made a statement regarding merging/branching and stated
> (as far as I know) that "creating branch is not the issue, merge is", in
> context of Subversion/Git.
> I cannot find the original source for that. Can you please help and point
> me to a statement or article where Linus elaborated on this?

Could it be that you're thinking of "Tech Talk: Linus Torvalds on Git"[1]
(held on May 3, 2007)?

To give you some clue, here's an excerpt from Linus' talk/presentation
(taken from the transcript[2] containing the whole thing):

  "... Subversion for example, talks very loudly about how they do CVS
  right by making branching really cheap. It's probably on their main
  webpage where they probably say branching in subversion is O(1)
  operation, you can do as many cheap branches as you want. Nevermind
  that O(1) is actually with pretty large O I think, but even if it
  takes a millionth of a second to do branching, who cares? It's the
  wrong thing you are measuring. Nobody is interested in branching,
  branches are completely useless unless you merge them, and CVS cannot
  merge anything at all. You can merge things once, but because CVS
  then forgets what you did, you can never ever merge anything again
  without getting horrible horrible conflicts. Merging in subversion is
  a complete disaster. The subversion people kind of acknowledge this
  and they have a plan, and their plan sucks too. It is incredible how
  stupid these people are. They've been looking at the wrong problem
  all the time. Branching is not the issue, merging is..."

This specific branch/merge performance talk starts at 50:20[3], where
the part quoted above comes at 51:34[4].

Please note that there's more context before and after this excerpt
that puts it all into the intended perspective, so you may really want
to watch/listen/read the whole thing anyway.

Regards,
Buga

[1] https://www.youtube.com/watch?v=4XpnKHJAok8
[2] https://git.wiki.kernel.org/index.php/LinusTalk200705Transcript
[3] https://youtu.be/4XpnKHJAok8?t=3020
[4] https://youtu.be/4XpnKHJAok8?t=3094


Re: [PATCH] travis-ci: run scan-build every time

2017-02-26 Thread Lars Schneider

> On 26 Feb 2017, at 03:09, Samuel Lijin  wrote:
> 
> On Sat, Feb 25, 2017 at 3:48 PM, Lars Schneider
>  wrote:
>> 
>>> On 24 Feb 2017, at 18:29, Samuel Lijin  wrote:
>>> 
>>> It's worth noting that there seems to be a weird issue with scan-build
>>> where it *will* generate a report for something locally, but won't do it
>>> on Travis. See [2] for an example where I have a C program with a
>>> very obvious memory leak but scan-build on Travis doesn't generate
>>> a report (despite complaining about it in stdout), even though it does
>>> on my local machine.
>>> 
>>> [1] https://travis-ci.org/sxlijin/git/builds/204853233
>>> [2] https://travis-ci.org/sxlijin/travis-testing/jobs/205025319#L331-L342
>> 
>> Scan-build stores the report in some temp folder. I assume you can't access
>> this folder on TravisCI. Try the scan-build option "-o scan-build-results"
>> to store the report in the local directory.
> 
> That occurred to me, but I don't quite think that's the issue. I just
> noticed that on the repo I use to test build matrices, jobs 1-8 don't
> generate a report, but 9-14 and 19-20 do [1]. I don't think it's an
> issue with write permissions (scan-build complains much more vocally
> if that happens), but it doesn't seem to matter if the output dir is
> in the tmpfs [2] or a local directory [3].
> 
> [1] https://travis-ci.org/sxlijin/travis-testing/builds/205054253
> [2] https://travis-ci.org/sxlijin/git/jobs/205028920#L1000
> [3] https://travis-ci.org/sxlijin/git/jobs/205411705#L998

Scan-build somehow replaces the compiler. My guess is that you 
tell scan-build to substitute clang but "make" is really using 
gcc or something? I reported something strange about the compilers
on TravisCI some time ago but I can't find it anymore. I think I 
remember on OSX they always use clang even if you define gcc. 
Maybe it makes sense to reach out to TravisCI support in case 
this is a bug on their end?

Based on your work I tried the following and it seems to work:
https://travis-ci.org/larsxschneider/git/jobs/205507241
https://github.com/larsxschneider/git/commit/faf4ecfdca1a732459c1f93c334928ee2826d490

- Lars

Re: [PATCH v6 1/1] config: add conditional include

2017-02-26 Thread Philip Oakley

From: "Duy Nguyen" 
On Sat, Feb 25, 2017 at 5:08 AM, Philip Oakley  wrote:

+Conditional includes
+
+
+You can include one config file from another conditionally by setting



On first reading I thought this implied you can only have one `includeIf`
within the config file.
I think it is meant to mean that each `includeIf` could include one other
file, and that users can have multiple `includeIf` lines.


Yes. Not sure how to put it better though (I basically copied the
first paragraph from the unconditional include section above, which
shares the same confusion). Perhaps just write "the variable can be
specified multiple times"? Or "multiple variables include multiple
times, the last variable does not override the previous ones"?
--


My attempt, based on updating the `Includes` section, would be something
like:


`You can include a config file from another by setting the special 
`include.path` variable to the name of the file to be included. The variable 
takes a pathname as its value, and is subject to tilde expansion. 
`include.path` supports multiple key values.`


The subtle change was to s/one/a/ at the start, and then add the final short 
sentence that states that the section's variables can have multiple key 
values.


I copied the 'multiple key values' phrase from the man page intro for 
consistency, though 'multivalued' could just as easily be used, as it is the
term used by the 'Configuration File' section that this is part of 
https://git-scm.com/docs/git-config#_configuration_file.


Even shorter may be:
`You can include a config file from another by setting the special 
`include.path` variable to the name of the file to be included. The variable 
(can be multivalued) takes a pathname as its value, and is subject to tilde 
expansion.`



The Conditional Includes would follow suit.
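
As a sketch of what "multivalued" means in practice (file names made
up; the `includeIf` syntax is the one proposed in this series):

[include]
        path = ~/.gitconfig-common
        path = ~/.gitconfig-extra   ; both are honoured, in order
[includeIf "gitdir:~/work/"]
        path = ~/.gitconfig-work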

Philip


Reference for quote "creating branch is not the issue, merging is", in context of Subversion/Git

2017-02-26 Thread Michael Hüttermann

Hello team,

Linus Torvalds made a statement regarding merging/branching and stated (as far as I know) 
that "creating branch is not the issue, merge is", in context of Subversion/Git.
I cannot find the original source for that. Can you please help and point me to a 
statement or article where Linus elaborated on this?
Thanks for your help.


Kind regards
Michael



Unconventional roles of git

2017-02-26 Thread ankostis
On 26 February 2017 at 02:13, Jason Cooper  wrote:
> As someone looking to deploy (and having previously deployed) git in
> unconventional roles, I'd like to add ...

We are developing a distributed storage for type approval files regarding all
vehicles registered in Europe.[1]  To ensure integrity even after 10 or 30
years, the hash of a commit of these files (as contained in a tag) is
to be printed on the paper certificates.
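
A sketch of what we have in mind (tag name made up):

git tag -s ta-2017-001 -m "type approval files, batch 001"
git rev-parse ta-2017-001   # the object name printed on the certificate
git tag -v ta-2017-001      # later: verify the signature and history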

- Can you provide some hints for other similar unconventional roles of git?
- Any other comment on the above usage of git are welcomed.

Kind Regards,
  Kostis Anagnostopoulos

[1] https://co2mpas.io


Re: [PATCH 1/5] grep: illustrate bug when recursing with relative pathspec

2017-02-26 Thread Duy Nguyen
On Sat, Feb 25, 2017 at 6:50 AM, Brandon Williams  wrote:
> When using the --recurse-submodules flag with a relative pathspec which
> includes "..", an error is produced inside the child process spawned for
> a submodule.  When creating the pathspec struct in the child, the ".."
> is interpreted to mean "go up a directory" which causes an error stating
> that the path ".." is outside of the repository.
>
> While it is true that ".." is outside the scope of the submodule, it is
> confusing to a user who originally invoked the command where ".." was
> indeed still inside the scope of the superproject.  Since the child
> process luanched for the submodule has some context that it is operating

s/luanched/launched/

I would prefer 1/5 to be merged with 3/5 though. The problem
description is very light there, and the test demonstration in the
diff is simply switching from failure to success, which forces the
reader to come back here. It's easier to find here now, but it'll be a
bit harder when it enters master and we have to read it from git-log,
I think.

I'm still munching through the super-prefix patches. From how you
changed match_pathspec call in 0281e487fd (grep: optionally recurse
into submodules - 2016-12-16), I guess pathspecs should be handled
with super-prefix instead of the submodule's prefix (which is empty
anyway, I guess). The right solution wrt. handling relative paths may
be teach pathspec about super-prefix (and even original super's cwd)
then let it processes path in supermodule's context.

Does it handle relative paths with wildcards correctly btw? Ones that
cross submodules? I have a feeling it doesn't, but I haven't seen how
exactly super-prefix works yet.

There's another problem with passing pathspec from one process to
another. The issue with preserving the prefix, see 233c3e6c59
(parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN -
2013-07-14). :(icase) needs this because given a path
"/foobar", only the "foobar" part is considered case
insensitive, the prefix part is always case-sensitive. For example, if
you have 4 paths "abc/def", "abc/DEF", "ABC/def" and "ABC/DEF" and are
standing at "abc", you would want ":(icase)def" to match the first two
only, not all of them.
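
A quick demo of that, in case it helps (assumes a case-sensitive
filesystem so all four paths can coexist):

git init icase-demo && cd icase-demo
mkdir abc ABC
touch abc/def abc/DEF ABC/def ABC/DEF
git add .
cd abc
git ls-files ':(icase)def'   # expect only 'def' and 'DEF' here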

> underneath a superproject, this error could be avoided.
>
> Signed-off-by: Brandon Williams 
> ---
>  t/t7814-grep-recurse-submodules.sh | 42 ++
>  1 file changed, 42 insertions(+)
>
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 67247a01d..418ba68fe 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -227,6 +227,48 @@ test_expect_success 'grep history with moved submoules' '
> test_cmp expect actual
>  '
>
> +test_expect_failure 'grep using relative path' '
> +   test_when_finished "rm -rf parent sub" &&
> +   git init sub &&
> +   echo "foobar" >sub/file &&
> +   git -C sub add file &&
> +   git -C sub commit -m "add file" &&
> +
> +   git init parent &&
> +   echo "foobar" >parent/file &&
> +   git -C parent add file &&
> +   mkdir parent/src &&
> +   echo "foobar" >parent/src/file2 &&
> +   git -C parent add src/file2 &&
> +   git -C parent submodule add ../sub &&
> +   git -C parent commit -m "add files and submodule" &&
> +
> +   # From top works
> +   cat >expect <<-\EOF &&
> +   file:foobar
> +   src/file2:foobar
> +   sub/file:foobar
> +   EOF
> +   git -C parent grep --recurse-submodules -e "foobar" >actual &&
> +   test_cmp expect actual &&
> +
> +   # Relative path to top errors out

After 3/5, it's not "errors out" any more, is it?

> +   cat >expect <<-\EOF &&
> +   ../file:foobar
> +   file2:foobar
> +   ../sub/file:foobar
> +   EOF
> +   git -C parent/src grep --recurse-submodules -e "foobar" -- .. >actual &&
> +   test_cmp expect actual &&
> +
> +   # Relative path to submodule errors out

ditto

> +   cat >expect <<-\EOF &&
> +   ../sub/file:foobar
> +   EOF
> +   git -C parent/src grep --recurse-submodules -e "foobar" -- ../sub >actual &&
> +   test_cmp expect actual
> +'
> +
>  test_incompatible_with_recurse_submodules ()
>  {
> test_expect_success "--recurse-submodules and $1 are incompatible" "
> --
> 2.11.0.483.g087da7b7c-goog
>



-- 
Duy