Bug#783210: [PATCH] nscd_stat.c: make the build reproducible

2016-11-04 Thread Ximin Luo
Ximin Luo:
> Mike Frysinger:
>> On 28 Jul 2016 15:15, Florian Weimer wrote:
>>> On 03/09/2016 05:30 PM, Mike Frysinger wrote:
>>>> would it be so terrible to properly marshall this data ?
>>>
>>> Ximin Luo and I discussed this and I wonder if it is possible to read
>>> out the libc.so.6 build ID if it is present.  It should indirectly cover
>>> all the layout dependencies and be reasonably easy to access because it
>>> is in an allocated section (and we might want to print it from a
>>> libc.so.6 invocation, too).
>>>
>>> We still need the time-based approach if the build ID is not available, 
>>> but I expect most distributions will have something like it.
>>>
>>> The Debian bug is:
>>>
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210
>>>
>>> (Also Cc:ed)
>>
>> agreed that build-id should be an acceptable replacement for what the
>> code is doing today, but in order to pull that off, i guess you'd have
>> to do a configure test to see if build-id is active ?  if you
>> leave the logic to runtime, you'd still need to include the datetime
>> stamp in the object which would still make the build unreproducible.
>>
>> this also doesn't really cover the quoted idea of marshalling the data
>> between client & server :).
>> -mike
>>
> 
> Hi all,
> 
> I've written a small program that prints out the Build IDs of all the objects 
> that are dynamically linked to it, plus itself.
> 
> It works well, although I'm not a C expert so I don't know if it is portable 
> enough. For example, I hard-code some >>2 <<2s in there, along with a uint8_t 
> - I didn't see a corresponding ElfW(xxx) type in elf.h
> 
> Another downside is it needs to be linked against libdl, which I think is not 
> the case currently with nscd. I'm not sure if this carries extra security 
> risk or whatever.
> 

Oh! Actually it doesn't need to be linked against libdl. That was from an 
earlier version of the code where I was using dlinfo instead of 
dl_iterate_phdr. But this latter function doesn't need extra libs. :)

> An alternative would be to detect the build-id *at build time* and then 
> monkey-patch it into the binary itself.
> 
> What do you all think? How shall I proceed?
> 

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git



Bug#783210: [PATCH] nscd_stat.c: make the build reproducible

2016-11-04 Thread Ximin Luo
Mike Frysinger:
> On 28 Jul 2016 15:15, Florian Weimer wrote:
>> On 03/09/2016 05:30 PM, Mike Frysinger wrote:
>>> would it be so terrible to properly marshall this data ?
>>
>> Ximin Luo and I discussed this and I wonder if it is possible to read
>> out the libc.so.6 build ID if it is present.  It should indirectly cover
>> all the layout dependencies and be reasonably easy to access because it
>> is in an allocated section (and we might want to print it from a
>> libc.so.6 invocation, too).
>>
>> We still need the time-based approach if the build ID is not available, 
>> but I expect most distributions will have something like it.
>>
>> The Debian bug is:
>>
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210
>>
>> (Also Cc:ed)
> 
> agreed that build-id should be an acceptable replacement for what the
> code is doing today, but in order to pull that off, i guess you'd have
> to do a configure test to see if build-id is active ?  if you
> leave the logic to runtime, you'd still need to include the datetime
> stamp in the object which would still make the build unreproducible.
> 
> this also doesn't really cover the quoted idea of marshalling the data
> between client & server :).
> -mike
> 

Hi all,

I've written a small program that prints out the Build IDs of all the objects 
that are dynamically linked to it, plus itself.

It works well, although I'm not a C expert so I don't know if it is portable 
enough. For example, I hard-code some >>2 <<2s in there, along with a uint8_t - 
I didn't see a corresponding ElfW(xxx) type in elf.h

Another downside is it needs to be linked against libdl, which I think is not 
the case currently with nscd. I'm not sure if this carries extra security risk 
or whatever.

An alternative would be to detect the build-id *at build time* and then 
monkey-patch it into the binary itself.

What do you all think? How shall I proceed?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
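#define _GNU_SOURCE
#include <link.h>   /* dl_iterate_phdr, struct dl_phdr_info, ElfW() */
#include <stdint.h> /* uint8_t */
#include <stdio.h>  /* printf */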

int callback (struct dl_phdr_info *info, size_t size, void *data) {
  printf ("\nname: %s\n", info->dlpi_name);

  ElfW(Phdr) *phdr = (ElfW(Phdr) *) info->dlpi_phdr;
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
    if (phdr->p_type == PT_NOTE) {
      ElfW(Addr) addr = info->dlpi_addr + info->dlpi_phdr[i].p_vaddr;
      ElfW(Addr) nend = addr + info->dlpi_phdr[i].p_memsz;
      //printf ("found NOTE segment at: %p to %p\n", addr, nend);

      while (addr < nend) {
        ElfW(Nhdr) *nhdr = (ElfW(Nhdr) *) addr;
        // According to the ELF spec, namesz and descsz do not include padding
        // but that's how they're laid out in memory; add the padding here.
        ElfW(Addr) nameoff = (((nhdr->n_namesz-1)>>2)+1)<<2;
        ElfW(Addr) descoff = (((nhdr->n_descsz-1)>>2)+1)<<2;

        if (nhdr->n_type == NT_GNU_BUILD_ID) {
          const uint8_t *buf = (const uint8_t *) ((ElfW(Addr))(nhdr + 1) + nameoff);
          printf("Build ID");
          for (int j = 0; j < nhdr->n_descsz; j++)
            printf(":%02X", buf[j]);
          printf("\n");
        }

        //printf("skipping section type %02X\n", nhdr->n_type);
        addr = (ElfW(Addr))(nhdr + 1) + nameoff + descoff;
      }
    }

    phdr += 1;
  }

  return 0;
}

int main() {
  dl_iterate_phdr(callback, NULL);
}