Re: [PATCH] Reduce COW sections data by marking data constant

2008-02-01 Thread Hrvoje Niksic
Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] writes:

 It is a micro-optimisation, I admit that, but it's not just the
 indirection the problem.

 Pointers, and structures containing pointers, need to be
 runtime-relocated for shared libraries and PIC code (let's assume
 that shared libraries are always PIC, for the sake of argument).

Even ignoring the fact that Wget is not a shared library, there are
ways to solve this problem other than turning all char *foo[] into
char foo[][MAXSIZE], which is, sorry, just lame and wasteful in all
but the most trivial examples.  The method described by Mart is one.
For readers of this list, it boils down to turning:

static const char *foo[] = {one, two, three };

into:

static const char foo_data[] = one\0two\0three;
static const foo_ind[] = {0, 4, 8};

static const char *foo(int ind) {
  return foo_data[foo_data + foo_ind[ind]];
}

(This technique was made popular by Ulrich Drepper.)

Maintaining the array of indices manually is cumbersome, but you could
write a tool that created the above three things from the original
const char *foo[] array without any user intervention.  But again,
that is total overkill for a non-PIC like Wget, and most likely for
smaller PICs as well.


Re: [PATCH] Reduce COW sections data by marking data constant

2008-02-01 Thread Hrvoje Niksic
Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] writes:

 On 01/feb/08, at 09:12, Hrvoje Niksic wrote:

 Even ignoring the fact that Wget is not a shared library, there are
 ways to solve this problem other than turning all char *foo[] into
 char foo[][MAXSIZE], which is, sorry, just lame and wasteful in all
 but the most trivial examples.

 That's why I didn't turn _all_ of them, but just where the waste of
 space for the strings was very limited, or none at all.

I appreciate that, but from a maintainer's perspective that kind of
change adds a small maintenance burden for literally *no* gain.
Although the individual burden in each case is quite small, they tend
to accumulate.  On the other hand, the gain by this kind of change is
virtually zero and doesn't accumulate into a measurable performance
boost.  This kind of effort should be redirected toward shared
libraries where it might actually make a difference, especially for
those that are used by many programs and that contain many pointers.

Of course, the changes that introduce const without compromising
maintainability, such as the constification of the Wp declaration in
ftp-opie.c, are more than welcome.


Re: [PATCH] Reduce COW sections data by marking data constant

2008-02-01 Thread Diego 'Flameeyes' Pettenò


On 01/feb/08, at 09:12, Hrvoje Niksic wrote:


Even ignoring the fact that Wget is not a shared library, there are
ways to solve this problem other than turning all char *foo[] into
char foo[][MAXSIZE], which is, sorry, just lame and wasteful in all
but the most trivial examples.


That's why I didn't turn _all_ of them, but just where the waste of  
space for the strings was very limited, or none at all.




Re: [PATCH] Reduce COW sections data by marking data constant

2008-02-01 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hrvoje Niksic wrote:
 Of course, the changes that introduce const without compromising
 maintainability, such as the constification of the Wp declaration in
 ftp-opie.c, are more than welcome.

Indeed. In fact, even if Diego doesn't submit a patch with just those
changes, I'll probably put just those ones in myself.

 bThe method described by Mart is one.
 For readers of this list, it boils down to turning:
 
 static const char *foo[] = {one, two, three };
 
 into:
 
 static const char foo_data[] = one\0two\0three;
 static const foo_ind[] = {0, 4, 8};
 
 static const char *foo(int ind) {
   return foo_data[foo_data + foo_ind[ind]];
 }

Note that you could also do all the pointer maths up-front, leaving
existing usage code the same, with something like:

  static const char foo_data[] = one\0two\0three;
  static const char * const foo = {foo_data + 0, foo_data + 4,
foo_data + 8};

But yeah, I don't really think it's worthwhile for us.

In fact, even if we do end up making a Wget-like library at some point,
to wrap around something like libcurl, I don't think I'd spend too much
time trying to optimize this sort of thing. Though using properly
constified data will still be important. :)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHo06G7M8hyUobTrERAof9AJ0alipt3l7IbrB3LIjgSsGBki0aagCfXlQ+
GmXOZgcVq9fz6UMyJT+xRfo=
=rAVe
-END PGP SIGNATURE-


Re: [PATCH] Reduce COW sections data by marking data constant

2008-02-01 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 Note that you could also do all the pointer maths up-front, leaving
 existing usage code the same, with something like:

   static const char foo_data[] = one\0two\0three;
   static const char * const foo = {foo_data + 0, foo_data + 4,
 foo_data + 8};

I believe that doesn't help because foo[] remains an array of
pointers, each of which needs to be reloacated.


Re: [PATCH] Reduce COW sections data by marking data constant

2008-02-01 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 Right. What I was meaning to prevent, though, is the need to do:

   foo[foo_data + foo_idx[i]]

 and instead do:

   foo[i]

That is why my example had a foo function, which turns foo[i] to
foo(i), but otherwise works the same.  Using just foo[i] is
unfortunately not possible because it requires either keeping all the
relocations (in which case the foo_data+foo_idx excercise doesn't make
sense) or resorting to the 2D array-of-char trick.


Re: [PATCH] Reduce COW sections data by marking data constant

2008-01-31 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

(Forgot to CC the list.

Also, my mailer seems to be screwing up the quotes. :(  -M)

Diego 'Flameeyes' Pettenò wrote:
| As for maintainability, while a character pointer is easier to handle
than a
| character array, modern compilers warns you if you tries to force
bigger data
| into an array:
|
| struct { char foo[2]; } bar[] { { foobar } };

Yes; but I still have to manually increment the size to an appropriate
value each time I add a bigger string; plus I'm wasting space for all
shorter strings (which applies more to bigger-than-two-element char arrays);

| There actually are quite a few differences, I've documented them on my
blog
| some time ago:
|
|
http://farragut.flameeyes.is-a-geek.org/articles/2007/12/19/array-of-pointers-and-array-of-arrays
|
http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/some-more-about-arrays-of-strings

As I understand it, you have two issues with things like:

~  static const char *additional_attributes[] = { ... }

The first is that, since additional_attributes[] itself isn't constant,
it will not reside in read-only memory, unlike the data its elements
point at. This is easily solved with:

~  static const char * const additional_attributes[] = { ... }

Which is better anyway, since it also prevents someone from accidentally
overwriting elements with something like:

~  if (additional_attributes[4] = magic_attr) {
~/* ^ Whoops! */

The other is that there is the inefficiency of indirection, where the
code has to find the proper string by first finding the proper element
within the additional_attributes array, and then following _that_
pointer to the proper string.

The first solution makes good sense to me, particularly for its improved
correctness; but fixing the second problem seems like a
nano-optimization that doesn't bring sufficient efficiency benefits to
outweigh its drawbacks in maintainability.

Anyway, thanks for the information, and helpful suggestions. :)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHom/y7M8hyUobTrERAkzEAJ9tV3ZbzLuqlfvZW2SLDa4YkZNJQgCeN4f5
905Ny2JMxt1hsWsgVcIGZtE=
=yvIE
-END PGP SIGNATURE-


Re: [PATCH] Reduce COW sections data by marking data constant

2008-01-31 Thread Diego 'Flameeyes' Pettenò
On Friday 01 February 2008, you wrote:
 Yes; but I still have to manually increment the size to an appropriate
 value each time I add a bigger string; plus I'm wasting space for all
 shorter strings (which applies more to bigger-than-two-element char
 arrays);

Actually, you're not always. The actual space occupied by a pointer to 
character is

sizeof(char*) + sizeof(literal);

This means that for 32-bit (best-case)

char *foo = a - 4 + 2 = 6 bytes
char *foo = ab - 4 + 3 = 9 bytes
char *foo = abc - 4 + 4 = 8 bytes

char foo[] = a - 2 bytes
char foo[] = ab - 3 bytes
char foo[] = abc - 4 bytes

By that reasoning, char foo[6] will _always_ take less or equal space than 
char *foo.
But 32-bit architectures are also dying beside embedded systems, which means 
that the most common case especially in the years to come will be 
sizeof(char*) == 8 (64-bit architectures), at which point char foo[10] will 
always be smaller or equal to char *foo.

Considering you get out with an indirection less, most of the time even 
wasting 8 more bytes per string is worth the change.

 The first solution makes good sense to me, particularly for its improved
 correctness; but fixing the second problem seems like a
 nano-optimization that doesn't bring sufficient efficiency benefits to
 outweigh its drawbacks in maintainability.

It is a micro-optimisation, I admit that, but it's not just the indirection 
the problem.

Pointers, and structures containing pointers, need to be runtime-relocated for 
shared libraries and PIC code (let's assume that shared libraries are always 
PIC, for the sake of argument). In these cases the data is written 
to .data.rel, or .data.rel.ro for costants (I'm talking ELF here, other 
output formats most likely have something like that), which are copy-on-write 
sections. Without prelinking, these sections will be copied right at the 
start of the program, and when copied, they are no more shared between 
processes. Which means they'll be using pages of private memory on the 
various processes.

It is a minor thing for wget as it's not a shared library, but a 
fire-and-forget program, so even for security people using PIE it's far from 
being a big hit. It's more important for shared libraries though, especially 
because shared libraries can't always be prelinked properly. See also 
http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/reminding-a-weakness-of-prelink

Now, glib comes to the point of using the method that Mart described in the 
comments to my first blog ( 
http://farragut.flameeyes.is-a-geek.org/articles/2007/12/19/array-of-pointers-and-array-of-arrays
 ) 
and I quoted on the second ( 
http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/some-more-about-arrays-of-strings
 ) 
to avoid the increase-the-size and waste-the-space game, but making it a 
quite bigger, IMHO, maintenance problem. I haven't seen enough use yet for 
this to be a royally good idea, but I'm thinking of trying that out with 
xine-lib soon enough (without the second array, if the access can be done 
sequentially instead of randomly).

-- 
Diego Flameeyes Pettenò
http://farragut.flameeyes.is-a-geek.org/



signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] Reduce COW sections data by marking data constant

2008-01-31 Thread Josh Williams
On Jan 31, 2008 8:21 PM, Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] wrote:
 char *foo = ab - 4 + 3 = 9 bytes

How did you get 9?


Re: [PATCH] Reduce COW sections data by marking data constant

2008-01-31 Thread Diego 'Flameeyes' Pettenò


On 01/feb/08, at 02:33, Josh Williams wrote:


How did you get 9?


Probably by being 2:21 AM here ;) Sorry just a thinko.