Re: [PATCH] Reduce COW sections data by marking data constant
Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] writes: It is a micro-optimisation, I admit that, but it's not just the indirection the problem. Pointers, and structures containing pointers, need to be runtime-relocated for shared libraries and PIC code (let's assume that shared libraries are always PIC, for the sake of argument). Even ignoring the fact that Wget is not a shared library, there are ways to solve this problem other than turning all char *foo[] into char foo[][MAXSIZE], which is, sorry, just lame and wasteful in all but the most trivial examples. The method described by Mart is one. For readers of this list, it boils down to turning: static const char *foo[] = {one, two, three }; into: static const char foo_data[] = one\0two\0three; static const foo_ind[] = {0, 4, 8}; static const char *foo(int ind) { return foo_data[foo_data + foo_ind[ind]]; } (This technique was made popular by Ulrich Drepper.) Maintaining the array of indices manually is cumbersome, but you could write a tool that created the above three things from the original const char *foo[] array without any user intervention. But again, that is total overkill for a non-PIC like Wget, and most likely for smaller PICs as well.
Re: [PATCH] Reduce COW sections data by marking data constant
Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] writes: On 01/feb/08, at 09:12, Hrvoje Niksic wrote: Even ignoring the fact that Wget is not a shared library, there are ways to solve this problem other than turning all char *foo[] into char foo[][MAXSIZE], which is, sorry, just lame and wasteful in all but the most trivial examples. That's why I didn't turn _all_ of them, but just where the waste of space for the strings was very limited, or none at all. I appreciate that, but from a maintainer's perspective that kind of change adds a small maintenance burden for literally *no* gain. Although the individual burden in each case is quite small, they tend to accumulate. On the other hand, the gain by this kind of change is virtually zero and doesn't accumulate into a measurable performance boost. This kind of effort should be redirected toward shared libraries where it might actually make a difference, especially for those that are used by many programs and that contain many pointers. Of course, the changes that introduce const without compromising maintainability, such as the constification of the Wp declaration in ftp-opie.c, are more than welcome.
Re: [PATCH] Reduce COW sections data by marking data constant
On 01/feb/08, at 09:12, Hrvoje Niksic wrote: Even ignoring the fact that Wget is not a shared library, there are ways to solve this problem other than turning all char *foo[] into char foo[][MAXSIZE], which is, sorry, just lame and wasteful in all but the most trivial examples. That's why I didn't turn _all_ of them, but just where the waste of space for the strings was very limited, or none at all.
Re: [PATCH] Reduce COW sections data by marking data constant
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hrvoje Niksic wrote: Of course, the changes that introduce const without compromising maintainability, such as the constification of the Wp declaration in ftp-opie.c, are more than welcome. Indeed. In fact, even if Diego doesn't submit a patch with just those changes, I'll probably put just those ones in myself. bThe method described by Mart is one. For readers of this list, it boils down to turning: static const char *foo[] = {one, two, three }; into: static const char foo_data[] = one\0two\0three; static const foo_ind[] = {0, 4, 8}; static const char *foo(int ind) { return foo_data[foo_data + foo_ind[ind]]; } Note that you could also do all the pointer maths up-front, leaving existing usage code the same, with something like: static const char foo_data[] = one\0two\0three; static const char * const foo = {foo_data + 0, foo_data + 4, foo_data + 8}; But yeah, I don't really think it's worthwhile for us. In fact, even if we do end up making a Wget-like library at some point, to wrap around something like libcurl, I don't think I'd spend too much time trying to optimize this sort of thing. Though using properly constified data will still be important. :) - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHo06G7M8hyUobTrERAof9AJ0alipt3l7IbrB3LIjgSsGBki0aagCfXlQ+ GmXOZgcVq9fz6UMyJT+xRfo= =rAVe -END PGP SIGNATURE-
Re: [PATCH] Reduce COW sections data by marking data constant
Micah Cowan [EMAIL PROTECTED] writes: Note that you could also do all the pointer maths up-front, leaving existing usage code the same, with something like: static const char foo_data[] = one\0two\0three; static const char * const foo = {foo_data + 0, foo_data + 4, foo_data + 8}; I believe that doesn't help because foo[] remains an array of pointers, each of which needs to be reloacated.
Re: [PATCH] Reduce COW sections data by marking data constant
Micah Cowan [EMAIL PROTECTED] writes: Right. What I was meaning to prevent, though, is the need to do: foo[foo_data + foo_idx[i]] and instead do: foo[i] That is why my example had a foo function, which turns foo[i] to foo(i), but otherwise works the same. Using just foo[i] is unfortunately not possible because it requires either keeping all the relocations (in which case the foo_data+foo_idx excercise doesn't make sense) or resorting to the 2D array-of-char trick.
Re: [PATCH] Reduce COW sections data by marking data constant
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 (Forgot to CC the list. Also, my mailer seems to be screwing up the quotes. :( -M) Diego 'Flameeyes' Pettenò wrote: | As for maintainability, while a character pointer is easier to handle than a | character array, modern compilers warns you if you tries to force bigger data | into an array: | | struct { char foo[2]; } bar[] { { foobar } }; Yes; but I still have to manually increment the size to an appropriate value each time I add a bigger string; plus I'm wasting space for all shorter strings (which applies more to bigger-than-two-element char arrays); | There actually are quite a few differences, I've documented them on my blog | some time ago: | | http://farragut.flameeyes.is-a-geek.org/articles/2007/12/19/array-of-pointers-and-array-of-arrays | http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/some-more-about-arrays-of-strings As I understand it, you have two issues with things like: ~ static const char *additional_attributes[] = { ... } The first is that, since additional_attributes[] itself isn't constant, it will not reside in read-only memory, unlike the data its elements point at. This is easily solved with: ~ static const char * const additional_attributes[] = { ... } Which is better anyway, since it also prevents someone from accidentally overwriting elements with something like: ~ if (additional_attributes[4] = magic_attr) { ~/* ^ Whoops! */ The other is that there is the inefficiency of indirection, where the code has to find the proper string by first finding the proper element within the additional_attributes array, and then following _that_ pointer to the proper string. The first solution makes good sense to me, particularly for its improved correctness; but fixing the second problem seems like a nano-optimization that doesn't bring sufficient efficiency benefits to outweigh its drawbacks in maintainability. Anyway, thanks for the information, and helpful suggestions. :) - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHom/y7M8hyUobTrERAkzEAJ9tV3ZbzLuqlfvZW2SLDa4YkZNJQgCeN4f5 905Ny2JMxt1hsWsgVcIGZtE= =yvIE -END PGP SIGNATURE-
Re: [PATCH] Reduce COW sections data by marking data constant
On Friday 01 February 2008, you wrote: Yes; but I still have to manually increment the size to an appropriate value each time I add a bigger string; plus I'm wasting space for all shorter strings (which applies more to bigger-than-two-element char arrays); Actually, you're not always. The actual space occupied by a pointer to character is sizeof(char*) + sizeof(literal); This means that for 32-bit (best-case) char *foo = a - 4 + 2 = 6 bytes char *foo = ab - 4 + 3 = 9 bytes char *foo = abc - 4 + 4 = 8 bytes char foo[] = a - 2 bytes char foo[] = ab - 3 bytes char foo[] = abc - 4 bytes By that reasoning, char foo[6] will _always_ take less or equal space than char *foo. But 32-bit architectures are also dying beside embedded systems, which means that the most common case especially in the years to come will be sizeof(char*) == 8 (64-bit architectures), at which point char foo[10] will always be smaller or equal to char *foo. Considering you get out with an indirection less, most of the time even wasting 8 more bytes per string is worth the change. The first solution makes good sense to me, particularly for its improved correctness; but fixing the second problem seems like a nano-optimization that doesn't bring sufficient efficiency benefits to outweigh its drawbacks in maintainability. It is a micro-optimisation, I admit that, but it's not just the indirection the problem. Pointers, and structures containing pointers, need to be runtime-relocated for shared libraries and PIC code (let's assume that shared libraries are always PIC, for the sake of argument). In these cases the data is written to .data.rel, or .data.rel.ro for costants (I'm talking ELF here, other output formats most likely have something like that), which are copy-on-write sections. Without prelinking, these sections will be copied right at the start of the program, and when copied, they are no more shared between processes. Which means they'll be using pages of private memory on the various processes. It is a minor thing for wget as it's not a shared library, but a fire-and-forget program, so even for security people using PIE it's far from being a big hit. It's more important for shared libraries though, especially because shared libraries can't always be prelinked properly. See also http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/reminding-a-weakness-of-prelink Now, glib comes to the point of using the method that Mart described in the comments to my first blog ( http://farragut.flameeyes.is-a-geek.org/articles/2007/12/19/array-of-pointers-and-array-of-arrays ) and I quoted on the second ( http://farragut.flameeyes.is-a-geek.org/articles/2008/01/01/some-more-about-arrays-of-strings ) to avoid the increase-the-size and waste-the-space game, but making it a quite bigger, IMHO, maintenance problem. I haven't seen enough use yet for this to be a royally good idea, but I'm thinking of trying that out with xine-lib soon enough (without the second array, if the access can be done sequentially instead of randomly). -- Diego Flameeyes Pettenò http://farragut.flameeyes.is-a-geek.org/ signature.asc Description: This is a digitally signed message part.
Re: [PATCH] Reduce COW sections data by marking data constant
On Jan 31, 2008 8:21 PM, Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] wrote: char *foo = ab - 4 + 3 = 9 bytes How did you get 9?
Re: [PATCH] Reduce COW sections data by marking data constant
On 01/feb/08, at 02:33, Josh Williams wrote: How did you get 9? Probably by being 2:21 AM here ;) Sorry just a thinko.