Re: LiveCD optimisations

2010-05-21 Thread Louis Simard
At 2010-05-21 04:41 GMT, Martin Owens docto...@gmail.com wrote:
 Hey Louis,

Hey Martin, thanks for the reply!

 Sounds great and looks like a pretty good script, I have some comments:

 You may be able to make it a little faster by using the find results in
 one line like this:

 find / -type f -name '*.svg' -print0 | xargs -0 -I FILE sh -c \
   '/tmp/scour/scour.py --enable-id-stripping --indent=none -i FILE -o FILE-opt &&
    test -s FILE-opt && mv FILE-opt FILE || rm FILE-opt'

I had considered using sh -c to execute the Scouring and renaming,
yes, but didn't know how to go about detecting empty files except with
another 'find'. Thanks for telling me about test -s :)

 Although if you can get all that into a script file, so much the better
 so it's not all on one line. But at least it's not doing a find 3 times
 for the same files.

True. This is a case of optimising the optimiser, which I consider a
micro-optimisation because the later invocations of 'find' are highly
likely to have the needed disk blocks in RAM - but every little bit
helps, just like with these image files. (Speaking of which, Scour.py
imports the Psyco JIT if it's available, but it doesn't help that
much. It makes the Python code itself run faster, yes, but at the cost
of greater startup time for each Scour.py instance, and most files are
optimised in 0.06 second anyway.)
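
Spread over a script file, as Martin suggests, the same idea might look
roughly like the sketch below. The names optimise_one, optimise_tree and
the SCOUR variable are made up for illustration; only the Scour path and
options come from the one-liner quoted above. Unlike the -print0 version,
this plain-sh loop does not cope with newlines in file names.

```shell
#!/bin/sh
# Sketch only: SCOUR, optimise_one and optimise_tree are illustrative names.
# The default path and the options are the ones quoted in the thread.
SCOUR="${SCOUR:-/tmp/scour/scour.py}"

# Optimise one SVG. Keep the result only if Scour succeeded and its
# output is non-empty (test -s); otherwise leave the original untouched.
optimise_one() {
    src=$1
    tmp="$src-opt"
    if "$SCOUR" --enable-id-stripping --indent=none -i "$src" -o "$tmp" \
            && test -s "$tmp"; then
        mv -- "$tmp" "$src"
    else
        rm -f -- "$tmp"
    fi
}

# Walk the tree once and optimise every *.svg found in it.
# Assumption: no file name contains a newline.
optimise_tree() {
    find "$1" -type f -name '*.svg' | while IFS= read -r f; do
        optimise_one "$f"
    done
}
```

It would be called as optimise_tree followed by the directory holding the
unpacked file system, so 'find' runs once instead of three times.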

 Do you need to chroot into the file system to perform these steps,
 considering that you're downloading code to do it (with bzr, which isn't
 installed on the CD)? Would it not be good to perform these steps
 outside of the squashfs and iso file system?

 For instance I got resolve issues when it tried to do the apt update.

I probably don't. That was part of a script that allowed me to
customise more things, such as updating packages (which I needed to
chroot for), removing the desktop background, updating Linux and all
that; I just trimmed it down for this email. I'll move the chroot
processing to the host.

 Are there no more things that could be optimised? For instance does
 using xmllint with --noblanks on the 12496 xml files save any space?

Will test this shortly. I hadn't thought of that yet, and I'm
flabbergasted by the number of XML files! Seeing as SVG files are also
XML files, and Scour.py seems to pretty-print XML even with
--indent=none, that might save even more, actually.
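
A cautious way to try xmllint --noblanks on a single file could be
sketched as follows. The strip_blanks name and the XMLLINT variable are
invented for this example; --noblanks and --output are regular xmllint
options. Whether dropping blanks is safe for every program that reads
these files is an assumption that still needs checking.

```shell
#!/bin/sh
# Sketch only: XMLLINT and strip_blanks are illustrative names.
XMLLINT="${XMLLINT:-xmllint}"

# Rewrite one XML file without ignorable whitespace. The original is
# replaced only if xmllint succeeded and the result is non-empty and smaller.
strip_blanks() {
    src=$1
    tmp="$src.noblanks"
    if "$XMLLINT" --noblanks --output "$tmp" "$src" && test -s "$tmp"; then
        new=$(( $(wc -c < "$tmp") ))
        old=$(( $(wc -c < "$src") ))
        if [ "$new" -lt "$old" ]; then
            mv -- "$tmp" "$src"
            return 0
        fi
    fi
    # Failure, empty output, or no gain: keep the original file.
    rm -f -- "$tmp"
}
```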

 Finally... should some of these optimisations work their way upstream so
 all packages have optimised files, smaller downloads, smarter mirror
 storage etc?

Of course! :) Working with upstreams would avoid keeping debdiffs
around for the optimised files in Ubuntu repositories, and will help
other distributions too.

I'll attach a modified script to my next email with more testing
results regarding XML.

Regards,
- Louis

-- 
Ubuntu-devel-discuss mailing list
Ubuntu-devel-discuss@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss


Re: LiveCD optimisations

2010-05-21 Thread Phillip Susi
On 5/20/2010 8:35 PM, Louis Simard wrote:
 Greetings ubuntu-devel-discuss :)
 
 I have a proposal for you, and I'll present it simply with the 5 W's.

[snip]

When attaching scripts please make sure they are attached with an inline
disposition so they are readily reviewable while reading the email
instead of having to save them and open them in another text editor.

Also could you explain a bit what you mean by optimizations?  You can
of course, use a higher lossy compression on the png images, but that
lowers their quality, which I think is not a desirable tradeoff.



LiveCD optimisations explained

2010-05-21 Thread Louis Simard
At 2010-05-21 14:48 GMT, Phillip Susi ps...@cfl.rr.com wrote:
 When attaching scripts please make sure they are attached with an inline
 disposition so they are readily reviewable while reading the email
 instead of having to save them and open them in another text editor.

Err... While I know what you want me to do (you want
Content-Disposition: inline), I don't know how to do that in the Gmail
web interface. Perhaps I'll set up Mozilla Thunderbird, if it can do
that :-)

 [C]ould you explain a bit what you mean by optimizations?  You can
 of course, use a higher lossy compression on the png images, but that
 lowers their quality, which I think is not a desirable tradeoff.

The optimisations I describe would be completely lossless, barring
bugs in the software used to carry out these optimisations.

- For PNG: the data used to store some images on the CD is not
compressed to the highest level. OptiPNG takes those files and tries
to recompress them to the highest level, while ensuring that every
pixel's color value ends up being the same.

- For SVG: the data used to store ALL images on the CD is not optimal
for rendering purposes. Inkscape metadata, Sodipodi metadata, ID names
for elements that end up unused, gradients defined dozens of times,
etc., are bloating the files. Scour.py takes those files and removes
this bloat, while ensuring that the new versions render identically to
the original. However, since Inkscape's metadata ends up removed, it
could be more difficult for users to open these new files in Inkscape.

- For XML, as described by Martin Owens: xmllint would remove
everything superfluous from all files on the CD, while ensuring that
the data is parsed identically. I haven't tested this yet except on
one file from the CD (squashfs -
/var/lib/gconf/defaults/%gconf-tree.xml), but that file went from
2,095,034 bytes to 1,779,376 (a savings of 315,658). There's more hope
yet.
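
For anyone wanting to reproduce this kind of figure, comparing a file
before and after optimisation could be done with a small helper like the
one below. The report_saving name and its output format are made up for
this example.

```shell
#!/bin/sh
# Sketch only: report_saving is an illustrative helper name.
# Prints "before after saved percent%" for an original and an optimised
# file, with sizes in bytes and the percentage rounded down.
report_saving() {
    before=$(( $(wc -c < "$1") ))
    after=$(( $(wc -c < "$2") ))
    saved=$(( before - after ))
    if [ "$before" -gt 0 ]; then
        pct=$(( saved * 100 / before ))
    else
        pct=0
    fi
    echo "$before $after $saved $pct%"
}

# The gconf figures quoted above, checked with shell arithmetic:
#   $(( 2095034 - 1779376 ))                 is 315658 bytes saved
#   $(( (2095034 - 1779376) * 100 / 2095034 )) is 15 (percent, rounded down)
```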

Regards,
- Louis



Re: LiveCD optimisations explained

2010-05-21 Thread Phillip Susi
On 5/21/2010 1:40 PM, Louis Simard wrote:
 Err... While I know what you want me to do (you want
 Content-Disposition: inline), I don't know how to do that in the Gmail
 web interface. Perhaps I'll set up Mozilla Thunderbird, if it can do
 that :-)

Heh, yeah, I've struggled with this on Thunderbird too, which is why I
usually end up submitting patches via something like mime-construct or
some other command line mime editor where I can force it to use
Content-Disposition: inline.

 - For PNG: the data used to store some images on the CD is not
 compressed to the highest level. OptiPNG takes those files and tries
 to recompress them to the highest level, while ensuring that every
 pixel's color value ends up being the same.

I believe that PNG applies a lossy compression first, then gzips the
result.  It sounds like you are saying that the gzip is done with -3
instead of -9, so you ungzip it and recompress on -9.  Is that more or
less correct?  If so that sounds pretty good, but like you mentioned
before, should be done upstream rather than only for the livecd.
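
The gzip analogy can be sketched with gzip itself, as below. This is
only an illustration of lossless recompression, not what OptiPNG does
internally: PNG's filtering step is lossless too, and OptiPNG works on
the PNG data stream rather than on a .gz file. The recompress_max name
is made up for the example.

```shell
#!/bin/sh
# Sketch only: recompress_max is an illustrative name, and gzip stands in
# for the compressed data inside a PNG.
# Recompress a .gz file at level 9 and keep the new file only if the
# decompressed content is unchanged and the file actually got smaller.
recompress_max() {
    gz=$1
    new="$gz.new"
    gzip -dc "$gz" | gzip -9 > "$new" || { rm -f "$new"; return 1; }
    old_sum=$(gzip -dc "$gz" | cksum)
    new_sum=$(gzip -dc "$new" | cksum)
    old_size=$(( $(wc -c < "$gz") ))
    new_size=$(( $(wc -c < "$new") ))
    if [ "$old_sum" = "$new_sum" ] && [ "$new_size" -lt "$old_size" ]; then
        mv "$new" "$gz"
    else
        rm -f "$new"
    fi
}
```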

 - For SVG: the data used to store ALL images on the CD is not optimal
 for rendering purposes. Inkscape metadata, Sodipodi metadata, ID names
 for elements that end up unused, gradients defined dozens of times,
 etc., are bloating the files. Scour.py takes those files and removes
 this bloat, while ensuring that the new versions render identically to
 the original. However, since Inkscape's metadata ends up removed, it
 could be more difficult for users to open these new files in Inkscape.

Sounds good, and also would be good to do upstream instead of just for
the livecd.

 - For XML, as described by Martin Owens: xmllint would remove
 everything superfluous from all files on the CD, while ensuring that
 the data is parsed identically. I haven't tested this yet except on
 one file from the CD (squashfs -
 /var/lib/gconf/defaults/%gconf-tree.xml), but that file went from
 2,095,034 bytes to 1,779,376 (a savings of 315,658). There's more hope
 yet.

I noticed the bloated gconf xml files a few years back myself and
brought it up on the devel list.  IIRC I saw even more wasted space than
you mention here, due to 10, 20, even 30 characters of whitespace
indenting each line.  This adds a lot of bloat that I don't like to
have on an installed system, but once compressed into the
squashfs for the livecd, the whitespace drops out, so there wasn't much
concern about it.  At one point I tried just converting the whitespace
into hard tabs and saved quite a bit of space, while preserving the
indentation for human editing.
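
A rough way to check both points, the raw saving from hard tabs and how
little of it survives compression, is sketched below. gzip -9 is used
here as a stand-in for the squashfs compressor, which is an assumption,
and the helper names raw_size, gz_size and tabify are made up.

```shell
#!/bin/sh
# Sketch only: raw_size, gz_size and tabify are illustrative names.

# Size of a file in bytes, uncompressed and after gzip -9.
raw_size() { echo $(( $(wc -c < "$1") )); }
gz_size()  { echo $(( $(gzip -9 -c "$1" | wc -c) )); }

# Convert leading runs of spaces to hard tabs (8-column tab stops),
# leaving the rest of each line alone. Writes the result to stdout.
tabify() { unexpand "$1"; }
```

Running tabify on a heavily indented file and comparing raw_size and
gz_size for both versions shows how much of the difference is left once
the data is compressed.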



Re: LiveCD optimisations

2010-05-21 Thread Dmitrijs Ledkovs
On 21 May 2010 01:35, Louis Simard louis.sim...@gmail.com wrote:

 -- WHAT? --

 Optimise the PNG images and SVG files on the Ubuntu LiveCD.
 Optimise the Ubuntu LiveCD by putting start-up files and programs near
 the end of the CD.


-- Implementation --

1) Should this go into the deb-package mangler run by Soyuz?

2) Or should this be implemented as a debhelper addon / cdbs, as a no-op
Ubuntu patch, and then, if successful (all the quirks are worked out),
pushed to Debian?
