On 07/31/2013 10:38 AM, Jeffrey Spencer wrote:
Michael,

Pdftocairo is a good tool to know, so thanks for that tip.

I still think it is a regression for the current 'stamp' method to be applied in all cases. I understand that in a complicated figure with a bunch of subplots this would be beneficial and produce smaller output. I don't see how it would often reduce file sizes for single figures.

The case where it has an enormous impact is when the same shape is used multiple times. For example in a scatter, hexbin or pcolor plot.

I usually output single figures with one plot, and I don't think any of the ones I am currently working on came out smaller in 1.4.x. They all resulted in reduced file sizes with mpl 1.1.1. This figure of 3D spheres resulted in 60kb instead of roughly 80kb after running pdftocairo. Anyway, you said that in coming versions a threshold should be applied before objects are stamped, so a fix is on the way eventually.

Yes, but it's too complex a fix to throw in quickly. I think the overall benefit of stamping is preferable to not doing it at all at this point.

Mike


Thanks for all the help,
Jeff


On Wed, Jul 31, 2013 at 11:31 PM, Michael Droettboom <md...@stsci.edu <mailto:md...@stsci.edu>> wrote:

    On 07/30/2013 04:20 PM, Jeffrey Spencer wrote:
    Michael,

    Thanks, that is very informative. It answers most of the problems
    I was having, and I read MEP14, which looks really useful.

    That being said does the ps backend subset the fonts or use
    collections for drawing (is the collections feature global or
    just in the pdf backend)?

    The ps backend has the same behavior as pdf on both counts.  TTF
    fonts are subsetted, but the fonts that come from TeX come to us
    as Type1 fonts, which matplotlib currently does not know how to
    subset.  It also handles collections in the same way (by creating
    a "stamp" and reusing it).


    I usually use .eps output and convert to pdf using epstopdf
    unless the figure has an alpha channel, because that always
    results in a much smaller file (roughly 60kB for this file, or
    around 10kB for a plain figure) than direct pdf output, with the
    output looking the same. I pretty much always have usetex=True,
    so maybe the pdf file is always embedding the full fonts.

    Yes, when usetex=True, matplotlib does not do any font subsetting
    (in any backend).  To get around this limitation, one can use the
    `pdftocairo` tool (part of poppler utils), to convert from pdf to
    a pdf with subsetted fonts.  With your example, I was able to get
    the pdf down to ~80k.  With MEP14, we would basically move such
    functionality into matplotlib itself, but that's sort of a long
    term, semi-back-burner project so it could be a while.

    It's possible that epstopdf is doing some font subsetting of its
    own.  But as you point out, Postscript (as a specification)
    doesn't support alpha, so it's not useful when you need alpha.
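
The `pdftocairo` pass described above can be scripted. What follows is only a minimal sketch, assuming `pdftocairo` from poppler-utils is on the PATH; the helper names `pdftocairo_command` and `shrink_pdf` and the file names are made up for illustration and are not part of matplotlib.

```python
import shutil
import subprocess
from pathlib import Path


def pdftocairo_command(src, dst):
    """Build the argument list for a PDF-to-PDF pass through pdftocairo."""
    # `-pdf` selects PDF output; the two positional arguments are the
    # input file and the output file.
    return ["pdftocairo", "-pdf", str(src), str(dst)]


def shrink_pdf(src, dst):
    """Rewrite src through pdftocairo; return (old_size, new_size) in bytes."""
    if shutil.which("pdftocairo") is None:
        raise RuntimeError("pdftocairo not found; it ships with poppler-utils")
    subprocess.run(pdftocairo_command(src, dst), check=True)
    return Path(src).stat().st_size, Path(dst).stat().st_size


if __name__ == "__main__":
    # Hypothetical file names; only runs if figure.pdf actually exists.
    if Path("figure.pdf").exists():
        old, new = shrink_pdf("figure.pdf", "figure_small.pdf")
        print(f"{old / 1024:.1f} kB -> {new / 1024:.1f} kB")
```

Whether the output is actually smaller depends on the figure, so it is worth comparing the two sizes rather than assuming a saving.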



    Also, does the Cairo backend support usetex=True or subsetting? I
    know I had read it did not support usetex, but that was maybe 2
    years ago or so. The x, y, z axes look correct with cairo but the
    IPA fonts don't render properly. The legend font says it is size
    12, but if you zoom in extremely close you can see they are the
    correct fonts, just way too small. The file size is around 60kB
    as well, so I am guessing it supports subsetting of fonts.

    Cairo does support font subsetting, but the matplotlib Cairo
    backend has no support for usetex.  I'm surprised this worked for
    you at all.  When I run your example with the Cairo backend, the
    IPA characters appear as raw TeX source code, i.e. "\textipa{i}",
    which is what I would expect given that the regular font renderer
    doesn't understand that syntax.



    I'm assuming the pgf backend also subsets fonts when outputting
    to .pdf, because that is the default with pdftex? It results in
    similar size files to the .eps output for this file (roughly 60kB
    also).

    Yes.



    The IPA font uses the tipa package (\usepackage{tipa}), and I
    think that is why these look different. That package draws
    these fonts with its own font libraries instead of whatever is
    selected as the text font. Maybe I'm wrong about this, but that is
    my understanding, because even in normal LaTeX code the fonts look
    different from the standard text.

    That is correct.  The default font for usetex=True is Computer
    Modern, whereas it is Bitstream Vera Sans in the default font
    rendering.  I was referring to the difference between 1.2 and 1.4
    which was using TeX fonts in both cases, but due to a bug in
    1.3/1.4 was rendering the IPA in serif when you had requested
    sans-serif.

    Mike



    Cheers,
    Jeff


    On Wed, Jul 31, 2013 at 4:43 AM, Michael Droettboom
    <md...@stsci.edu <mailto:md...@stsci.edu>> wrote:

        There are two different things going on here.

        Between 1.2.1 and now, there was a bugfix to the font
        selection routine that inadvertently introduced a bug
        selecting fonts in the usetex backend.  You may notice that
        on master, the IPA font selected is different.  The file size
        difference can be attributed to the slightly larger font size
        of the one it selected vs. the one it should have.  Note that
        when usetex is True, the fonts are not subsetted, so you
        always get the full font embedded in the file (MEP14 work
        will fix this in the future).

        See b5c340 for the commit that introduced the bug, and
        https://github.com/matplotlib/matplotlib/pull/2260 for the
        fix (which should make it into 1.3.0 final).

        Between 1.1.1 and 1.2.1 a change was made in how collections
        are handled.  Previously, each path was redrawn individually.
        In 1.2, if a path is reused multiple times, a "stamp" is
        created and then it is "used" multiple times.  In principle,
        this generally reduces file sizes by a large amount.  However,
        in the case of this figure with the 3D spheres, each
        path is used only once, so rather than getting the file size
        savings of that approach, we only get the overhead.  The
        backend could be smarter by not doing this when the path is
        only used a small number of times.  Such a fix would be
        welcome, but is probably too large/risky to try to get into
        the current release cycle.  It will have to wait for 1.3.1.
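
The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not matplotlib's backend code: the function names and the byte counts (`definition_overhead`, `use_bytes`) are hypothetical and chosen purely for illustration.

```python
def inline_size(path_bytes, uses):
    """Bytes written if the path is drawn in full every time it appears."""
    return path_bytes * uses


def stamped_size(path_bytes, uses, definition_overhead, use_bytes):
    """Bytes written if the path is defined once as a stamp and referenced."""
    return path_bytes + definition_overhead + uses * use_bytes


def should_stamp(path_bytes, uses, definition_overhead=80, use_bytes=30):
    """Stamp only when it is estimated to be smaller than inlining.

    The default overhead values are made-up numbers, not measurements.
    """
    stamped = stamped_size(path_bytes, uses, definition_overhead, use_bytes)
    return stamped < inline_size(path_bytes, uses)


if __name__ == "__main__":
    # A path used once (like each sphere patch) only pays the overhead.
    print(should_stamp(path_bytes=200, uses=1))
    # A marker reused many times (scatter, hexbin) benefits from stamping.
    print(should_stamp(path_bytes=200, uses=1000))
```

With these assumed numbers the break-even point is two uses; a real threshold would have to come from measuring actual backend output.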

        Cheers,
        Mike



        On 07/30/2013 12:24 PM, Jeffrey Spencer wrote:
        K, I have just made the script self-contained but it loads
        external data so I have attached that as well. If you want
        me to just separate out the plotting commands let me know. I
        have also attached my matplotlib rc file which is the same
        on all three systems. All the modifications to the
        matplotlibrc file are copied to the top, in the first 30
        lines or so.

        Of note, the smallest file sizes for pdf come from the pgf
        backend, around 60kb. Not sure if that helps at all. It is
        also around the same size, about 60kb, if I export to .eps and
        then convert to pdf. The problem with eps in these 3d
        figures, though, is the back wall, which I think has an alpha
        channel because it just becomes a solid wall in the output,
        with no lines through it like the other two walls.


        On Tue, Jul 30, 2013 at 11:23 PM, Jouni K. Seppänen
        <j...@iki.fi <mailto:j...@iki.fi>> wrote:

            Jeffrey Spencer <jeffspenc...@gmail.com
            <mailto:jeffspenc...@gmail.com>> writes:

            > I have three different versions of matplotlib that all
            > output different file sizes with matplotlib 1.1.1
            > providing the smallest. This is for the same exact
            > script. I can post the script if that helps.
            >
            > MPL 1.4.x: 539.32kb, Ubuntu 12.10
            > MPL 1.1.1: 172.56kb Ubuntu 12.10
            > MPL 1.2.1: 475.9kb, Ubuntu 13.04

            Yes, it would be interesting to know what the plotting
            commands are.  Just as a guess, since all the sizes are a
            few hundred kilobytes, it could be a difference in e.g.
            font embedding - many TrueType fonts are of comparable
            size.

            --
            Jouni K. Seppänen
            http://www.iki.fi/jks


            
------------------------------------------------------------------------------
            Get your SQL database under version control now!
            Version control is standard for application code, but
            databases havent
            caught up. So what steps can you take to put your SQL
            databases under
            version control? Why should you start doing it? Read
            more to find out.
            
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
            _______________________________________________
            Matplotlib-users mailing list
            Matplotlib-users@lists.sourceforge.net
            <mailto:Matplotlib-users@lists.sourceforge.net>
            https://lists.sourceforge.net/lists/listinfo/matplotlib-users




        