Re: [matplotlib-devel] MEP14: Improve text handling
Michael Droettboom md...@stsci.edu writes:

> I've drafted a MEP with a plan to improve some of the text and font handling in matplotlib. I'd love any and all feedback.
>
> https://github.com/matplotlib/matplotlib/wiki/Mep14

I'm a bit late to the party, but here are a few thoughts:

What I see as the biggest problem in the current font-selection system is its opaqueness. You can attempt to specify a style you'd like, but it's up to the backend to find the relevant font. The naive user has no way of knowing which font actually got selected, and no way of knowing how to modify the parameters to get what they want (except if they stumble upon the way to specify the full path to a font file). Each backend can override the font-selection code, so e.g. the ps backend has an option to use only AFM fonts, meaning the core fonts built into PostScript viewers.

The subsetting system proposed in MEP14 (reading the font via FreeType, then rendering or outputting the outlines into the result) would make backends consistent with each other, as long as the same text engine is used. Then at least the OO API could have font selection as an explicit step, i.e. instead of

    ax.text(x, y, s, family='serif', style='oblique')

you could write

    font = text_engine.find_font(family='serif', style='oblique')
    ax.text(x, y, s, font=font)

and also query the `font` object for what actual font is being used. (Or would it look more like ax.text(x, y, text_engine.layout(s, font))?) If we want to continue supporting backend idiosyncrasies like ps.useafm, I suppose those would need to be parameters to the text engine.

The approach of subsetting fonts by writing a new Type-3 font in the PostScript or PDF file would allow supporting any fonts that FreeType can read, but this would lose hinting information in TTF and Type-1 fonts. I think we should at least leave open the possibility to embed the original font (or a directly-derived subset).

The code that parses DVI files from TeX outputs not only glyphs but also boxes, which are black rectangles used to implement things like the underscore character and the varying-length part of the square-root sign. To support this, I guess TextSpan.get_chars should be able to return not only TextChar instances but also boxes.

-- 
Jouni K. Seppänen
http://www.iki.fi/jks

___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
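A minimal sketch of the glyphs-plus-boxes idea raised at the end of the message above. `TextSpan.get_chars` and `TextChar` are names from the thread, but the fields shown here, the `Box` class, the `describe` helper and the sample data are hypothetical illustrations, not the MEP14 API.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass(frozen=True)
class TextChar:
    """One positioned glyph (fields are hypothetical)."""
    codepoint: int
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    """A filled black rectangle, e.g. the bar of a square-root sign."""
    x: float
    y: float
    width: float
    height: float


LayoutItem = Union[TextChar, Box]


@dataclass
class TextSpan:
    items: List[LayoutItem]

    def get_chars(self) -> Iterator[LayoutItem]:
        # Yields glyphs and boxes in drawing order; callers dispatch on type.
        return iter(self.items)


def describe(span: TextSpan) -> List[str]:
    """Stand-in for a backend: report what would be drawn for each item."""
    out = []
    for item in span.get_chars():
        if isinstance(item, Box):
            out.append(f"rect {item.width}x{item.height} at ({item.x}, {item.y})")
        else:
            out.append(f"glyph U+{item.codepoint:04X} at ({item.x}, {item.y})")
    return out


# Hypothetical layout of a radical: the sign glyph plus its bar as a box.
span = TextSpan([TextChar(0x221A, 0.0, 0.0), Box(6.0, 9.0, 12.0, 0.4)])
lines = describe(span)
```

A backend written against such a union only needs one extra branch for rectangles, which is presumably why the same shape would also suit the built-in mathtext renderer.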
Re: [matplotlib-devel] MEP14: Improve text handling
On 07/22/2013 01:05 PM, Jouni K. Seppänen wrote:

> Michael Droettboom mdroe-pfb3ainihtehxe+lvdl...@public.gmane.org writes:
>
>> I've drafted a MEP with a plan to improve some of the text and font handling in matplotlib. I'd love any and all feedback.
>>
>> https://github.com/matplotlib/matplotlib/wiki/Mep14
>
> I'm a bit late to the party

Not too late -- the implementation has barely begun.

> , but here are a few thoughts:
>
> What I see as the biggest problem in the current font-selection system is its opaqueness. You can attempt to specify a style you'd like, but it's up to the backend to find the relevant font. The naive user has no way of knowing which font actually got selected, and no way of knowing how to modify the parameters to get what they want (except if they stumble upon the way to specify the full path to a font file). Each backend can override the font-selection code, so e.g. the ps backend has an option to use only AFM fonts, meaning the core fonts built into PostScript viewers.

Good point. I should add that to the MEP as an explicit example of another case where the font selection needs to be special-cased.

> The subsetting system proposed in MEP14 (reading the font via FreeType, then rendering or outputting the outlines into the result) would make backends consistent with each other, as long as the same text engine is used. Then at least the OO API could have font selection as an explicit step, i.e. instead of
>
>     ax.text(x, y, s, family='serif', style='oblique')
>
> you could write
>
>     font = text_engine.find_font(family='serif', style='oblique')
>     ax.text(x, y, s, font=font)
>
> and also query the `font` object for what actual font is being used. (Or would it look more like ax.text(x, y, text_engine.layout(s, font))?) If we want to continue supporting backend idiosyncrasies like ps.useafm, I suppose those would need to be parameters to the text engine.

My thinking was that there would just be engine-specific attributes on the font selector, e.g.:

    font = FontProperties(family="serif", ps_family="Helvetica")

and then you can pass this into the regular text engine or the PS AFM-specific one and they would both know what to do. But maybe that needs some more thinking.

What I want to avoid is the current situation where things change radically when switching text engines because they *need* to handle fonts so differently. I'd rather make that more explicit -- because I don't think there's any way to make font selection work the same way across all of them. I think that's the assumption in the current design and it's not great. It works fine if you say "give me a sans serif font, I don't care which", but beyond that, the user needs domain-specific knowledge.

Honestly, this is a part of the MEP that I think needs work -- I basically threw up my hands as a solution to the problem. Maybe there is a better way.

> The approach of subsetting fonts by writing a new Type-3 font in the PostScript or PDF file would allow supporting any fonts that FreeType can read, but this would lose hinting information in TTF and Type-1 fonts. I think we should at least leave open the possibility to embed the original font (or a directly-derived subset).

Yes. But that's not a change from current behavior. We've been subsetting fonts as Type 3 fonts for *years* and no one has complained about the lack of hinting. And we do provide an option to embed the entire original font if necessary (and there is no reason to remove that).

What's new in the MEP is that the subsetting would be based on the freetype API (and thus be able to read virtually any font as input), rather than ttconv (which can only read well-behaved TrueType fonts with Macintosh metadata). This will allow us to support Microsoft-specific TTF fonts (the ones that ship with Windows 7 and 8), OpenType fonts and Web fonts.

> The code that parses DVI files from TeX outputs not only glyphs but also boxes, which are black rectangles used to implement things like the underscore character and the varying-length part of the square-root sign. To support this, I guess TextSpan.get_chars should be able to return not only TextChar instances but also boxes.

Good point. We'll need that for the built-in mathtext renderer as well, for the same reason.

Mike
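To make the font-selection exchange above concrete, here is a small sketch, assuming a font request that carries a generic family plus optional per-engine overrides, and an engine whose `find_font` returns an object the caller can inspect. Only `find_font` and the `ps_family` idea come from the thread; `FontRequest`, `ResolvedFont`, the `TextEngine` class, the engine names and the font tables are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FontRequest:
    """What the user asks for; `overrides` maps an engine name to a family."""
    family: str
    style: str = "normal"
    overrides: Dict[str, str] = field(default_factory=dict)


@dataclass
class ResolvedFont:
    """What an engine actually picked, so the caller can query it."""
    engine: str
    name: str
    requested: FontRequest


class TextEngine:
    """Toy engine that resolves requests against a fixed table of fonts."""

    def __init__(self, name: str, known: Dict[str, str]):
        self.name = name
        self.known = known  # generic family or font name -> concrete font

    def find_font(self, request: FontRequest) -> ResolvedFont:
        # An engine-specific override wins over the generic family.
        wanted = request.overrides.get(self.name, request.family)
        if wanted not in self.known:
            raise LookupError(f"{self.name}: no font for {wanted!r}")
        return ResolvedFont(self.name, self.known[wanted], request)


# Hypothetical engines and font tables.
freetype_engine = TextEngine("freetype", {"serif": "DejaVu Serif"})
ps_afm_engine = TextEngine(
    "ps_afm", {"serif": "Times-Roman", "Helvetica": "Helvetica"}
)

request = FontRequest(family="serif", overrides={"ps_afm": "Helvetica"})
default_font = freetype_engine.find_font(request)
afm_font = ps_afm_engine.find_font(request)
```

The same request yields different concrete fonts per engine, but the difference is explicit and visible on the returned object rather than hidden inside a backend.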
Re: [matplotlib-devel] MEP14: Improve text handling
On Thu, May 30, 2013 at 11:29 PM, Nicolas Rougier nicolas.roug...@inria.fr wrote:

>> I'm also concerned about the overhead of ctypes, given that there are already so many required optimizations in the matplotlib freetype wrapper to make it fast enough. But I'm willing to withhold judgement on that until some measurements have been made.
>
> I would never have thought ctypes would be a problem for speed/optimization and I never benchmarked the freetype-py.

Well, I see it this way -- for high-performing Python code, you often need to vectorize operations one way or another, i.e. if you need to do a given operation on a bunch of numbers, objects, whatever, you need to be able to pass the collection in to lower-level code, so you don't have all the overhead of Python function calls, dynamic typing, etc., inside your loop.

Many (most) C libraries are not designed this way. So when writing Python wrappers, you need to loop through a sequence in Python, and call the underlying C function for each item. With ctypes, you write that code in Python; with Cython, it's easy to write that code in Cython, which gets compiled down to C -- you can get major performance benefits from this. And Cython is almost as easy to write as Python.

How this applies to freetype, I don't know.

>> 2) It's not Numpy-aware. For example, it loads image buffers into regular Python lists. This really should use Numpy for speed.

You can do this with ctypes, and it would work fine for image buffers, but maybe not as well as Cython for, say, a large sequence of characters...

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
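The per-item versus whole-collection distinction described above can be shown with the standard library alone. This is only an illustration of the pattern, not a ctypes or Cython benchmark: `math.sqrt` stands in for one wrapped C call, and the function names are invented for the example.

```python
import math
from typing import List, Sequence


def per_item(values: Sequence[float]) -> List[float]:
    """Loop in Python: interpreter overhead is paid on every iteration."""
    out = []
    for v in values:
        out.append(math.sqrt(v))  # stand-in for one wrapped C call
    return out


def batched(values: Sequence[float]) -> List[float]:
    """Hand the whole sequence over: the loop runs inside C-implemented code."""
    return list(map(math.sqrt, values))


# Code points of a short string, used here just as sample numbers.
sample = [float(ord(c)) for c in "matplotlib"]
same_result = per_item(sample) == batched(sample)
```

Both functions compute the same values; what differs is where the loop lives. Timing the two with the stdlib `timeit` module on realistically sized input is one way to get the measurements the thread asks for, and no numbers are claimed here.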
Re: [matplotlib-devel] MEP14: Improve text handling
> 1) It's implemented in ctypes. I'm not much of a fan of ctypes, as it has the potential to segfault in nasty ways if the API changes in any way from what was expected (which would normally be caught at compile time in a C extension). I'm also concerned about the overhead of ctypes, given that there are already so many required optimizations in the matplotlib freetype wrapper to make it fast enough. But I'm willing to withhold judgement on that until some measurements have been made.

I would never have thought ctypes would be a problem for speed/optimization and I never benchmarked freetype-py. Not sure how to do that though.

> 2) It's not Numpy-aware. For example, it loads image buffers into regular Python lists. This really should use Numpy for speed.

Yes, and I recently discovered it may make things really slow in some cases.

> 3) It exposes the fixed point numbers to Python as integers -- it should really return all of these as floats -- the user shouldn't have to know or remember which values are 16.16 and which are 24.8 etc. It should just give floats. Double precision (with 52 bits in the mantissa) is enough for any of these 32-bit fixed-point values. I think that's just a remnant of older systems and the need to run on hardware without an FPU, and it doesn't need to be brought forward into the Python wrapper.

You're right. I try to keep the very low level close to the freetype implementation/types, and the mid-level wrapper should use floats everywhere (I may need to check that).

This + your comment on Google App Engine makes me think that freetype-py might not be so useful in the end. Anyway, I would gladly (try to) contribute to the new system.

Nicolas
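Point 3 above is easy to check by hand. The sketch below assumes signed fixed-point integers with a known number of fractional bits (16 for 16.16, 8 for 24.8, the two formats named in the thread); the helper names are invented and this is not freetype-py code.

```python
def fixed_to_float(value: int, frac_bits: int) -> float:
    """Convert a signed fixed-point integer with `frac_bits` fractional bits."""
    return value / (1 << frac_bits)


def float_to_fixed(value: float, frac_bits: int) -> int:
    """Inverse conversion, rounding to the nearest representable step."""
    return round(value * (1 << frac_bits))


one = fixed_to_float(0x00010000, 16)         # 1.0 in 16.16
half = fixed_to_float(128, 8)                # 0.5 in 24.8
minus_1_5 = fixed_to_float(-0x00018000, 16)  # -1.5 in 16.16

# A double's 52-bit mantissa holds any 32-bit integer exactly, and dividing
# by a power of two only changes the exponent, so the round trip is lossless.
# Spot-check the extremes of the signed 32-bit range.
extremes = [-(2**31), -1, 0, 1, 2**31 - 1]
round_trip_ok = all(
    float_to_fixed(fixed_to_float(v, 16), 16) == v for v in extremes
)
```

With a conversion like this applied in the mid-level wrapper, callers would never need to know which format a given field uses.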
Re: [matplotlib-devel] MEP14: Improve text handling
Paul Hobson wrote:

> On Thu, May 30, 2013 at 5:03 PM, Michael Droettboom md...@stsci.edu wrote:
>
>> On 05/30/2013 02:27 PM, Chris Barker - NOAA Federal wrote:
>>
>>> With a fully functional mathtex, it could be the default (only?) text layout system for MPL, simplifying things quite a bit.
>>
>> I'm not sure that's realistic. The usetex backend gets a great deal of use, and I don't think it's only because it handles multiline text better -- it's also the easiest way to make the text match that of a larger TeX document in which it's included (though the new PGF backend goes some way to helping that in an entirely different way).
>
> Exactly! I like that I can set text.usetex=True and add \usepackage{fourier} and I *know* that my figures and document will look the same. That said, I've never been able to get the PGF backend to work well. Random elements are pixelated. It's surely user error on my end, but usetex is comparatively easy to set up.
>
>> It might be worth collating a list of reasons that users are using usetex to include in the MEP -- if we can address them all in another way, great, but if not it's not too difficult to keep something that already works fairly well working. The problem I have with it is not really that it exists, only that it has tendrils all throughout matplotlib that could be better localized into a single set of modules.
>
> As I state above -- I absolutely require One Font throughout my documents. If it's a serif font, I use the fourier TeX package. If it's a sans-serif font, I do the weird \sansmath voodoo (I still owe you a PR with an example of setting that up). Point is, it works well.
>
> Cheers,
> -paul

I had the impression that XeTeX had stopped development and that LuaTeX was the path forward.
Re: [matplotlib-devel] MEP14: Improve text handling
On Thu, May 30, 2013 at 8:59 AM, Michael Droettboom md...@stsci.edu wrote:

> I've drafted a MEP with a plan to improve some of the text and font handling in matplotlib. I'd love any and all feedback.

Nice write-up and thanks for working on this.

One idea (alternative?) would be to put more effort into the mathtext renderer. TeX itself, of course, does an outstanding job of laying out text, paragraphs, etc. I'm assuming that the core stuff is already in mathtext, so adding better support for regular old non-math text would be a less-than-huge deal. And we still wouldn't need the full how-to-split-pages and all that code for MPL.

Not sure about properly handling unicode issues, though modern TeX does support unicode.

With a fully functional mathtex, it could be the default (only?) text layout system for MPL, simplifying things quite a bit.

... just a thought.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [matplotlib-devel] MEP14: Improve text handling
For the freetype wrapper, maybe freetype-py may be of some help: http://code.google.com/p/freetype-py/

I did not wrap all of the freetype library, but it already allows a fair amount of font manipulation/rendering.

For unicode/harfbuzz, I've found this example

https://github.com/lxnt/ex-sdl-freetype-harfbuzz

to be incredibly useful for understanding the (poorly documented) library. The strong point of harfbuzz is that it has no heavy dependencies (compared to pango, for example). By the way, Behdad is considering a refactoring of the library and it might be worth interacting with him (on the harfbuzz list) to see how this could ease writing a Python wrapper (if you intend to use it, of course).

In the current draft, you're speaking of rich text, but I found no reference to a possible markup (or equivalent) to specify the different fonts, colors, boldness, etc.

Nicolas

On May 30, 2013, at 8:27 PM, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote:

> On Thu, May 30, 2013 at 8:59 AM, Michael Droettboom md...@stsci.edu wrote:
>
>> I've drafted a MEP with a plan to improve some of the text and font handling in matplotlib. I'd love any and all feedback.
>
> Nice write-up and thanks for working on this.
>
> One idea (alternative?) would be to put more effort into the mathtext renderer. TeX itself, of course, does an outstanding job of laying out text, paragraphs, etc. I'm assuming that the core stuff is already in mathtext, so adding better support for regular old non-math text would be a less-than-huge deal. And we still wouldn't need the full how-to-split-pages and all that code for MPL.
>
> Not sure about properly handling unicode issues, though modern TeX does support unicode.
>
> With a fully functional mathtex, it could be the default (only?) text layout system for MPL, simplifying things quite a bit.
>
> ... just a thought.
>
> -Chris
Re: [matplotlib-devel] MEP14: Improve text handling
On 05/30/2013 02:27 PM, Chris Barker - NOAA Federal wrote:

> On Thu, May 30, 2013 at 8:59 AM, Michael Droettboom md...@stsci.edu wrote:
>
>> I've drafted a MEP with a plan to improve some of the text and font handling in matplotlib. I'd love any and all feedback.
>
> Nice write-up and thanks for working on this.
>
> One idea (alternative?) would be to put more effort into the mathtext renderer. TeX itself, of course, does an outstanding job of laying out text, paragraphs, etc. I'm assuming that the core stuff is already in mathtext, so adding better support for regular old non-math text would be a less-than-huge deal. And we still wouldn't need the full how-to-split-pages and all that code for MPL.

That's an interesting idea that we should definitely ruminate on. That still doesn't address the Unicode issues, which are really complex to get right -- I'd really rather depend on something else for that. But what you suggest might be the best way forward to improve the built-in rendering for a good fraction of users that don't really care about Unicode.

> Not sure about properly handling unicode issues, though modern TeX does support unicode.

Right -- and I think moving to XeTeX for the usetex backend, which is now pretty widely available, might be a good improvement on that front. I still don't want to reimplement all of that, if I can avoid it.

> With a fully functional mathtex, it could be the default (only?) text layout system for MPL, simplifying things quite a bit.

I'm not sure that's realistic. The usetex backend gets a great deal of use, and I don't think it's only because it handles multiline text better -- it's also the easiest way to make the text match that of a larger TeX document in which it's included (though the new PGF backend goes some way to helping that in an entirely different way).

It might be worth collating a list of reasons that users are using usetex to include in the MEP -- if we can address them all in another way, great, but if not it's not too difficult to keep something that already works fairly well working. The problem I have with it is not really that it exists, only that it has tendrils all throughout matplotlib that could be better localized into a single set of modules.

> ... just a thought.

Thanks. Keep 'em coming!

Mike
Re: [matplotlib-devel] MEP14: Improve text handling
On 05/30/2013 03:33 PM, Nicolas Rougier wrote:

> For the freetype wrapper, maybe freetype-py may be of some help: http://code.google.com/p/freetype-py/
>
> I did not wrap all of the freetype library, but it already allows a fair amount of font manipulation/rendering.

I looked at this a number of years ago, and just looked at it again today. I think in general it's a better approach than what we have now in matplotlib, in that it's a thin wrapper around freetype rather than a "just enough for what we need" approach, which should make things more flexible in the long run. It's a lot like what I have in mind.

However, I do have some concerns about it and I'd like to get a sense of your receptiveness to these changes.

1) It's implemented in ctypes. I'm not much of a fan of ctypes, as it has the potential to segfault in nasty ways if the API changes in any way from what was expected (which would normally be caught at compile time in a C extension). I'm also concerned about the overhead of ctypes, given that there are already so many required optimizations in the matplotlib freetype wrapper to make it fast enough. But I'm willing to withhold judgement on that until some measurements have been made.

2) It's not Numpy-aware. For example, it loads image buffers into regular Python lists. This really should use Numpy for speed.

3) It exposes the fixed point numbers to Python as integers -- it should really return all of these as floats -- the user shouldn't have to know or remember which values are 16.16 and which are 24.8 etc. It should just give floats. Double precision (with 52 bits in the mantissa) is enough for any of these 32-bit fixed-point values. I think that's just a remnant of older systems and the need to run on hardware without an FPU, and it doesn't need to be brought forward into the Python wrapper.

4) It should have another layer to handle the decoding of SFNT tables in a consistent manner. I know the sfnt-names.py example does this, but that should be built into the library. There are certain places where hiding the details of the underlying font file is a good thing -- and I think one of the reasons freetype doesn't do this is the lack of a standard Unicode type in C. We don't have that problem in Python.

I think all of these are fixable by adding another layer on top, with the exception of (1) of course. Maybe it makes sense to build that intermediate layer, adapt matplotlib to it, benchmark the ctypes issue, and if necessary reimplement the core using C/API.

> For unicode/harfbuzz, I've found this example
>
> https://github.com/lxnt/ex-sdl-freetype-harfbuzz
>
> to be incredibly useful for understanding the (poorly documented) library. The strong point of harfbuzz is that it has no heavy dependencies (compared to pango, for example). By the way, Behdad is considering a refactoring of the library and it might be worth interacting with him (on the harfbuzz list) to see how this could ease writing a Python wrapper (if you intend to use it, of course).

That example is very helpful. Thanks.

I should add to the MEP, for those that are not aware, that even though Harfbuzz is a part of the Gtk/Gnome/Cairo ecosystem, it is a very standalone library itself, and is the closest to "works everywhere with minimal requirements" of any of the available options. I should definitely clarify that even though there are many options for font layout libraries, including both cross-platform/open source and closed-source vendor ones, Harfbuzz could be the one to rule them all, so we wouldn't necessarily need to wrap all of them.

> In the current draft, you're speaking of rich text, but I found no reference to a possible markup (or equivalent) to specify the different fonts, colors, boldness, etc.

Yeah -- I need to make that more explicit. I think MEP14 needs to consider the *possibility* of adding rich text support down the line so that the API can support it, but the details of how we might actually do that should be postponed for another MEP. It's already a lot to bite off as it is. Does that make sense to you -- are there things in the proposed API that would inhibit that from being added in the future?

Cheers,
Mike
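As an illustration of the extra layer asked for in point 4 above, here is a sketch of decoding raw SFNT `name` table strings into Python text. It assumes the caller already has each record's platform ID, encoding ID and raw bytes; the function name is invented, only the most common platform/encoding pairs are handled, and the Latin-1 fallback is a choice made for this sketch rather than anything the font format prescribes.

```python
def decode_sfnt_name(platform_id: int, encoding_id: int, raw: bytes) -> str:
    """Decode one raw `name` table string to a Python str.

    Handles only the common cases; anything else falls back to Latin-1
    so that callers always get text back.
    """
    if platform_id == 0:  # Unicode platform
        return raw.decode("utf-16-be")
    if platform_id == 3 and encoding_id in (1, 10):  # Windows, Unicode
        return raw.decode("utf-16-be")
    if platform_id == 1 and encoding_id == 0:  # Macintosh, Roman script
        return raw.decode("mac_roman")
    return raw.decode("latin-1")


# Sample records built by hand; a real wrapper would read them from a font.
windows_name = decode_sfnt_name(3, 1, "DejaVu Sans".encode("utf-16-be"))
mac_name = decode_sfnt_name(1, 0, "Café Bold".encode("mac_roman"))
```

Because Python has a single Unicode string type, a wrapper can do this once and hand every caller plain `str` values, which is the convenience the message argues C could not easily offer.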
Re: [matplotlib-devel] MEP14: Improve text handling
On Thu, May 30, 2013 at 5:03 PM, Michael Droettboom md...@stsci.edu wrote:

> On 05/30/2013 02:27 PM, Chris Barker - NOAA Federal wrote:
>
>> With a fully functional mathtex, it could be the default (only?) text layout system for MPL, simplifying things quite a bit.
>
> I'm not sure that's realistic. The usetex backend gets a great deal of use, and I don't think it's only because it handles multiline text better -- it's also the easiest way to make the text match that of a larger TeX document in which it's included (though the new PGF backend goes some way to helping that in an entirely different way).

Exactly! I like that I can set text.usetex=True and add \usepackage{fourier} and I *know* that my figures and document will look the same. That said, I've never been able to get the PGF backend to work well. Random elements are pixelated. It's surely user error on my end, but usetex is comparatively easy to set up.

> It might be worth collating a list of reasons that users are using usetex to include in the MEP -- if we can address them all in another way, great, but if not it's not too difficult to keep something that already works fairly well working. The problem I have with it is not really that it exists, only that it has tendrils all throughout matplotlib that could be better localized into a single set of modules.

As I state above -- I absolutely require One Font throughout my documents. If it's a serif font, I use the fourier TeX package. If it's a sans-serif font, I do the weird \sansmath voodoo (I still owe you a PR with an example of setting that up). Point is, it works well.

Cheers,
-paul
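For readers who want to try the usetex setup mentioned in the message above, the settings can be sketched as follows. `text.usetex` and `text.latex.preamble` are matplotlib rcParams; recent releases expect the preamble as a single string, while releases from the time of this thread accepted a list of strings. The constant and function names are made up here, and rendering with these settings needs a working LaTeX installation that provides the `fourier` package.

```python
# Settings only; nothing is rendered here.
USETEX_RC = {
    "text.usetex": True,
    "text.latex.preamble": r"\usepackage{fourier}",
}


def apply_usetex_settings() -> None:
    """Apply the settings to matplotlib (imported lazily, only when called)."""
    import matplotlib

    matplotlib.rcParams.update(USETEX_RC)
```

Calling `apply_usetex_settings()` before creating a figure would route all text through LaTeX with the same font package as the surrounding document, which is the consistency described in the message.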