>>>>> "Edin" == Edin Salković <[EMAIL PROTECTED]> writes:
Edin> Hi all, is it that the code in the mathtext module looks
Edin> ugly, or is it just me not understanding it? Also, if anyone
Edin> has some good online sources about parsing etc.,
Edin> I would really appreciate it.

It's probably you not understanding it :-) In my opinion, the code is pretty nice and modular, with a few exceptions, but I'm biased. Parsers can be a little hard to understand at first. You might start by trying to understand pyparsing (http://pyparsing.wikispaces.com) and work through some of the basic examples there. Once you have your head wrapped around that, it will get easier.

Edin> Considering the following code (picked at random from
Edin> mathtext.py)

Edin> I don't understand, for example, what the statement
Edin>     expression.parseString( s )
Edin> does.

Edin> "expression" is defined globally, and is called (that is,
Edin> its method is) only once in the above definition of the
Edin> function, but I don't understand: what does that particular
Edin> line do?!?

It's not defined globally, but at module level. There is only one expression that represents a TeX math expression (at least as far as mathtext is concerned), so it is right that there is only one of them at module level. It's like saying "a name is a first name followed by an optional middle initial followed by a last name". You only need to define this once, and then you set handlers to handle the different components. Calling expression.parseString(s) then matches the string s against this grammar, and the handler attached to each component is called as that component is matched.

The expression assigns subexpressions to handlers. The statement below says that an expression is one or more of a space, a font element, an accent, a symbol, a subscript, etc.:
    expression = OneOrMore(
        space ^ font ^ accent ^ symbol ^ subscript ^ superscript ^
        subsuperscript ^ group ^ composite
        ).setParseAction(handler.expression).setName("expression")

A subscript, for example, is an optional symbol group followed by an underscore followed by a symbol group:

    subscript << Group( Optional(symgroup) + Literal('_') + symgroup )

and the handler is attached where subscript is declared:

    subscript = Forward().setParseAction(handler.subscript).setName("subscript")

which means that the function handler.subscript will be called every time the pattern is matched. The tokens will be the first symbol group (if present), the underscore, and the second symbol group. Here is the implementation of that function:

    def subscript(self, s, loc, toks):
        assert(len(toks)==1)
        #print 'subsup', toks
        if len(toks[0])==2:
            under, next = toks[0]
            prev = SpaceElement(0)
        else:
            prev, under, next = toks[0]

        if self.is_overunder(prev):
            prev.neighbors['below'] = next
        else:
            prev.neighbors['subscript'] = next

        return loc, [prev]

This grabs the tokens and assigns them to the names "prev" and "next". Every element in the TeX expression is an instance of (a subclass of) Element, and every Element has a dictionary mapping relative locations (above, below, right, superscript, or subscript) to its neighboring elements. The rest of this function takes the "next" element and attaches it either below (e.g. for \sum_0) or as a subscript (e.g. for x_0), and the layout engine then takes this big tree and lays it out. See, for example, the "set_origin" function.

Edin> ------
Edin> Regarding the unicode support in mathtext, mathtext
Edin> currently uses the following dictionary for getting the glyph
Edin> info out of the font files:

Edin> latex_to_bakoma = {
Edin>     r'\oint'      : ('cmex10', 45),
Edin>     r'\bigodot'   : ('cmex10', 50),
Edin>     r'\bigoplus'  : ('cmex10', 55),
Edin>     r'\bigotimes' : ('cmex10', 59),
Edin>     r'\sum'       : ('cmex10', 51),
Edin>     r'\prod'      : ('cmex10', 24),
Edin>     ...
Edin> }

Edin> I managed to build the following dictionary (a little more
Edin> is left to be done):

Edin> tex_to_unicode = {
Edin>     r'\S'      : u'\u00a7',
Edin>     r'\P'      : u'\u00b6',
Edin>     r'\Gamma'  : u'\u0393',
Edin>     r'\Delta'  : u'\u0394',
Edin>     r'\Theta'  : u'\u0398',
Edin>     r'\Lambda' : u'\u039b',
Edin>     r'\Xi'     : u'\u039e',
Edin>     r'\Pi'     : u'\u03a0',
Edin>     r'\Sigma'  : u'\u03a3',

Edin> unicode_to_tex is straightforward. Am I on the right
Edin> track? What should I do next?

Yes, this looks like the right approach. Once you have this dictionary mostly working, you will need to try to make it work with a set of unicode fonts. So instead of having the TeX symbol point to a file name and glyph index, you will need to parse a set of unicode fonts to see which unicode characters they provide, and build a mapping from unicode character -> (file, glyph index). Then, when you encounter a TeX symbol, you can use your tex_to_unicode dict combined with your unicode -> (filename, glyph index) dict to get the desired glyph.

Edin> I also noticed that some TeX commands (commands in the sense
Edin> that they can have arguments enclosed in braces {}) are
Edin> defined only as symbols: \sqrt alone, for example, displays
Edin> just the beginning of the square root sign (√), and
Edin> \sqrt{123} triggers an error.

We don't have support for \sqrt{123} because we would need to do something a little fancier (draw the horizontal line over 123). This is doable and would be nice. To implement it, one approach would be to add some basic drawing functionality to the freetype module, e.g. to tell freetype to draw a line on its bitmap. Another approach would simply be to grab the bitmap from freetype, pass it off to agg, and use the agg renderer to decorate it. This is probably preferable. But I think this is a lower priority right now.

JDH
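The two-stage lookup suggested above (TeX symbol -> unicode character -> font file and glyph index) can be sketched as follows. This is a minimal illustration under stated assumptions, not mathtext code: the font file names and glyph indices are made up, and each font's coverage is assumed to be available as a dict mapping code points to glyph indices (in current matplotlib, FT2Font.get_charmap() returns a mapping of this kind). The helper names build_unicode_index and lookup_glyph are hypothetical.

```python
from typing import Dict, List, Tuple

# A small subset of the tex_to_unicode table from the message above.
tex_to_unicode: Dict[str, str] = {
    r'\S': '\u00a7',
    r'\P': '\u00b6',
    r'\Gamma': '\u0393',
    r'\Delta': '\u0394',
    r'\Sigma': '\u03a3',
}

# The inverse table, built by swapping keys and values.
unicode_to_tex: Dict[str, str] = {u: t for t, u in tex_to_unicode.items()}


def build_unicode_index(
        fonts: List[Tuple[str, Dict[int, int]]]) -> Dict[str, Tuple[str, int]]:
    """Map each unicode character to (font file, glyph index).

    `fonts` is a list of (filename, charmap) pairs in priority order, where
    charmap maps a code point to a glyph index. The first font that covers
    a character wins.
    """
    index: Dict[str, Tuple[str, int]] = {}
    for filename, charmap in fonts:
        for codepoint, glyph_index in charmap.items():
            # setdefault keeps the entry from the earlier (preferred) font.
            index.setdefault(chr(codepoint), (filename, glyph_index))
    return index


def lookup_glyph(tex_symbol: str,
                 index: Dict[str, Tuple[str, int]]) -> Tuple[str, int]:
    """Resolve a TeX symbol to (font file, glyph index) via its unicode character."""
    char = tex_to_unicode[tex_symbol]  # KeyError for an unknown TeX symbol
    return index[char]                 # KeyError if no font covers the character


# Hypothetical fonts and glyph indices, for illustration only.
fonts = [
    ("SomeSerif.ttf", {0x393: 36, 0x394: 37}),
    ("SomeSymbols.ttf", {0x394: 12, 0x3a3: 15, 0xa7: 3}),
]
index = build_unicode_index(fonts)
print(lookup_glyph(r'\Delta', index))  # ('SomeSerif.ttf', 37)
print(lookup_glyph(r'\Sigma', index))  # ('SomeSymbols.ttf', 15)
```

Letting the first covering font win gives a simple priority order, so a preferred text font can supply a glyph while a symbol font fills in the characters it lacks.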
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel