Investigating the performance questions in #927 <https://github.com/leo-editor/leo-editor/issues/924>
has revealed some spectacular performance gains. The new code is in the
"perf" branch.

Simply inlining g.isUnicode inside g.toUnicode makes a huge difference.  
The new code is:

if isPython3:
    def isUnicode(s):
        return isinstance(s, str)
else:
    def isUnicode(s):
        return isinstance(s, types.UnicodeType)

# This inlining makes a huge difference.
# It saves most calls to _toUnicode and g.isUnicode!

if isPython3:
    def toUnicode(s, encoding='utf-8', reportErrors=False):
        '''Convert a non-unicode string with the given encoding to unicode.'''
        return s if isinstance(s, str) else _toUnicode(s, encoding, reportErrors)
else:
    def toUnicode(s, encoding='utf-8', reportErrors=False):
        '''Convert a non-unicode string with the given encoding to unicode.'''
        return s if isinstance(s, types.UnicodeType) else _toUnicode(s, encoding, reportErrors)

def _toUnicode(s, encoding, reportErrors):
    
    if not encoding:
        encoding = 'utf-8'
    #
    # These are the only significant calls to s.decode in Leo.
    # Tracing these calls directly yields thousands of calls.
    try:
        s = s.decode(encoding, 'strict')
    except (UnicodeDecodeError, UnicodeError):
        # https://wiki.python.org/moin/UnicodeDecodeError
        s = s.decode(encoding, 'replace')
        if reportErrors:
            g.trace(g.callers())
            g.error("toUnicode: Error converting %s...from %s encoding to unicode" % (
                s[:200], encoding))
    except AttributeError:
        # May be a QString.
        s = g.u(s)
    return s
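Why does inlining help so much? In CPython every Python-level function call carries a fixed overhead, so routing a one-line isinstance test through its own helper adds an extra call to every call of the outer function. The sketch below compares the two shapes with the standard timeit module. The names is_unicode, _slow_decode, to_unicode_wrapped and to_unicode_inlined are hypothetical stand-ins (Python 3 only), not Leo's actual code, and absolute timings will vary by machine and interpreter.

```python
import timeit


def is_unicode(s):
    """Hypothetical stand-in for the old helper: a one-line type test."""
    return isinstance(s, str)


def _slow_decode(s, encoding='utf-8'):
    """Rare path: decode bytes, replacing undecodable sequences."""
    return s.decode(encoding, 'replace')


def to_unicode_wrapped(s, encoding='utf-8'):
    # Old shape: every call pays for a second function call to is_unicode().
    return s if is_unicode(s) else _slow_decode(s, encoding)


def to_unicode_inlined(s, encoding='utf-8'):
    # New shape: the common case (s is already str) is handled inline,
    # so only non-str arguments ever reach the helper.
    return s if isinstance(s, str) else _slow_decode(s, encoding)


if __name__ == '__main__':
    sample = 'headline text'
    calls = 200_000
    for func in (to_unicode_wrapped, to_unicode_inlined):
        seconds = timeit.timeit(lambda: func(sample), number=calls)
        print(f'{func.__name__}: {seconds:.3f} s for {calls:,} calls')
```

Both functions return the same results; the difference is only in how many calls the common case makes, which is why the effect shows up so strongly for a function that is called hundreds of thousands of times.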

*Before stats for g.isUnicode*

 145181 g.isUnicode:__init__,read_words,add_expanded_line,toUnicode
  66369 g.isUnicode:__get_h,headString,headString,toUnicode
  52902 g.isUnicode:get_UNL,__get_h,headString,headString
  11400 g.isUnicode:munge,os_path_normpath,toUnicodeFileEncoding,toUnicode
   8924 g.isUnicode:shortFileName,os_path_basename,toUnicodeFileEncoding,toUnicode
   7744 g.isUnicode:get_directives_dict,__get_b,bodyString,bodyString
   7488 g.isUnicode:<listcomp>,os_path_expanduser,toUnicodeFileEncoding,toUnicode
   7449 g.isUnicode:anyAtFileNodeName,findAtFileName,headString,toUnicode
   6986 g.isUnicode:v_element_visitor,v_element_visitor,v_element_visitor,v_element_visitor
   6539 g.isUnicode:os_path_finalize_join,<listcomp>,os_path_expandExpression,toUnicode
   5643 g.isUnicode:get_directives_dict,__get_h,headString,headString
   4841 g.isUnicode:isAnyAtFileNode,isAnyAtFileNode,headString,toUnicode
   4797 g.isUnicode:idle_check_commander,isAnyAtFileNode,isAnyAtFileNode,headString
   4580 g.isUnicode:os_path_finalize_join,os_path_join,toUnicodeFileEncoding,toUnicode
   4565 g.isUnicode:anyAtFileNodeName,findAtFileName,skip_id,toUnicode
   4044 g.isUnicode:anyAtFileNodeName,anyAtFileNodeName,findAtFileName,headString
   3405 g.isUnicode:isAnyAtFileNode,anyAtFileNodeName,findAtFileName,headString
   2489 g.isUnicode:createAllButtons,__get_h,headString,headString
   2280 g.isUnicode:parse,parse,feed,characters
   2054 g.isUnicode:visitNode,__get_h,headString,headString
   1866 g.isUnicode:putLine,putCodeLine,onl,os
   1747 g.isUnicode:visitNode,parseHeadline,skip_id,toUnicode
   1715 g.isUnicode:putBody,putLine,putCodeLine,os
   1302 g.isUnicode:parseHeadline,__get_h,headString,headString

*After stats for g.isUnicode*

  52902 g.isUnicode:get_UNL,__get_h,headString,headString
   7459 g.isUnicode:get_directives_dict,__get_b,bodyString,bodyString
   6986 g.isUnicode:v_element_visitor,v_element_visitor,v_element_visitor,v_element_visitor
   5423 g.isUnicode:get_directives_dict,__get_h,headString,headString
   3919 g.isUnicode:anyAtFileNodeName,anyAtFileNodeName,findAtFileName,headString
   2489 g.isUnicode:createAllButtons,__get_h,headString,headString
   2280 g.isUnicode:parse,parse,feed,characters
   2132 g.isUnicode:idle_check_commander,isAnyAtFileNode,isAnyAtFileNode,headString
   2054 g.isUnicode:visitNode,__get_h,headString,headString
   1866 g.isUnicode:putLine,putCodeLine,onl,os
   1715 g.isUnicode:putBody,putLine,putCodeLine,os
   1550 g.isUnicode:isAnyAtFileNode,anyAtFileNodeName,findAtFileName,headString
   1302 g.isUnicode:parseHeadline,__get_h,headString,headString

*After stats for g._toUnicode*

     10 g._toUnicode:read,openFileForReading,readFileToUnicode,toUnicode
      7 g._toUnicode:readAll,readOneAtCleanNode,read_at_clean_lines,toUnicode
      6 g._toUnicode:openLeoFile,getLeoFile,readFile,toUnicode
      5 g._toUnicode:read,read_into_root,readFileIntoString,toUnicode
      3 g._toUnicode:read_into_root,readFileIntoString,getPythonEncodingFromString,toUnicode
      2 g._toUnicode:__init__,__init__,read_words,toUnicode
      1 g._toUnicode:createOutline,init_import,readFileIntoString,toUnicode

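This is remarkable: none of the g.isUnicode call chains ending in toUnicode appear in the
after stats, and _toUnicode itself is now called only a few dozen times. Next, I'll
investigate why so many positions are being created.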
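The listings above count calls to g.isUnicode and g._toUnicode, grouped by the chain of the four most recent callers, with the immediate caller last. Leo's own instrumentation is not shown in this post; the following is a hypothetical sketch of the same idea using only the standard library. The names count_callers, CALL_COUNTS, report, headString and visit are assumptions for illustration, and sys._getframe is a CPython implementation detail.

```python
import sys
from collections import Counter
from functools import wraps

# Maps "func:caller1,caller2,..." to the number of calls seen.
CALL_COUNTS = Counter()


def count_callers(depth=4):
    """Decorator that counts calls, keyed by the names of the last `depth` callers."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            frame = sys._getframe(1)  # the frame that called func
            names = []
            while frame is not None and len(names) < depth:
                names.append(frame.f_code.co_name)
                frame = frame.f_back
            # Oldest caller first, immediate caller last, as in the listings.
            key = f"{func.__name__}:{','.join(reversed(names))}"
            CALL_COUNTS[key] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator


def report(limit=20):
    """Print the most frequent caller chains, largest count first."""
    for key, count in CALL_COUNTS.most_common(limit):
        print(f"{count:7d} {key}")


@count_callers()
def is_unicode(s):
    return isinstance(s, str)


def headString(s):
    # Hypothetical caller standing in for a hot accessor such as Leo's headString.
    return s if is_unicode(s) else s.decode('utf-8')


def visit(items):
    results = []
    for s in items:  # a plain loop keeps frame names stable across Python versions
        results.append(headString(s))
    return results


if __name__ == '__main__':
    visit(['a', b'b', 'c'])
    report()
```

Counting by caller chain rather than by function alone is what makes listings like the ones above useful: they point at the specific call sites worth optimizing.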

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.