On Fri, May 30, 2008 at 4:45 AM, Edward K. Ream <[EMAIL PROTECTED]> wrote:
>  5859    0.084    0.000    0.165    0.000 leoAtFile.py:1079(readEndNode)
>    43    0.531    0.012    6.564    0.153 leoAtFile.py:744(scanText4)

Judging by the cumulative time taken by scanText4, rewriting it to use
mxTextTools might indeed help. There are also some easier optimizations
you may want to try in this function:

    while at.errors == 0 and not at.done:
        s = at.readLine(theFile)
        self.lineNumber += 1
        if len(s) == 0: break
        kind = at.sentinelKind4(s)
        # g.trace(at.sentinelName(kind),s.strip())
        if kind == at.noSentinel:
            i = 0
        else:
            i = at.skipSentinelStart4(s,0)
        func = at.dispatch_dict[kind]
        func(s,i)

Store all the attributes in local variables before the loop:

    skind = at.sentinelKind4
    noSent = at.noSentinel
    skips = at.skipSentinelStart4
    ddict = at.dispatch_dict

I suppose the same could be done for lineNumber.

The repeated toUnicode calls in readLine probably slow it down too;
perhaps there should be a "short circuit" for files that are plain
ASCII. It may also be significantly faster to convert the whole file in
one swoop, rather than line by line.

-- 
Ville M. Vainio - vivainio.googlepages.com
blog=360.yahoo.com/villevainio - g[mail | talk]='vivainio'
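To illustrate the attribute-caching suggestion, here is a minimal sketch. The `Reader` class and its methods are hypothetical stand-ins for Leo's at-file reader, not Leo's actual API; the point is only that hoisting `at.xxx` lookups out of a hot loop removes one dict probe per attribute per iteration in CPython:

```python
class Reader:
    """Hypothetical stand-in for Leo's at-file reader (not Leo's real API)."""
    noSentinel = 0

    def sentinelKind4(self, s):
        # Toy sentinel test: real Leo sentinels are more involved.
        return 1 if s.startswith("#@") else self.noSentinel

def count_sentinels_slow(r, lines):
    total = 0
    for s in lines:
        # r.sentinelKind4 and r.noSentinel are looked up on every pass.
        if r.sentinelKind4(s) != r.noSentinel:
            total += 1
    return total

def count_sentinels_fast(r, lines):
    # Hoist the attribute lookups into locals, as suggested above.
    skind = r.sentinelKind4
    noSent = r.noSentinel
    total = 0
    for s in lines:
        if skind(s) != noSent:
            total += 1
    return total

lines = ["#@+node: x", "plain body line"] * 50000
r = Reader()
assert count_sentinels_slow(r, lines) == count_sentinels_fast(r, lines)
```

Timing the two variants with `timeit` on a file-sized list should show a measurable (if modest) win for the cached version; the bigger wins likely remain in mxTextTools or the unicode handling.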
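The whole-file conversion with an ASCII short circuit could look roughly like this. The function names are illustrative, not Leo's actual `toUnicode` helpers; the idea is one decode call for the entire buffer, with a cheap fast path when the bytes are pure ASCII:

```python
def decode_line_by_line(raw_lines, encoding="utf-8"):
    # Current style: one decode call per line, so the per-call
    # overhead scales with the number of lines.
    return [b.decode(encoding) for b in raw_lines]

def decode_whole_file(data, encoding="utf-8"):
    # Proposed style: decode the whole buffer in one swoop.
    # Pure-ASCII bytes decode identically under ASCII and UTF-8,
    # so trying ASCII first is a safe short circuit.
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return data.decode(encoding)

text = decode_whole_file(b"#@+leo\nplain ascii body\n")
assert text == "#@+leo\nplain ascii body\n"
```

The short circuit only helps when most files really are plain ASCII, which was a reasonable bet for source files at the time; a non-ASCII file pays one failed decode attempt before falling back.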