First off, compiling with the command line option `-d:release` always speeds up 
Nim code. Still, even without release mode Nim would be expected to beat 
Python here, so I suspect the nre module. Beyond that, here are some things I 
noticed in your code.
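Concretely, that's just (assuming your file is named `main.nim` — adjust to your actual filename):

```shell
# -d:release turns off runtime checks and enables C-level optimizations
nim c -d:release main.nim
```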

Let's go through the `cut` iterator that your code uses.
    
    
    iterator cut*(sentence:string):string  =
        let blocks:seq[string] = filter(nre.split(sentence,re_han),proc(x: string): bool = x.len > 0)
        var
            tmp = newSeq[string]()
            wordStr:string
        for blk in blocks:
            if isSome(blk.match(re_han)) == true:
                for word in internal_cut(blk):
                    wordStr = $word
                    if (wordStr in Force_Split_Words == false):
                        yield wordStr
                    else:
                        for c in wordStr:
                            yield $c
            else:
            tmp = filter(split(blk,re_skip),proc(x: string): bool = x.len > 0 or x.runeLen()>0)
                for x in tmp:
                  yield x
    
    

You call `filter` and then immediately iterate over its result, and you do that 
twice here. Converting an iterator to a seq is fairly expensive, so it's best 
to do everything in a single pass:
    
    
    iterator cut*(sentence: string): string =
      for blk in sentence.split(re_han):
        if blk.len == 0: continue
        if blk.match(re_han).isSome:
          for word in internal_cut(blk):
            let wordStr = $word
            if wordStr notin Force_Split_Words:
              yield wordStr
            else:
              for c in wordStr:
                yield $c
        else:
          for x in blk.split(re_skip):
            if x.len > 0 or x.runeLen > 0:
              yield x
    
    
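To make the filter-vs-single-pass point concrete, here's a small self-contained sketch. It uses plain `strutils.split` as a stand-in for the regex splitting, so `re_han` and friends aren't needed; the proc names are made up for the example:

```nim
import sequtils, strutils

proc viaFilterSeq(s: string): seq[string] =
  # builds a full seq from split, then a second seq from filter
  filter(strutils.split(s, '/'), proc(x: string): bool = x.len > 0)

proc viaSinglePass(s: string): seq[string] =
  # one pass over the split *iterator*, no intermediate seq
  for chunk in s.split('/'):
    if chunk.len > 0:
      result.add chunk

# same output, one fewer allocation pass
doAssert viaFilterSeq("a//b/c") == viaSinglePass("a//b/c")
```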

This doesn't really improve performance, but I thought I'd include it anyway:
    
    
    proc lcut*(sentence: string): seq[string] =
      result = lc[y | (y <- cut(sentence)), string]
    
    

There is already a template in system.nim (the default imported module) for 
this purpose named 
[accumulateResult](https://nim-lang.org/docs/system.html#accumulateResult.t,untyped).
 It's used like so:
    
    
    proc lcut*(sentence: string): seq[string] =
      accumulateResult(cut(sentence))
    
    

But `accumulateResult` is deprecated on the devel branch. Luckily, you can use 
[sequtils.toSeq](https://nim-lang.org/docs/sequtils.html#toSeq.t,untyped) at 
your specific call site:
    
    
    for line in lines:
      discard lcut(line).join("/")
    
    

becomes:
    
    
    # top of file
    from sequtils import toSeq
    
    for line in lines:
      discard toSeq(cut(line)).join("/")
    
    
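If it helps, here's a self-contained sketch of what `toSeq` does, with a toy iterator standing in for `cut`:

```nim
from sequtils import toSeq

# toSeq collects any iterator invocation into a seq,
# which is exactly what lcut was doing by hand
iterator evens(n: int): int =
  for i in 0 ..< n:
    if i mod 2 == 0:
      yield i

doAssert toSeq(evens(7)) == @[0, 2, 4, 6]
```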

This probably doesn't have much to do with the slowness, but you can optimize 
`Table` objects with char keys. Tables are currently implemented as a seq of 
`tuple[hash, key, value]`, and since a char key is effectively its own hash, 
each entry wastes 8 bytes storing the hash. This might be optimized in a future 
version of Nim, but for now this works:
    
    
    proc getFromCharTable[V](charTable: openArray[(char, V)], key: char): V =
      # linear scan; returns default(V) when the key is absent
      for it in charTable:
        if it[0] == key:
          return it[1]
    
    let foo = {'A': 1, 'B': 2} # the type is an array of (char, int)
    echo foo.getFromCharTable('B') # 2
    
    
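As a sanity check, here's a sketch comparing that array-based lookup against a regular `tables.Table` (the names here are made up for the example):

```nim
import tables

# same linear lookup as above; returns default(V) for a missing key
proc getFromCharArray[V](pairs: openArray[(char, V)], key: char): V =
  for it in pairs:
    if it[0] == key:
      return it[1]

let asArray = {'A': 1, 'B': 2}           # array of (char, int) pairs, no hashes stored
let asTable = {'A': 1, 'B': 2}.toTable   # Table[char, int], hash kept per entry

doAssert asArray.getFromCharArray('B') == asTable['B']   # both yield 2
```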
