Hi, I am working through Nim track in exercism. The task to solve is: "Given a 
number, find the sum of all the unique multiples of particular numbers up to 
but not including that number." I am curious about the performance of HashSet. 
Here is the summary of running times for a range of input numbers; the size of 
the HashSet[int] was approx 7_000_000 items. I compiled the code (see below) 
using Nim Compiler Version 1.2.0 [Linux: amd64] with -d:release and -d:danger 
flags. I expected O(log N) performance, but this is not what happened: 
    
    
    CPU Time [8350000] 0.531s
    CPU Time [8400000] 0.444s
    CPU Time [8450000] 0.795s
    CPU Time [8500000] 1.828s
    CPU Time [8550000] 3.533s
    CPU Time [8600000] 5.789s
    CPU Time [8650000] 8.634s
    CPU Time [8700000] 12.119s
    CPU Time [8750000] 16.176s
    CPU Time [8800000] 20.857s
    CPU Time [8850000] 26.108s
    CPU Time [8900000] 32.016s
    CPU Time [8950000] 38.528s
    CPU Time [9000000] 45.560s
    
    
    Run

What is the reason for the decreasing performance?
    
    
    ## Given a number, find the sum of all the unique(!) multiples of 
particular numbers up to but not including that number.
    ## If we list all the natural numbers below 20 that are multiples of 3 or 
5, we get 3, 5, 6, 9, 10, 12, 15, and 18.
    ## The sum of these multiples is 78.
    import sets, intsets
    import times,strutils
    
    proc sumA*(x:int, numbers:seq[int]):int =
      var multiples:HashSet[int]
      #var multiples:IntSet
      for n in numbers:
        if n == 0:
          continue
        for m in countup(n, x-1, n):
          if not multiples.containsOrIncl(m):
            result += m
    
    template benchmark(benchmarkName: string, code: untyped) =
      block:
        let t0 = epochTime()
        code
        let elapsed = epochTime() - t0
        let elapsedStr = elapsed.formatFloat(format = ffDecimal, precision = 3)
        echo "CPU Time [", benchmarkName, "] ", elapsedStr, "s"
    
    when isMainModule:
      for x in countup(8_350_000, 9_000_000, 50000):
        benchmark $x:
          discard sumA(x, @[2, 3, 5, 7, 11])
    
    
    Run

Similar test with Python 3.6 code yields more reasonable running times: 
    
    
    8350000 cpu= 1.9215092658996582
    8400000 cpu= 1.8640367984771729
    8450000 cpu= 1.8687024116516113
    8500000 cpu= 1.867283821105957
    8550000 cpu= 1.8652291297912598
    8600000 cpu= 1.9187562465667725
    8650000 cpu= 1.8985896110534668
    8700000 cpu= 1.8902392387390137
    8750000 cpu= 1.8779652118682861
    8800000 cpu= 1.940896987915039
    8850000 cpu= 1.9335484504699707
    8900000 cpu= 1.9375865459442139
    8950000 cpu= 1.9178028106689453
    9000000 cpu= 1.9903161525726318
    
    
    Run

Here is the Python code 
    
    
    import time
    
    def sum(x, numbers):
        collected = set()
        result = 0
        for n in numbers:
            if n == 0:
                continue
            for m in range(n, x-1, n):
                if not m in collected:
                    result += m
                    collected.add(m)
        return result
    
    for x in range(8350000, 9050000, 50000):
        t0 = time.time()
        s= sum(x, [2,3,5,7,11])
        t1 = time.time()
        print(f"{x} cpu= {t1-t0}")
    
    
    Run

Reply via email to