Hi, I have mydiff.py, which compares two files using two sets. My Python version is below, but the Nim version takes about five times longer. Can anyone explain the performance difference, or give me better Nim code? I am comparing files with about two million lines.

Python version:
dba01:/oracledba/jlee/python_scripts>time mydiff.py check_diff_dsdan17.lst check_diff_dsosu27.lst > /dev/null
real 0m8.274s
user 0m4.661s
sys 0m0.779s

Nim version (mydiffn):
dba01:/oracledba/jlee/python_scripts>time mydiffn check_diff_dsdan17.lst check_diff_dsosu27.lst > /dev/null
real 0m39.135s
user 0m25.623s
sys 0m1.711s
mydiff.py:

import sys

if len(sys.argv) != 3:
    print "Usage: %s file1 file2" % sys.argv[0]
    sys.exit(1)

try:
    file1 = open(sys.argv[1])
    file2 = open(sys.argv[2])
except IOError, e:
    print "file not found: %s" % e
    sys.exit(2)

old_lines = file1.read().split('\n')
new_lines = file2.read().split('\n')
file1.close()
file2.close()

old_lines_set = set(old_lines)
new_lines_set = set(new_lines)

old_added = old_lines_set - new_lines_set
old_removed = new_lines_set - old_lines_set

for line in old_lines:
    if line in old_added:
        print '-', line.strip()
    elif line in old_removed:
        print '+', line.strip()

for line in new_lines:
    if line in old_added:
        print '-', line.strip()
    elif line in old_removed:
        print '+', line.strip()

mydiff.nim:

import os, sets, strutils

if paramCount() != 2:
  echo "mydiff file1 file2"
  quit()

let file1 = open(paramStr(1))
let file2 = open(paramStr(2))
let old_lines = file1.readAll().splitLines()
let new_lines = file2.readAll().splitLines()
file1.close()
file2.close()

let old_lines_set = toSet(old_lines)
let new_lines_set = toSet(new_lines)
let old_added = old_lines_set - new_lines_set
let old_removed = new_lines_set - old_lines_set

for line in old_lines:
  if line in old_added:
    echo "-", line.strip()
  elif line in old_removed:
    echo "+", line.strip()

for line in new_lines:
  if line in old_added:
    echo "-", line.strip()
  elif line in old_removed:
    echo "+", line.strip()
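
Both programs build two difference sets and then test every line against both of them. A line read from the first file can never be in `new_lines_set - old_lines_set`, and a line from the second file can never be in `old_lines_set - new_lines_set`, so each loop only needs one membership test and the difference sets never have to be built. Below is a minimal Python 3 sketch of that reduced logic; `set_diff` is a hypothetical helper name, not part of either program. It should print the same lines in the same order as mydiff.py, but it has not been timed against the files above.

```python
def set_diff(old_lines, new_lines):
    """Return diff lines in the order mydiff.py prints them.

    Lines that occur only in old_lines get a '- ' prefix, lines that
    occur only in new_lines get a '+ ' prefix; shared lines are skipped.
    Duplicates are reported once per occurrence, as in mydiff.py.
    """
    old_set = set(old_lines)
    new_set = set(new_lines)
    out = []
    # A line from old_lines is in (old_set - new_set) exactly when it is
    # missing from new_set, so one lookup replaces the two in mydiff.py.
    for line in old_lines:
        if line not in new_set:
            out.append('- ' + line.strip())
    # Likewise, a line from new_lines is in (new_set - old_set) exactly
    # when it is missing from old_set.
    for line in new_lines:
        if line not in old_set:
            out.append('+ ' + line.strip())
    return out
```

After reading the files as mydiff.py does, it can be used as `for line in set_diff(old_lines, new_lines): print(line)`. The same single-lookup change carries over directly to the Nim loops.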

Thanks,
Joseph