On Thu Jul 11 09:13:10 EDT 2013, [email protected] wrote:
> Hello,
>
> It seems f option of grep is buggy.
> or any limitations in using the RE?
>
> term% wc MD5dir
> 4584 9168 388756 MD5dir
> term% wc x
> 4582 4582 151206 x
> term% grep -f x MD5dir | wc
> 4580 9160 388463
> term%
> term% grep e54272690d513f8b2403568a7574b1ba MD5dir
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> term% grep e54272690d513f8b2403568a7574b1ba x
> e54272690d513f8b2403568a7574b1ba
> term% grep -v -f x MD5dir
> 7b6d7ae369226b6d0195ac3fe4487ce7 /usr/arisawa/src/elnfs/WWW/
> d44d788ad1237311d8282bbabca65977 /usr/arisawa/src/hg/python-2.5.1-ape/Modules/_ctypes/libffi/src/darwin/
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> 84a0f83f5020f16d0b277e8b19407791 /usr/arisawa/src/trans
> term%
a trick i often use for many fixed strings is sort + uniq;
perhaps it could be used to double-check grep's answer here.
(internally, grep/comp.c:/^increment does O(n^2) qsorts on
the patterns.)
to find the md5 hashes that appear in only one file or the other
(uniq compares whole lines, so cut each line down to its first
field before sorting; this assumes no hash is repeated within
either file),
awk '{print $1}' x MD5dir | sort | uniq -c | sed '/^ *2 /d'
to count the hashes that appear in both
awk '{print $1}' x MD5dir | sort | uniq -c | grep '^ *2 ' | wc -l
or
... | awk '$1==2{n++}END{print n+0}'
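a minimal sketch of the same cross-check as shell functions, in case
it is easier to rerun that way. it is written for a POSIX sh rather
than rc, and the names hashcounts, onlyone and inboth are made up for
illustration. it keeps only the first field of each line and dedups
each file before counting, so neither the path column nor a hash
repeated inside one file can skew the counts.

```shell
# sketch only: cross-check grep -f by comparing hashes with sort + uniq.
# assumes the hash is the first blank-separated field on every line.
# hashcounts, onlyone and inboth are hypothetical helper names.

hashcounts() {
	# $1 = pattern file, $2 = listing; prints "count hash" lines,
	# where count is the number of files (1 or 2) containing the hash
	{
		awk 'NF {print $1}' "$1" | sort -u
		awk 'NF {print $1}' "$2" | sort -u
	} | sort | uniq -c
}

onlyone() {
	# hashes that appear in just one of the two files
	hashcounts "$1" "$2" | awk '$1 == 1 {print $2}'
}

inboth() {
	# number of hashes that appear in both files
	hashcounts "$1" "$2" | awk '$1 == 2 {n++} END {print n+0}'
}
```

usage would be something like "onlyone x MD5dir" and "inboth x MD5dir";
if grep -f were right, the hashes from "grep -v -f x MD5dir" should all
show up in the onlyone output.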
can you find a smaller test case that has the same issue? this
should be fixed.
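
one possible way to shrink the test case, sketched in POSIX sh rather
than rc: keep the pattern that should match, and repeatedly throw away
half of the other patterns as long as grep -f still misses the line.
the names shrink and stillfails, and the GREP variable, are made up
for this sketch; it also assumes the failure shows up with the full
pattern file and that one half keeps failing at each step, which a
real bug need not do.

```shell
# sketch only: halve a pattern file while grep -f still misses a line.
# GREP names the grep under test (hypothetical knob, defaults to grep).
GREP=${GREP:-grep}

stillfails() {
	# true if pattern file $1 fails to match the line stored in file $2
	! $GREP -f "$1" "$2" > /dev/null
}

shrink() {
	# $1 = pattern file, $2 = pattern that should match,
	# $3 = file holding the line that grep -f misses
	others=$(mktemp) || return 1
	half=$(mktemp) || return 1
	pats=$(mktemp) || return 1
	# every pattern except the one we care about
	grep -v -x -F -e "$2" "$1" > "$others"
	while :; do
		n=$(($(wc -l < "$others")))
		[ "$n" -gt 1 ] || break
		h=$((n / 2))
		# first half of the other patterns, plus the wanted one
		head -n "$h" "$others" > "$half"
		{ cat "$half"; echo "$2"; } > "$pats"
		if stillfails "$pats" "$3"; then cp "$half" "$others"; continue; fi
		# second half of the other patterns, plus the wanted one
		tail -n +"$((h + 1))" "$others" > "$half"
		{ cat "$half"; echo "$2"; } > "$pats"
		if stillfails "$pats" "$3"; then cp "$half" "$others"; continue; fi
		break	# neither half fails on its own; stop here
	done
	# print the reduced pattern set
	cat "$others"
	echo "$2"
	rm -f "$others" "$half" "$pats"
}
```

for the case above that would be roughly: put the task.387.a line from
MD5dir in a file, then run
"shrink x e54272690d513f8b2403568a7574b1ba thatfile > x.small".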
- erik