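It looks like many of the lines end with a carriage return followed by a newline (\r\n), while the others end with only a newline (\n). uniq compares lines byte for byte, so two lines that differ only in that trailing \r are not treated as duplicates. Is it possible the other tools are ignoring line ending differences?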
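Here is a minimal sketch of how one might check this, using a small made-up sample rather than the attached file (the temporary file and the variable names are only for illustration). It assumes the file contains no carriage returns that need to be kept, since tr -d '\r' deletes every one of them:

```shell
# Hypothetical sample: the first two lines differ only in their line ending.
sample=$(mktemp)
printf 'a\tb\r\na\tb\nc\td\n' > "$sample"

# uniq sees "a<TAB>b<CR>" and "a<TAB>b" as different lines, so nothing is dropped.
before=$(uniq "$sample" | wc -l)

# Delete the carriage returns first; now the duplicate pair collapses.
after=$(tr -d '\r' < "$sample" | uniq | wc -l)

echo "lines after uniq, mixed endings: $before"
echo "lines after uniq, CRs removed:   $after"

rm -f "$sample"
```

If the second count is smaller than the first on the real file (for example, tr -d '\r' < test_file_for_uniq | uniq > foo), the mixed line endings are the likely cause.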

-David

Ian Sue Wing wrote:

>Greetings,
>
>Yesterday I downloaded and installed a copy of CYGWIN. I am using the
>uniq utility to purge duplicate line entries from a large, tab-delimited
>file with several columns of data. (The file, which I have already run
>through sort, is included as a .bz2 attachment. It has about 60,000 lines.)
>
>I have examined the file visually in a text editor, and confirmed that
>it has duplicate lines. I have loaded the file into excel and calculated
>that there are about 8700 duplicate lines. However, in the CYGWIN Bash
>shell, typing
>
>uniq test_file_for_uniq > foo; diff test_file_for_uniq foo
>
>shows no changes between the files. Examining the uniquified file 'foo'
>in excel reveals it to be identical to the original.
>
>I then fired up my trusty old MKS Toolkit and ran its implementation of
>uniq. Running MKS visual diff on the original and uniquified files
>identified about 8700 line differences, consistent with my earlier
>calculations.
>
>Is this a bug in CYGWIN's implementation of uniq or a silly error
>on my part? Last I checked, uniq was simple, straightforward to use, and
>had nuclear-hardened reliability.
>
>-i
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Bug-coreutils mailing list
>[email protected]
>http://lists.gnu.org/mailman/listinfo/bug-coreutils

-- 
---------------------------------------------------------
D a v i d   E i s n e r        c r a d l e @ u m d . e d u
CALCE EPSC                     University of Maryland
