David Crisp writes:
> name | value
> ==========
> ItemOne : 10
> ItemOne : 10
> ItemOne : 10
> ItemOne : 10
> ItemTwo : 20
> ItemTwo : 20
> ItemTwo : 20
> ItemTwo : 20
> ItemThree : 30
> ItemThree : 30
> ItemThree : 30
> ItemThree : 30
If you're confident there will frequently be duplicated lines, and you
want to ignore the duplicates, I'd recommend (on Unix) filtering the
list to remove them::
    $ cat items | sort | uniq > items_dedup
Then you can read the ‘items_dedup’ file in your Python program.
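This is a minimal sketch of that reading step. It assumes every useful line has the ‘name : value’ layout shown in the quoted data, with integer values. Lines without a ‘:’ (the ‘name | value’ header, the ruler, blank lines) are skipped. The function names ‘parse_items’ and ‘read_items’ are hypothetical, made up for illustration.

```python
from pathlib import Path


def parse_items(lines):
    """Turn lines like 'ItemOne : 10' into (name, value) pairs.

    Assumes the 'name : value' layout from the quoted data and integer
    values; any line without a ':' separator is skipped.
    """
    items = []
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # header, ruler or blank line: not a data line
        # int() tolerates the surrounding spaces and trailing newline
        items.append((name.strip(), int(value)))
    return items


def read_items(path="items_dedup"):
    """Read the de-duplicated file written by the shell pipeline."""
    with Path(path).open(encoding="utf-8") as infile:
        return parse_items(infile)
```

Because ‘parse_items’ accepts any iterable of lines, the same function works on an open file, on ‘sys.stdin’, or on a plain list when testing.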
You can even write your Python program as a filter (read the input lines
from ‘sys.stdin’, write the result to ‘sys.stdout’) and just hook it
into that command pipeline. If the program you're writing is named
‘do_more_processing’::
    $ cat items | sort | uniq | do_more_processing > outputfile
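Here is a minimal sketch of what ‘do_more_processing’ might look like as a filter. The actual processing is only a placeholder: it re-emits each ‘name : value’ line as tab-separated fields. The helper names ‘process’ and ‘main’ are hypothetical. For the pipeline to find the script by name, it would also need to be executable and on your PATH. Otherwise, invoke it as ‘python3 do_more_processing’.

```python
#!/usr/bin/env python3
import sys


def process(lines):
    """Yield one output line per 'name : value' input line.

    Placeholder processing (an assumption, not the real work): each
    item is re-emitted as 'name<TAB>value'; other lines are dropped.
    """
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip headers, rulers and blank lines
        yield f"{name.strip()}\t{value.strip()}\n"


def main(infile=None, outfile=None):
    """Read lines from standard input and write results to standard output."""
    infile = infile if infile is not None else sys.stdin
    outfile = outfile if outfile is not None else sys.stdout
    outfile.writelines(process(infile))


if __name__ == "__main__":
    main()
```

Keeping the work in a generator that takes any iterable of lines means the filter never needs the whole input in memory. It can also be tested without touching the real standard streams.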
--
\ “Science is a way of trying not to fool yourself. The first |
`\ principle is that you must not fool yourself, and you are the |
_o__) easiest person to fool.” —Richard P. Feynman, 1964 |
Ben Finney
_______________________________________________
melbourne-pug mailing list
https://mail.python.org/mailman/listinfo/melbourne-pug