On Monday 03 Feb 2003 6:10 am, civileme wrote:
> On Sunday 02 February 2003 12:46 am, magnet wrote:
> > I have a large text file containing thousands of URLs, one per line, and
> > am trying to find a suitable utility that will strip out duplicate lines
> > and leave a condensed file. Can anyone suggest a good solution?
> > Thanks :)
>
> ---------------------------------------------------------------------------
> #!/usr/bin/env python
> # duprem: copy infile to outfile, keeping only the first copy of each line.
> import os
> import sys
>
> if len(sys.argv) <= 2:
>     print("Usage is './duprem infile outfile'")
>     sys.exit(1)
>
> HOME = os.path.expanduser("~")
>
> def userhome(filename):
>     # Paths already under HOME (or absolute) are used as given;
>     # anything else is taken relative to the home directory.
>     if filename.startswith(HOME):
>         return filename
>     return os.path.join(HOME, filename)
>
> infile = userhome(sys.argv[1])
> outfile = userhome(sys.argv[2])
>
> if not os.path.exists(infile):
>     print("input file " + infile + " does not exist")
>     sys.exit(2)
>
> source = open(infile, "r")
> output = open(outfile, "w")
>
> seen = set()  # lines already written, for fast duplicate checks
>
> for line in source:
>     if line in seen:
>         print("duplicate " + line.rstrip("\n") + " removed")
>     else:
>         seen.add(line)
>         output.write(line)  # first occurrence: keep it, in original order
>
> source.close()
> output.close()
> print("complete")
>
>
> -----------------------------------------------------------------
>
> Well, put everything between the dashed lines into a text file called duprem
> in your home directory, then chmod a+x duprem, then call it with
>
> ./duprem (fileofurlswithduplicates) (outputfilecleanedofdups)
>
> Civileme
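
For anyone reading along, the core idea of the script (keep the first copy of
each line, drop later repeats, preserve the original order) can be sketched as
a small function. This is only an illustrative sketch, not civileme's code: the
name unique_lines and the sample URLs are made up for the example.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving order."""
    seen = set()  # lines yielded so far
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Hypothetical sample data standing in for the URL file.
sample = [
    "http://example.com/a\n",
    "http://example.com/b\n",
    "http://example.com/a\n",
]
result = list(unique_lines(sample))
print("".join(result))
```

Since the function accepts any iterable of lines, an open file object could
probably be passed straight in instead of a list.
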
Wow... a reply from THE Linux guru. I feel kinda humbled :)
Cheers m8. Will look into this later in the week on my day off and try to
learn something from it.
Hope the job hunting is going well for you and some company out there is smart
enough to utilise your skills soon.
--
magnet
Registered Linux User: 281659
Registered machines: 163839,163840,163841,163842,163843,163844
6xAthlon 1.2GHz all running some flavour of Mandrake.
"My home is over-run with penguins that like a warm environment!"