> I'm completely baffled by this and not entirely sure where to start.
>
> I have a plain text file, testfile.txt, which contains a single line:
>
> Very truly yours,
>
> It is written exactly how you see it above, with a newline at the end.
>
> I'm trying to write a script that will determine the number of words
> in the file. A snippet of what I have thus far is the following:
>
> my $fh = new IO::File("$lvl2path/$filestng", "r") ||
>     die ("Can't open .txt file named at $lvl2path. Exiting program.\n\n");
> while (my $line = $fh->getline())
> {
>     my @words = split /\s+/, $line;
>     my %count = ();
>     $count{$line} += @words;
>     print "$line";
>     print "The line above has " . scalar @words . " occurrences of something.\n";
> }
> $fh->close();
>
> That outputs the following:
>
> V e r y t r u l y y o u r s ,
> The line above has 3 occurrences of something.
>
> I understand that split /\s+/ is matching whitespace characters, and
> I'm pleased that it comes back with 3 (two spaces and the newline).
> What I don't understand is why the output has spaces between all the
> letters. I've looked at this and other .txt files in different
> editors on different OS's; I can't find any hidden characters,
> whitespace or other, anywhere they don't belong. What's really
> concerning is when I change the above such that:
>
> my @words = split /\w+/, $line;
>
> I get this:
>
> V e r y t r u l y y o u r s ,
> The line above has 15 occurrences of something.
>
> Where is this whitespace coming from between the letters?? Is it
> really whitespace (/\s+/ doesn't catch it, but /\w+/ is catching each
> character as if there's whitespace between)?? A good part of my
> dissertation hinges on being able to read thousands of .txt files
> without the extraneous spaces that are being introduced somewhere.
>
> By the way, only some files appear affected, but there's no obvious
> pattern.
>
> Any hints would be wildly appreciated.

Hi,

I think you are making this all too complicated. All that is needed is the script below, which reads the line from its __DATA__ section instead of from a file.

    #!/usr/bin/perl
    use strict;

    while (<DATA>) {
        chomp;
        my @words = split / /;
        my $nr_words = @words;
        print "Number of words is $nr_words and the words are\n @words\n";
    }
    __DATA__
    Very truly yours,

Running it gives:

    # perl beg1.pl
    Number of words is 3 and the words are
     Very truly yours,

=========================================================

Using a file open statement you would do something like this (untested):

    #!/usr/bin/perl
    use strict;

    # $lvl2path and $filestng are assumed to be declared and set
    # earlier, as in your own script.
    open (my $FILETEST, "<", "$lvl2path/$filestng")
        or die "can't open $lvl2path/$filestng for reading $!\n";

    while (<$FILETEST>) {
        chomp;
        my @words = split / /;
        my $nr_words = @words;
        print "Number of words is $nr_words and the words are\n @words\n";
    }

But you might want to split on white space to cope with the occasions when there is more than one space between words.

You have done something that puts a space between letters when printing out.

--
Owen
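
To illustrate the remark above about splitting on white space, here is a minimal sketch that compares split / / with split ' '. The sub name count_words and the two sample lines are made up for illustration and do not come from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count whitespace-separated words in one line.
# split ' ' (a one-space string, not a regex) skips leading
# whitespace and splits on any run of spaces, tabs or newlines.
sub count_words {
    my ($line) = @_;
    my @words = split ' ', $line;
    return scalar @words;
}

# Hypothetical sample lines: one tidy, one with extra spaces and a tab.
for my $line ("Very truly yours,\n", "  Very   truly\tyours,\n") {
    my @single   = split / /, $line;   # splits at every single space
    my $n_single = @single;
    my $n_words  = count_words($line);
    print "split / / gives $n_single fields, split ' ' gives $n_words words\n";
}
```

With split / /, every extra space produces an empty field, so the count goes up on untidy input, while split ' ' keeps counting only the words. On the spaces between the letters: one possible cause, and this is only an assumption since nothing in the thread confirms it, is that the affected files are saved as UTF-16, where each plain letter is followed by a zero byte. If that turns out to be the case, opening the file with open(my $fh, "<:encoding(UTF-16)", $path) might be worth a try.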