Re: regexp not working past one character

PlagueMagazine Mon, 04 Feb 2008 03:01:05 -0800

On Feb 3, 8:36 pm, [EMAIL PROTECTED] (Chas. Owens) wrote:
> On Feb 3, 2008 12:07 PM,  <[EMAIL PROTECTED]> wrote:
>
> > On Feb 2, 11:10 pm, [EMAIL PROTECTED] (John W. Krahn) wrote:
> > > [EMAIL PROTECTED] wrote:
> > > > I have a program with a line like
>
> > > > while (<FILE>) {
> > > >      if (/stuff/i) {
> > > >           print;
> > > >      }
> > > > }
>
> > > > When I run the program, and I replace "stuff" with only one character,
> > > > like "d", it works exactly as I expect. But if instead of using "d", I
> > > > use "da" or "date" (which I know are in FILE, because it's a text file
> > > > I made) nothing prints on the screen. I've also tried to have it print
> > > > to another file, and that's turned out blank too.
>
> > > > What am I doing wrong?
>
> > > My guess would be that you are creating a UTF-16LE text file on Windows
> > > and trying to read it on your Mac?
>
> > > John
> > > --
> > > Perl isn't a toolbox, but a small machine shop where you
> > > can special-order certain sorts of tools at low cost and
> > > in short order.                            -- Larry Wall
>
> > John,
>
> > I am using a Mac. I don't know what kind of Text file it is, because
> > it's being created by Automator as a scrape of a website. If the
> > encoding is the problem, how do I work that?
>
> > Yes, I realize that the problem I'm describing is really weird and the
> > code I posted should work perfectly. So it's probably not my coding
> > but something to do with the system I'm on (OS X 10.4.11) or the file
> > I'm working with (a .txt).
>
> snip
>
> Extensions don't matter in UNIX (and OS X is a UNIX).  Step one in
> determining what a file contains is running the file command against
> it like this:
>
> file foo.txt
>
> This will give you a better idea of what you are working with.  The next
> step is looking at the header of the HTML; it will probably tell you
> exactly what encoding is being used.  It should look something like
> this:
>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
>
> In that case the file is encoded with UTF-8.


Here's what I got from running "file file.txt":

file.txt: Big-endian UTF-16 Unicode English character data, with very
long lines, with CRLF, CR, LF line terminators

Does this explain why my regexp search wasn't working?
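
Most likely, yes. In UTF-16 every ASCII character takes two bytes, one
of which is a NUL, so when Perl reads the file as raw bytes a word like
"date" is really \0d\0a\0t\0e. A single "d" still matches, but "da"
never does. Below is a minimal sketch of that effect and of one possible
fix, decoding on read with the :encoding(UTF-16) layer. The sample text
and the helper name matching_lines are made up for illustration, and the
sketch assumes the file starts with a byte-order mark, which the "file"
output above suggests but does not prove.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample line standing in for the scraped file's contents.
my $text = "the date is set\n";

# Raw bytes as they would sit on disk: a BOM followed by big-endian UTF-16.
my $bytes = "\xFE\xFF" . encode('UTF-16BE', $text);

# Undecoded, each ASCII character is preceded by a NUL byte, so a
# one-character pattern matches but a longer one does not.
my $raw_d    = $bytes =~ /d/i    ? 1 : 0;
my $raw_date = $bytes =~ /date/i ? 1 : 0;

# Decoding as 'UTF-16' uses the BOM to pick the byte order.
my $decoded      = decode('UTF-16', $bytes);
my $decoded_date = $decoded =~ /date/i ? 1 : 0;

print "raw /d/: $raw_d, raw /date/: $raw_date, decoded /date/: $decoded_date\n";

# Hypothetical helper: return the lines of a UTF-16 file (with BOM)
# that match a pattern, decoding as the file is read.
sub matching_lines {
    my ($path, $re) = @_;
    open my $fh, '<:encoding(UTF-16)', $path
        or die "Cannot open $path: $!";
    my @hits = grep { /$re/ } <$fh>;
    close $fh;
    return @hits;
}
```

Applied to the original loop, that would mean opening the file with
'<:encoding(UTF-16)' instead of a plain '<', and probably adding
binmode(STDOUT, ':encoding(UTF-8)') so that any non-ASCII characters
print without "Wide character" warnings. If the file turns out to have
no byte-order mark, naming the byte order explicitly with
:encoding(UTF-16BE) would be the thing to try instead.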


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
