How about something like this (it's my first try, but it seems to
work)...

#############################

use strict;
use warnings;

my @domains = qw(www.x.com x.com www.sandisk.com network.tv funny.co.jp
johnson.pictures.geography.info);

foreach(sort @domains){
        if($_ =~ /([a-zA-Z0-9\-.]*?)([a-zA-Z0-9\-]+\.(co\.\w{2}|com|net|edu|gov|info|tv))$/){
                my $host = $1;
                my $domain = $2;
                $host = "No host " unless $host;
                print "$host => $domain\n";
        }else{
                print "Error: domain name format incorrect!\n";
        }
}


#############################
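
One thing to watch for: the first capture keeps the dot that joins host
and domain, so www.x.com comes out as "www. => x.com". Here is a minimal
sketch of the same match wrapped in a sub that strips that dot. The name
split_domain is made up for illustration; the regex is unchanged from
the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns
# (host, domain), or an empty list if the name does not match.
sub split_domain {
    my ($name) = @_;
    return unless $name =~
        /([a-zA-Z0-9\-.]*?)([a-zA-Z0-9\-]+\.(co\.\w{2}|com|net|edu|gov|info|tv))$/;
    my ($host, $domain) = ($1, $2);
    $host =~ s/\.$//;    # drop the trailing dot left on the host part
    return ($host, $domain);
}

for my $name (qw(www.x.com x.com funny.co.jp)) {
    my ($host, $domain) = split_domain($name);
    $host = '(none)' unless length $host;
    print "$name: host=$host domain=$domain\n";
}
# www.x.com: host=www domain=x.com
# x.com: host=(none) domain=x.com
# funny.co.jp: host=(none) domain=funny.co.jp
```

Testing with "length $host" rather than plain "unless $host" also keeps
a host of "0" from being treated as missing.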

The regex gets a little convoluted, so I used YAPE::Regex::Explain to
sort it out:

The regular expression:

(?-imsx:/([a-zA-Z0-9\-.]*?)([a-zA-Z0-9\-]+\.(co\.\w{2}|com|net|edu|gov|info|tv))$/)

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [a-zA-Z0-9\-.]*?         any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9', '\-', '.' (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-zA-Z0-9\-]+           any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9', '\-' (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      co                       'co'
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
      \w{2}                    word characters (a-z, A-Z, 0-9, _) (2
                               times)
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      com                      'com'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      net                      'net'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      edu                      'edu'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      gov                      'gov'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      info                     'info'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      tv                       'tv'
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
