> I'm having an issue parsing the subdomain from a lot of non-US URLs.
> In the US the format is always subdomain.domain.suffix or
> domain.suffix. Easy to parse.
>
> In the UK, for example, the format is instead
> 
> subdomain.domain.co.uk or
> domain.co.uk
> 
> My issue is that I'm parsing domain.co.uk like US URLs, and it's
> parsing the domain as "co.uk", not "domain.co.uk"!
>
> Example: getDomain('bbc.co.uk') returns 'co.uk'

I don't believe there is one, but it's not too hard to hack together:

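   # Single-label TLDs. Any TLD not listed here is assumed to be a
   # country code with a two-label suffix such as co.uk or tld.zw.
   KNOWN_TLDS = set([
       'com', 'edu', 'gov', 'mil', 'net',
       'aero', 'biz', 'coop', 'info',
       'museum', 'name', 'pro',
       'local', 'localhost', 'invalid', 'test',
       ])

   def split_domain(domain):
       """Return (everything in front of the suffix, suffix)."""
       bits = domain.split('.')
       if bits[-1] in KNOWN_TLDS:
           return (
               '.'.join(bits[:-1]),
               bits[-1])
       else:
           # Drop the two suffix labels from the name part.
           return (
               '.'.join(bits[:-2]),
               '.'.join(bits[-2:]))

   if __name__ == "__main__":
       tests = (
           # (host, expected name part, expected suffix)
           ('localhost.local', 'localhost', 'local'),
           ('example.com', 'example', 'com'),
           ('cs.example.edu', 'cs.example', 'edu'),
           ('example.co.uk', 'example', 'co.uk'),
           ('example.tld.zw', 'example', 'tld.zw'),
           ('localhost', '', 'localhost'),
           ('', '', ''),
           )
       for test, expected_name, expected_tld in tests:
           name, tld = split_domain(test)
           print("%r -> %r, %r" % (test, name, tld))
           assert (name, tld) == (expected_name, expected_tld)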
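
The original question wanted getDomain('bbc.co.uk') to return 'bbc.co.uk'. A minimal sketch of that on top of split_domain(): the helper names get_domain and get_subdomain are hypothetical (not a Django or stdlib API), and KNOWN_TLDS/split_domain are repeated so the snippet runs on its own.

```python
# Hypothetical helpers built on the split_domain() heuristic.
KNOWN_TLDS = {
    'com', 'edu', 'gov', 'mil', 'net',
    'aero', 'biz', 'coop', 'info',
    'museum', 'name', 'pro',
    'local', 'localhost', 'invalid', 'test',
}


def split_domain(domain):
    """Return (name part, suffix); unknown TLDs are assumed two labels deep."""
    bits = domain.split('.')
    if bits[-1] in KNOWN_TLDS:
        return '.'.join(bits[:-1]), bits[-1]
    return '.'.join(bits[:-2]), '.'.join(bits[-2:])


def get_domain(host):
    """Return the registered domain, e.g. 'bbc.co.uk' for 'news.bbc.co.uk'."""
    name, suffix = split_domain(host)
    if not name:
        # Nothing in front of the suffix (e.g. 'localhost').
        return suffix
    # Registered domain = last label of the name part + the suffix.
    return name.split('.')[-1] + '.' + suffix


def get_subdomain(host):
    """Return everything in front of the registered domain ('' if none)."""
    name, _ = split_domain(host)
    return '.'.join(name.split('.')[:-1])


print(get_domain('bbc.co.uk'))          # bbc.co.uk
print(get_domain('news.bbc.co.uk'))     # bbc.co.uk
print(get_subdomain('news.bbc.co.uk'))  # news
```

This inherits the two-label assumption for every TLD outside KNOWN_TLDS, so a host such as www.example.de comes back as www.example.de rather than example.de.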

Adjust the list as needed.
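
split_domain treats every TLD outside KNOWN_TLDS as a two-label suffix, which breaks for hosts like www.example.de. One possible adjustment, sketched here under assumptions, is to list the multi-label suffixes explicitly and match the longest one, treating every other TLD as a single label. MULTI_LABEL_SUFFIXES is a small hypothetical sample and split_host is an illustrative name. For real use you would want the Public Suffix List (publicsuffix.org), a maintained and much larger list of such suffixes, with wildcard and exception rules this sketch ignores.

```python
# Hypothetical, deliberately incomplete sample of multi-label suffixes.
MULTI_LABEL_SUFFIXES = {'co.uk', 'org.uk', 'ac.uk', 'com.au', 'co.jp'}


def split_host(host):
    """Return (subdomain, registered_domain, suffix) for a hostname."""
    labels = host.lower().rstrip('.').split('.')
    # Default: the last label alone is the suffix (com, de, fr, ...).
    suffix_len = 1
    # Try the longest candidate first so the longest listed suffix wins.
    for n in range(len(labels), 1, -1):
        if '.'.join(labels[-n:]) in MULTI_LABEL_SUFFIXES:
            suffix_len = n
            break
    suffix = '.'.join(labels[-suffix_len:])
    name = labels[:-suffix_len]
    if not name:
        # The host is just a suffix (or a bare name like 'localhost').
        return '', '', suffix
    registered = name[-1] + '.' + suffix
    subdomain = '.'.join(name[:-1])
    return subdomain, registered, suffix


print(split_host('news.bbc.co.uk'))  # ('news', 'bbc.co.uk', 'co.uk')
print(split_host('www.example.de'))  # ('www', 'example.de', 'de')
```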

-tim




--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
