> I'm having an issue parsing the subdomain from a lot of non-US URLs.
> In the US the format is always subdomain.domain.suffix or
> domain.suffix. Easy to parse.
>
> In the UK, for example, the format is
>
> subdomain.domain.co.uk or
> domain.co.uk
>
> My issue is that I'm parsing domain.co.uk like US URLs, and it's
> parsing the domain to be "co.uk", not domain.co.uk!
>
> example: getDomain('bbc.co.uk') returns 'co.uk'
I don't believe there's a ready-made function for this, but it's not too hard to hack together:
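# TLDs that stand alone as a single label; anything else is assumed
# to be a two-label suffix such as 'co.uk'.
KNOWN_TLDS = set([
    'com', 'edu', 'gov', 'int', 'mil', 'net', 'org',
    'aero', 'biz', 'coop', 'info',
    'museum', 'name', 'pro',
    'local', 'localhost', 'invalid', 'test',
])

def split_domain(domain):
    """Return (everything before the TLD, TLD) for a hostname."""
    bits = domain.split('.')
    if bits[-1] in KNOWN_TLDS:
        # Single-label TLD: 'cs.example.edu' -> ('cs.example', 'edu')
        return (
            '.'.join(bits[:-1]),
            bits[-1])
    else:
        # Two-label suffix: 'example.co.uk' -> ('example', 'co.uk')
        return (
            '.'.join(bits[:-2]),
            '.'.join(bits[-2:]))

if __name__ == "__main__":
    tests = (
        ('localhost.local', 'local'),
        ('example.com', 'com'),
        ('cs.example.edu', 'edu'),
        ('example.co.uk', 'co.uk'),
        ('example.tld.zw', 'tld.zw'),
        ('localhost', 'localhost'),
        ('', ''),
    )
    for test, result in tests:
        subdomain, tld = split_domain(test)
        print("%r -> %r, %r (expected %r)" % (test, subdomain, tld, result))
        assert tld == result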
Adjust accordingly if you need.
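
For instance, if what you actually want back is the domain itself ('bbc.co.uk' rather than 'co.uk'), one way to adjust is to turn the check around and list the two-part suffixes you care about. This is only a rough sketch: the names get_domain and KNOWN_TWO_PART_SUFFIXES are made up for illustration, and the suffix list is a deliberately incomplete assumption that you would need to fill in for the countries you handle.

```python
# Hypothetical, deliberately incomplete list of two-part suffixes.
# Extend it with whatever suffixes show up in your own data.
KNOWN_TWO_PART_SUFFIXES = set([
    'co.uk', 'org.uk', 'ac.uk',
    'com.au', 'co.nz', 'co.jp',
])

def get_domain(hostname):
    """Return the domain plus its suffix, dropping any subdomains."""
    bits = hostname.lower().split('.')
    # If the last two labels form a known suffix, keep three labels.
    if len(bits) >= 3 and '.'.join(bits[-2:]) in KNOWN_TWO_PART_SUFFIXES:
        return '.'.join(bits[-3:])
    # Otherwise assume a single-label suffix and keep two labels.
    return '.'.join(bits[-2:])
```

With that list, get_domain('news.bbc.co.uk') and get_domain('bbc.co.uk') both give 'bbc.co.uk', while get_domain('www.example.com') gives 'example.com'. Any suffix missing from the list falls through to the two-label case, so it fails in the same way your current code does.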
-tim