> On 22 Aug 2014, at 18:31, Oleg Kalnichevski <[email protected]> wrote:
>
> On Fri, 2014-08-22 at 12:47 +0200, Dirk-Willem van Gulik wrote:
>>>> Found that some of the ones below are indeed able to hang the regex stack
>>>> (e.g. # 2). However, the more elaborate regexes are blocked by:
>>>>
>>>> private final static Pattern WILDCARD_PATTERN = Pattern.compile(
>>>>     "^[a-z0-9\\-\\*]+(\\.[a-z0-9\\-]+){2,}$", Pattern.CASE_INSENSITIVE);
>>>> ..
>>>> WILDCARD_PATTERN.matcher(identity).matches()
>>>>
>>>> which we apply to the subjectAltName, CN, etc. So that is not too bad then
>>>> - assuming that that regexp does not let them through. Which is likely - as
>>>> the only dangerous thing I see in there is a *.
>>>>
>>>
>>> Thank you so much for your feedback. What I could do is validate both
>>> the identity and the subjectAltName pattern by making sure they consist
>>> of characters legal for domain names (alphanumeric, dash and asterisk in
>>> case of subjectAltName) prior to doing regexp matching with them.
>>
>> Right - but I am wondering if that means we end up in a rear guard battle.
>> As we then find IPv6 addresses containing ':' and god knows what new TLDs
>> may do 5+ years hence.
>>
>
> 5+ is pretty much my retirement target ;-)
>
> Seriously, though, I would worry about UTF8 issues only once we start
> getting angry complaints from users. Right now I would rather be too
> restrictive than too liberal.
>
>> Now *all* that is allowed is '*' - and as far as I know - only in string
>> (and not IPv4/IPv6) based entries.
>>
>> So perhaps it is an option to compare things from the TLD down with a very
>> very simple loop.
>>
>> if (starts with a star) then
>>     @a = array of FQDN split on '.'
>>     @b = array of FQDN split on '.'
>>
>>     if not right lengths - bail
>>     working from the topmost side working to last but one
>>         bail if not the same.
>>     check if we have left just one entry on a and a wildcard on b.
>>
>> i.e. avoid wildcards completely.
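
To make the loop idea concrete, here is a rough Java sketch of it. This is
only an illustration, not HttpClient code: the class and method names
(LabelWildcardMatcher, matches) are made up, and the rule that a wildcard
pattern needs at least three labels is an assumption borrowed from the {2,}
in WILDCARD_PATTERN. It only handles a lone '*' as the leftmost label and
does not touch IP addresses or i18n names.

```java
import java.util.Locale;

final class LabelWildcardMatcher {

    // Hypothetical helper: compares host and pattern label by label, with no regex.
    // The pattern is either a plain FQDN or has a lone "*" as its leftmost label.
    static boolean matches(String host, String pattern) {
        String[] a = host.toLowerCase(Locale.ROOT).split("\\.", -1);
        String[] b = pattern.toLowerCase(Locale.ROOT).split("\\.", -1);

        boolean wildcard = b[0].equals("*");

        // Assumption: like the {2,} in WILDCARD_PATTERN, refuse "*.com" style patterns.
        if (wildcard && b.length < 3) {
            return false;
        }
        // "If not right lengths - bail": the star stands for exactly one label.
        if (a.length != b.length) {
            return false;
        }
        // Work from the TLD down to the second label; each one must be identical.
        for (int i = a.length - 1; i >= 1; i--) {
            if (a[i].isEmpty() || !a[i].equals(b[i])) {
                return false;
            }
        }
        // Leftmost label: any non-empty label for a wildcard, else an exact match.
        if (a[0].isEmpty()) {
            return false;
        }
        return wildcard || a[0].equals(b[0]);
    }
}
```

The cost is linear in the length of the two names, so a hostile certificate
cannot make it blow up the way a crafted regex can.
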
>
> Please correct me if I am wrong, but after rereading the relevant RFCs I was
> under the impression that complex wildcard expressions in subjectAltName
> like
>
> a*b*c*d.mydomain.com
>
> were perfectly legal. This was the primary reason why I felt the use of
> regex matching was beneficial. Should we revert to supporting simple
> '*', 'blah*' expressions only?

Not sure - doing more research after reading the RFCs - they are much more
strict about i18n domains, and I am not sure I understand all the implications.

Dw.
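
P.S. If forms like a*b*c*d in the leftmost label do turn out to be legal,
they could still be matched without a regex. The sketch below is the usual
"remember the last star" glob loop applied to a single label. The names
(LabelGlob, matchLabel) are hypothetical, and letting '*' match zero
characters is an assumption on my part; whether the RFCs really allow that,
or several stars in one label at all, is exactly the open question above.

```java
final class LabelGlob {

    // Hypothetical helper: matches one DNS label against one pattern label in
    // which '*' stands for zero or more characters. Worst case is
    // O(label length * pattern length); there is no exponential backtracking.
    static boolean matchLabel(String label, String pattern) {
        int l = 0;       // current position in label
        int p = 0;       // current position in pattern
        int star = -1;   // index of the most recent '*' seen in pattern
        int mark = 0;    // position in label where that '*' began matching

        while (l < label.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Record the star and first try to let it match nothing.
                star = p++;
                mark = l;
            } else if (p < pattern.length() && pattern.charAt(p) == label.charAt(l)) {
                // Literal characters agree; advance both sides.
                p++;
                l++;
            } else if (star >= 0) {
                // Mismatch: let the last star swallow one more character and retry.
                p = star + 1;
                l = ++mark;
            } else {
                return false;
            }
        }
        // Label exhausted: only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Callers would lower-case both sides first and apply this to the leftmost
label only, comparing all other labels for plain equality.
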
