Thanks for the detailed response. > That looks like (if X-Forwarded-For is “valid”) your jumping straight > to considering the X-Forwarded-For header. You might want to check > beforehand whether the client ip is a “trusted” proxy. And only if it > is, you should walk the X-Forwarded-For.
> Also, it sounds a bit like you're going through X-Forwarded-For from > left to right. Make sure to walk it from right to left, as proxies are > expected to append (not prepend) the client IP they forward for. Okay. Assuming we append client_ip to X-Forwarded-For and process the list from right-to-left, do you still think we should explicitly check whether client_ip is a trusted proxy? Because, per the modified algorithm, the client_ip will be the first item to be processed and if it is not a trusted proxy, it will be returned as the result - which is what even the explicit initial check will result in this case as well. Correct? > The way you combine the looping and if looks like you might be > skipping across entries that you cannot parse. > But if you find invalid entries while backtracking the IPs, you're > probably in parts of the X-Forwarded-For value that you shouldn't trust. > (X-Forwarded-For can easily be spoofed by clients) We only check that each entry is a valid IPv4 or IPv6 address and not just a random string. If we find invalid entries, then the only option we have is to continue the search further for a valid IP. > list_of_ips = append client_ip to X-Forwarded-For > for ip in reverse( list_of_ips ) > do > if ip is not a trusted proxy or the iterator does not have more elements > then > return ip > fi > done > (That should give invalid IP addresses for some requests. That > seems to be the correct thing from my point of view. But if you rather > geolocate wrong than not geolocate at all, you can choose to pick the > last known good IP address instead. I guess that's more a matter of > taste.) What would picking the last known good IP address in this context? An IP that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy? > Different stakeholders at the WMF use different lists of what kind of > proxies they consider. So things are not really clear cut. Compiling a common list - may be union of "General", "MediaWiki" and "Wikipedia Zero" - would be very useful. As Oliver suggests, it would be good to have them in site-wide config files. Thanks, Ananth
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics