> Hm...Let me check with Andrew on whether it is an acceptable behavior for the > UDF.
Totally fine. Thanks! > On Jan 23, 2015, at 04:15, Ananth RK <anant...@ymxdata.com> wrote: > > Hi Christian, > > On Thu, Jan 22, 2015 at 10:48 PM, <christ...@quelltextlich.at > <mailto:christ...@quelltextlich.at>> wrote: > > > Okay. Assuming we append client_ip to X-Forwarded-For and process the list > > from right-to-left, do you still think we should explicitly check whether > > client_ip is a trusted proxy? Because, per the modified algorithm, the > > client_ip will be the first item to be processed and if it is not a trusted > > proxy, it will be returned as the result - which is what even the explicit > > initial check will result in this case as well. Correct? > > Right, if you implement the pseudo-code algorithm I sketched, you can > skip that step. > > But if you optimize and loop-unroll the first iteration (like your > implementation did), you want to check whether or not it's a trusted > proxy. > But whether one would label that loop-unrolling “premature > optimization” or “making the code more readable” depends a bit on the > concrete formulation in code. > > Okay. > > > > > The way you combine the looping and if looks like you might be > > > skipping across entries that you cannot parse. > > > But if you find invalid entries while backtracking the IPs, you're > > > probably in parts of the X-Forwarded-For value that you shouldn't trust. > > > > > (X-Forwarded-For can easily be spoofed by clients) > > > > We only check that each entry is a valid IPv4 or IPv6 address and not just > > a random string. > > > If we find invalid entries, then the only option we have > > is to continue the search further for a valid IP. > > Ex falso quodlibet. > > (Oh Latin! How much I missed you!) > > > Hence, the only option one has is to stop at untrusted values in the > chain of X-Forwarded-For IPs. > > And pick the last known good IP? > > > > > > list_of_ips = append client_ip to X-Forwarded-For > > > for ip in reverse( list_of_ips ) > > > do > > > if ip is not a trusted proxy or the iterator does not have more > > elements > > > then > > > return ip > > > fi > > > done > > > > > (That should give invalid IP addresses for some requests. That > > > seems to be the correct thing from my point of view. But if you rather > > > geolocate wrong than not geolocate at all, you can choose to pick the > > > last known good IP address instead. I guess that's more a matter of > > > taste.) > > > > What would picking the last known good IP address in this context? An IP > > that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy? > > As indicated above, I'd rather indicate “Unable to geolocate” than > knowingly geolocate wrong. > > But by the “last known good IP address” above I meant (in case of > faulty X-Forwarded-For), the last IP address that was not obviously > wrong could be picked. > > Let's have some examples. > For the examples, I am assuming that only 1.0.0.0/8 <http://1.0.0.0/8> are > trusted > proxies. > > * Case 1: > > Client IP: 2.0.0.1 > X-Forwarded-For: 1.0.0.2, 1.0.0.3 > > -> last known good IP address: “2.0.0.1” > > * Case 2: > > Client IP: 1.0.0.1 > X-Forwarded-For: 1.0.0.2, 2.0.0.1 > > -> last known good IP address: “2.0.0.1” > > * Case 3: > > Client IP: 1.0.0.1 > X-Forwarded-For: foo, 2.0.0.1 > > -> last known good IP address: “2.0.0.1” > > * Case 4: > > Client IP: 1.0.0.1 > X-Forwarded-For: 1.0.0.2, 1.0.0.3, foo > > -> last known good IP address: “1.0.0.1” > > * Case 5: > > Client IP: 1.0.0.1 > X-Forwarded-For: 1.0.0.2, foo, 1.0.0.3 > > -> last known good IP address: “1.0.0.3” > > * Case 6: > > Client IP: 1.0.0.1 > X-Forwarded-For: foo, 1.0.0.2, 1.0.0.3 > > -> last known good IP address: “1.0.0.2” > > * Case 7: > > Client IP: foo > > -> This should never happen. File a bug in phabricator. > > Thanks for the case-by-case explanation. It is *really* clear. > > > But let me repeat that I think “Unable to geolocate” is a valid and > good response. Better than using some proxy in the chain. > > Hm...Let me check with Andrew on whether it is an acceptable behavior for the > UDF. > > Thanks, > Ananth > > _______________________________________________ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics