> Hm...Let me check with Andrew on whether it is an acceptable behavior for the 
> UDF.

Totally fine.  Thanks!




> On Jan 23, 2015, at 04:15, Ananth RK <anant...@ymxdata.com> wrote:
> 
> Hi Christian,
> 
> On Thu, Jan 22, 2015 at 10:48 PM, <christ...@quelltextlich.at 
> <mailto:christ...@quelltextlich.at>> wrote:
> 
> > Okay. Assuming we append client_ip to X-Forwarded-For and process the list
> > from right-to-left, do you still think we should explicitly check whether
> > client_ip is a trusted proxy? Because, per the modified algorithm, the
> > client_ip will be the first item to be processed and if it is not a trusted
> > proxy, it will be returned as the result - which is what even the explicit
> > initial check will result in this case as well. Correct?
> 
> Right, if you implement the pseudo-code algorithm I sketched, you can
> skip that step.
> 
> But if you optimize and loop-unroll the first iteration (like your
> implementation did), you want to check whether or not it's a trusted
> proxy.
> But whether one would label that loop-unrolling “premature
> optimization” or “making the code more readable” depends a bit on the
> concrete formulation in code.
> 
> Okay.
> 
> 
> > > The way you combine the looping and if looks like you might be
> > > skipping across entries that you cannot parse.
> > > But if you find invalid entries while backtracking the IPs, you're
> > > probably in parts of the X-Forwarded-For value that you shouldn't trust.
> >
> > > (X-Forwarded-For can easily be spoofed by clients)
> >
> > We only check that each entry is a valid IPv4 or IPv6 address and not just
> > a random string.
> 
> > If we find invalid entries, then the only option we have
> > is to continue the search further for a valid IP.
> 
> Ex falso quodlibet.
> 
> (Oh Latin! How much I missed you!)
> 
> 
> Hence, the only option one has is to stop at untrusted values in the
> chain of X-Forwarded-For IPs.
> 
> And pick the last known good IP? 
> 
> 
> 
> > >  list_of_ips = append client_ip to X-Forwarded-For
> > >   for ip in reverse( list_of_ips )
> > >   do
> > >       if ip is not a trusted proxy or the iterator does not have more
> > elements
> > >       then
> > >           return ip
> > >       fi
> > >   done
> >
> > > (That should give invalid IP addresses for some requests. That
> > > seems to be the correct thing from my point of view. But if you rather
> > > geolocate wrong than not geolocate at all, you can choose to pick the
> > > last known good IP address instead. I guess that's more a matter of
> > > taste.)
> >
> > What would picking the last known good IP address in this context? An IP
> > that matches the IPv4/IPv6 pattern _AND_ is not a trusted proxy?
> 
> As indicated above, I'd rather indicate “Unable to geolocate” than
> knowingly geolocate wrong.
> 
> But by the “last known good IP address” above I meant (in case of
> faulty X-Forwarded-For), the last IP address that was not obviously
> wrong could be picked.
> 
> Let's have some examples.
> For the examples, I am assuming that only 1.0.0.0/8 <http://1.0.0.0/8> are 
> trusted
> proxies.
> 
> * Case 1:
> 
>   Client IP: 2.0.0.1
>   X-Forwarded-For: 1.0.0.2, 1.0.0.3
> 
>   -> last known good IP address: “2.0.0.1”
> 
> * Case 2:
> 
>   Client IP: 1.0.0.1
>   X-Forwarded-For: 1.0.0.2, 2.0.0.1
> 
>   -> last known good IP address: “2.0.0.1”
> 
> * Case 3:
> 
>   Client IP: 1.0.0.1
>   X-Forwarded-For: foo, 2.0.0.1
> 
>   -> last known good IP address: “2.0.0.1”
> 
> * Case 4:
> 
>   Client IP: 1.0.0.1
>   X-Forwarded-For: 1.0.0.2, 1.0.0.3, foo
> 
>   -> last known good IP address: “1.0.0.1”
> 
> * Case 5:
> 
>   Client IP: 1.0.0.1
>   X-Forwarded-For: 1.0.0.2, foo, 1.0.0.3
> 
>   -> last known good IP address: “1.0.0.3”
> 
> * Case 6:
> 
>   Client IP: 1.0.0.1
>   X-Forwarded-For: foo, 1.0.0.2, 1.0.0.3
> 
>   -> last known good IP address: “1.0.0.2”
> 
> * Case 7:
> 
>   Client IP: foo
> 
>   -> This should never happen. File a bug in phabricator.
> 
> Thanks for the case-by-case explanation. It is *really* clear.
>  
> 
> But let me repeat that I think “Unable to geolocate” is a valid and
> good response. Better than using some proxy in the chain.
> 
> Hm...Let me check with Andrew on whether it is an acceptable behavior for the 
> UDF.
> 
> Thanks,
> Ananth
> 
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to