On Mon, Jan 4, 2016 at 8:47 PM, Jesse Gross <[email protected]> wrote:
> On Mon, Jan 4, 2016 at 1:41 PM, Flavio Fernandes <[email protected]> wrote:
> > So, I'm a happy camper, but can't help but worry a little about the
> > fragility of the system when one attempts to use a port type internal
> > 'directly' as bridged. The fix I have in mind is relatively simple: add a
> > check in internal_dev_get_stats to gracefully handle cases when
> > ovs_internal_dev_get_vport returns null. Too simple?
>
> I don't think that the problem is simply that we are returning NULL
> from ovs_internal_dev_get_vport(). ovs_internal_dev_get_vport() should
> never return NULL to internal_dev_get_stats() because it is checking
> whether the device has an ops structure that is equal to the one that
> leads to internal_dev_get_stats(). And in fact, if you look at the
> full stack trace, the address being dereferenced is 0x0000000000000060
> rather than 0x0 from a real NULL.
>

ack. If ovs_internal_dev_get_vport() is not returning NULL, then this is not as simple as I was interpreting. My thinking was that 0x60 is the offset of &vport->err_stats.rx_errors from line 306 in http://lxr.oss.org.cn/source/net/openvswitch/vport.c#L306, but you may be right: if vport was not NULL, then the issue is in what ovs_internal_dev_get_vport() is returning.

> This looks like something is overwriting the vport pointer in the
> device structure. If you follow where this is coming from you'll wind
> up at ovs_netdev_get_vport() which is a maze of twisty conditions that
> depend on what kernel version you are using. Particularly on the RHEL
> kernels (which based on your email address I'm guessing you're using),
> the pointer is stashed in a variety of places. My guess is that these
> are not entirely safe in some conditions - likely related to tap
> devices based on your other description.
> I think the best path forward is to try to see which of the conditions
> your kernel version falls into and try to see what might be stomping on
> the pointer.
>

I see. So it could be that I'm looking at the wrong source code. I am using the CentOS 7.2 kernel (3.10.0-327.3.1.el7.x86_64 x86_64); I will find out more about how that differs from the upstream kernel.

THANKS Jesse!

-- 
flaviof
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss
