> -----Original Message-----
> From: Brandon Falk [mailto:bf...@gamozolabs.com]
> Sent: Thursday, April 23, 2015 12:39 AM
> To: Antoine Kaufmann
> Cc: e1000-devel@lists.sourceforge.net
> Subject: Re: [E1000-devel] [non-linux] Cache Coherency Problems on X540?
> 
> Haha, as you posted this I was just doing testing on no-snoop. By setting
> CTRL_EXT.NS_DIS (no-snoop disable), I no longer have the problem.
> 
> Well, that was a fun night! At least I found a real fix instead of some
> cross-my-fingers kludge.
> 
> I find it quite offputting that the X540 documentation says in many
> places, and in one case in the DCA Registers descriptions: "In most
> applications non snoop should not be enabled." I find it strange that a
> setting that Intel knows themselves is not common is on by default.

I'm glad you found the source of the problem.  I'm not sure of the original
reasoning for enabling no-snoop by default on the X540 (I'm a SW engineer, not
a HW engineer :^), but at least the solution you found does make sense.
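
For anyone finding this thread later, the fix described above (setting
CTRL_EXT.NS_DIS) can be sketched roughly as below. This is only a minimal
sketch under stated assumptions: the register offset (0x00018) and bit position
(bit 16) are the values used for CTRL_EXT and NS_DIS in the ixgbe driver
headers and should be checked against the X540 datasheet, and `bar0` and
`x540_disable_no_snoop` are hypothetical names, not part of any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values (as found in the ixgbe driver headers); verify them
 * against the X540 datasheet before relying on them. */
#define CTRL_EXT_OFFSET 0x00018u      /* Extended Device Control register */
#define CTRL_EXT_NS_DIS (1u << 16)    /* No-snoop disable bit */

/* Set CTRL_EXT.NS_DIS with a read-modify-write so that all other bits in
 * the register are preserved. `bar0` is a hypothetical pointer to the
 * start of the mapped register space. Returns the value written. */
static uint32_t x540_disable_no_snoop(volatile uint32_t *bar0)
{
    assert(bar0 != NULL);

    volatile uint32_t *ctrl_ext = bar0 + (CTRL_EXT_OFFSET / sizeof(uint32_t));
    uint32_t val = *ctrl_ext;

    val |= CTRL_EXT_NS_DIS;
    *ctrl_ext = val;
    return val;
}
```

In a driver this would presumably be called once during initialization, before
the receive rings are enabled, so that descriptor and packet writes from the
device are snooped by the CPU caches.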

Thanks and regards,

- Greg

> 
> Nevertheless, thanks for all the help :P
> 
> -B
> 
> On Thu, Apr 23, 2015 at 3:11 AM, Antoine Kaufmann <antoi...@cs.washington.edu> wrote:
> 
> > As a follow-up on my earlier email about the no snoop setting: If you
> > haven't checked that out yet you probably should. (cf. the last post in
> >   https://software.intel.com/en-us/forums/topic/401498)
> >
> > I think that what you describe below might point to that. From a quick
> > look at the 82574L spec it looks like no snoop is disabled by default
> > on that card (i.e. the respective PCIe transactions will be snooped by
> > the cache), while the x540 spec seems to indicate that no snoop is
> > enabled by default (which was also my experience with the 82599).
> >
> > Not quite sure about the behavior with mfences, though. In any
> > case I would rule out the no snoop issue (basically involves setting
> > one bit, iirc, and I think I found that in the FreeBSD ixgbe driver).
> >
> > On Thu, Apr 23 02:53, Brandon Falk wrote:
> > > This is verrrrry strange indeed. The mfence works but the lfence does
> > > not. On top of these I have tried *many* other operations which I
> > > thought might have an effect. You can find the entire code snippet for
> > > these tests at the end of this email. When the code was being tested,
> > > each 'section' was individually uncommented, and the result of that
> > > test is placed in a comment above the test.
> > >
> > > After doing these tests and scratching my head, I decided to do a test
> > > with my old 82574L driver. I removed the X540 from the test machine and
> > > placed the 82574L card, which I have used previously, in the same PCIe
> > > slot (also using the same ethernet connection). Since the descriptor
> > > format for legacy descriptors is identical in the X540 and 82574L, I
> > > used the exact same code (posted below) as I used on the X540. On this
> > > card *no fences* were needed. This indicates one of the following: A.
> > > I'm initializing the X540 in a manner that somehow makes this behaviour
> > > possible. B. The X540 (or maybe my specific one) has a bug that makes
> > > these fences necessary. C. Maybe it's not a bug but something that
> > > needs to be documented. I still find it very strange that a write fence
> > > changes how things operate when no writes are actually being done where
> > > I'm fencing.
> > >
> > > Some other tests I have done:
> > >
> > > - Have other processors spin and do mfences while the main core does
> > > not do an mfence. This did not make it work, which is expected, as an
> > > mfence should only change behaviour locally.
> > > - mfences all over the X540 initialization and prior to doing the DD
> > > polling. Did not fix the problem.
> > >
> > > Some things I could think of that would cause this problem:
> > >
> > > - It's just a bug in the X540, or mine specifically. (If I'm not too
> > > lazy maybe I'll swap my X540s around between machines and try on
> > > another one.)
> > > - It's a bug in my initialization, but I would find this strange as
> > > my 82574L driver initializes in an almost identical fashion.
> > > - It's a bug in my motherboard/CPU, but only on >=8x channel PCIe
> > > cards, which would explain why it didn't affect the 82574L.
> > >
> > > --------- Example code ---------------------
> > >
> > > ; Get the rx entry
> > > mov rdx, qword [gs:thread_local.x540_rx_ring_base]
> > > mov rax, qword [gs:thread_local.x540_rx_head]
> > > shl rax, 4
> > >
> > > ; XXX: Temporary, used for the loop counter for testing.
> > > xor ebp, ebp
> > >
> > > ; Putting an mfence/lfence/sfence here has no effect on the result.
> > >
> > > mov rsi, qword [rdx + rax + 0] ; pointer to packet contents
> > >
> > > ; Putting an mfence/lfence/sfence here has no effect on the result.
> > >
> > > .lewp:
> > > ; Putting an mfence/lfence/sfence here has no effect on the result.
> > >
> > > ; Wait until a packet is present here by polling the DD bit
> > > test dword [rdx + rax + 8 + 4], 1
> > > jz   short .lewp
> > >
> > > ; Without anything here, we fail with nothing getting printed.
> > >
> > > ; Successful
> > > ;mfence
> > >
> > > ; Failure, never prints
> > > ;lfence
> > >
> > > ; Successful
> > > ;sfence
> > >
> > > ; Successful (the pushes and pops do not change the result, fails
> > > ; without rdtsc)
> > > ;push rax
> > > ;push rdx
> > > ;rdtsc
> > > ;pop rdx
> > > ;pop rax
> > >
> > > ; Failure, prints out 3 to the screen. Meaning we read the value 3 times
> > > ; before it became accurate. On a second attempt it printed 3 as well.
> > > ;clflush [rsi + (udp_template_10g.ulen - udp_template_10g)]
> > >
> > > ; Successful
> > > ;wbinvd
> > >
> > > ; Failure, never prints
> > > ; mov rcx, 1024
> > > ;.simple_pause:
> > > ; dec rcx
> > > ; jnz short .simple_pause
> > >
> > > ; Failure, never prints
> > > ; mov rcx, 1024
> > > ;.do_some_reads:
> > > ; pop r15
> > > ; dec rcx
> > > ; jnz short .do_some_reads
> > >
> > > ; Failure, never prints
> > > ; mov rcx, 1024
> > > ;.do_some_writes:
> > > ; push r15
> > > ; dec rcx
> > > ; jnz short .do_some_writes
> > >
> > > ; Successful
> > > ;invlpg [rsi + (udp_template_10g.ulen - udp_template_10g)]
> > >
> > > ; Failure, never prints
> > > ;prefetch [rsi + (udp_template_10g.ulen - udp_template_10g)]
> > >
> > > ; Failure, never prints
> > > ;lock inc qword [rsp]
> > >
> > > ; Spinloop and keep track of how many spins we have done in ebp. We spin
> > > ; until the packet indicates a UDP length of 0x14 bytes.
> > > .spin:
> > > inc ebp
> > > cmp word [rsi + (udp_template_10g.ulen - udp_template_10g)], 0x1400
> > > jne short .spin
> > >
> > > ; Print out the value in ebp to the screen (count from the loop above).
> > > mov  edx, ebp
> > > call outhexq
> > >
> > > cli
> > > hlt
> >
> >
> > --
> > Antoine Kaufmann
> > <antoi...@cs.washington.edu>
> >
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired