Haha, as you posted this I was just doing testing on no-snoop. By setting
CTRL_EXT.NS_DIS (no-snoop disable), I no longer have the problem.
Well that was a fun night! At least I found a real fix instead of some
cross-my-fingers kludge.
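A minimal sketch of the NS_DIS change in C, for illustration only. The
register offset (0x00018) and bit position (bit 16) are assumptions taken
from the ixgbe driver headers as I remember them, and reg_read/reg_write
are hypothetical MMIO helpers; check the X540 datasheet before relying on
any of it.

```c
#include <stdint.h>

/* Assumed values (from the ixgbe driver headers, not verified here
 * against the X540 datasheet). */
#define CTRL_EXT_OFFSET  0x00018u      /* Extended Device Control register */
#define CTRL_EXT_NS_DIS  0x00010000u   /* bit 16: no-snoop disable */

/* Hypothetical MMIO accessors. `bar0` points at the mapped register BAR. */
static inline uint32_t reg_read(volatile uint32_t *bar0, uint32_t off)
{
    return bar0[off / 4];
}

static inline void reg_write(volatile uint32_t *bar0, uint32_t off, uint32_t val)
{
    bar0[off / 4] = val;
}

/* Read-modify-write so the other CTRL_EXT bits are left untouched. */
static inline void disable_no_snoop(volatile uint32_t *bar0)
{
    uint32_t v = reg_read(bar0, CTRL_EXT_OFFSET);
    reg_write(bar0, CTRL_EXT_OFFSET, v | CTRL_EXT_NS_DIS);
}
```

Calling disable_no_snoop(bar0) once during initialization should be enough;
doing it before the RX rings are enabled seems the safest ordering, though
that ordering is my assumption and not something from the datasheet.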
I find it quite off-putting that the X540 documentation says in many places,
including once in the DCA register descriptions: "In most applications
non snoop should not be enabled." I find it strange that a setting
Intel itself says is uncommon is on by default.
Nevertheless, thanks for all the help :P
-B
On Thu, Apr 23, 2015 at 3:11 AM, Antoine Kaufmann <
antoi...@cs.washington.edu> wrote:
> As a follow-up on my earlier email about the no snoop setting: If you
> haven't checked that out yet you probably should. (cf. last post in
> https://software.intel.com/en-us/forums/topic/401498)
>
> I think that what you describe below might point to that. From a quick
> look at the 82574L spec it looks like no snoop is disabled by default on
> that card (i.e. the respective PCIe transactions will be snooped by the
> cache), while the x540 spec seems to indicate that no snoop is enabled
> by default (which was also my experience with the 82599).
>
> Not quite sure about the behavior with mfences, though. In any
> case I would rule out the no snoop issue (basically involves setting one
> bit, iirc, and I think I found that in the FreeBSD ixgbe driver).
>
> On Thu, Apr 23 02:53, Brandon Falk wrote:
> > This is verrrrry strange indeed. The mfence works but the lfence does not.
> > On top of these I have tried *many* other operations which I thought
> > might have an effect. You can find the entire code snippet for these
> > tests at the end of this email. When the code was being tested, each
> > 'section' was individually uncommented, and the result of that test is
> > recorded in a comment above the test.
> >
> > After doing these tests and scratching my head, I decided to do a test
> > with my old 82574L driver. I removed the X540 from the test machine and
> > placed the 82574L card, which I have used previously, in the same PCIe
> > slot (also using the same Ethernet connection). Since the legacy
> > descriptor format is identical on the X540 and the 82574L, I used the
> > exact same code (posted below) as I used on the X540. On this card *no
> > fences* were needed, which indicates that either A. I'm initializing the
> > X540 in a manner that somehow makes this behaviour possible, B. the X540
> > (or maybe my specific one) has a bug that creates the need for these
> > fences, or C. it's not a bug but something that needs to be documented.
> > I still find it very strange that a write fence changes how things
> > operate when no writes are actually being done where I'm fencing.
> >
> > Some other tests I have done:
> >
> > - Have other processors spin and do mfences while the main core does not
> > do an mfence. This did not make it work, which is expected, as an mfence
> > should only change behaviour locally.
> > - mfences all over the X540 initialization and prior to doing the DD
> > polling. Did not fix the problem.
> >
> > Some things I could think of that would cause this problem:
> >
> > - It's just a bug in the X540, or mine specifically. (If I'm not too lazy
> > maybe I'll swap my X540s around between machines and try on another one).
> > - It's a bug in my initialization, but I would find this strange as my
> > 82574L driver initializes in almost an identical fashion.
> > - It's a bug in my motherboard/CPU, but only on >=8x channel PCIe cards,
> > which would explain why it didn't affect the 82574L.
> >
> > --------- Example code ---------------------
> >
> > ; Get the rx entry
> > mov rdx, qword [gs:thread_local.x540_rx_ring_base]
> > mov rax, qword [gs:thread_local.x540_rx_head]
> > shl rax, 4
> >
> > ; XXX: Temporary, used for the loop counter for testing.
> > xor ebp, ebp
> >
> > ; Putting an mfence/lfence/sfence here has no effect on the result.
> >
> > mov rsi, qword [rdx + rax + 0] ; pointer to packet contents
> >
> > ; Putting an mfence/lfence/sfence here has no effect on the result.
> >
> > .lewp:
> > ; Putting an mfence/lfence/sfence here has no effect on the result.
> >
> > ; Wait until a packet is present here by polling the DD bit
> > test dword [rdx + rax + 8 + 4], 1
> > jz short .lewp
> >
> > ; Without anything here, we fail with nothing getting printed.
> > ; Successful
> > ;mfence
> >
> > ; Failure, never prints
> > ;lfence
> >
> > ; Successful
> > ;sfence
> >
> > ; Successful (the pushes and pops do not change the result, fails without
> > ; rdtsc)
> > ;push rax
> > ;push rdx
> > ;rdtsc
> > ;pop rdx
> > ;pop rax
> >
> > ; Failure, prints out 3 to the screen. Meaning we read the value 3 times
> > ; before it became accurate. On a second attempt it printed 3 as well.
> > ;clflush [rsi + (udp_template_10g.ulen - udp_template_10g)]
> >
> > ; Successful
> > ;wbinvd
> >
> > ; Failure, never prints
> > ; mov rcx, 1024
> > ;.simple_pause:
> > ; dec rcx
> > ; jnz short .simple_pause
> >
> > ; Failure, never prints
> > ; mov rcx, 1024
> > ;.do_some_reads:
> > ; pop r15
> > ; dec rcx
> > ; jnz short .do_some_reads
> >
> > ; Failure, never prints
> > ; mov rcx, 1024
> > ;.do_some_writes:
> > ; push r15
> > ; dec rcx
> > ; jnz short .do_some_writes
> >
> > ; Successful
> > ;invlpg [rsi + (udp_template_10g.ulen - udp_template_10g)]
> > ; Failure, never prints
> > ;prefetch [rsi + (udp_template_10g.ulen - udp_template_10g)]
> >
> > ; Failure, never prints
> > ;lock inc qword [rsp]
> >
> > ; Spinloop and keep track of how many spins we have done in ebp. We spin
> > ; until the packet indicates a UDP length of 0x14 bytes.
> > .spin:
> > inc ebp
> > cmp word [rsi + (udp_template_10g.ulen - udp_template_10g)], 0x1400
> > jne short .spin
> >
> > ; Print out the value in ebp to the screen (count from the loop above).
> > mov edx, ebp
> > call outhexq
> >
> > cli
> > hlt
>
>
> --
> Antoine Kaufmann
> <antoi...@cs.washington.edu>
>
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired