Hi,
I reproduced the problem with jumbo packets and I have a good idea of what
may be happening, though I am not sure of the root cause. Please tell me if
my analysis is correct.
I have 16 Rx rings. Each ring has 2048 descriptors, and each descriptor can
accommodate a packet buffer of 2048 bytes.
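For reference, the number of 2048-byte buffers a frame consumes is just a ceiling division. A minimal sketch (bufs_needed is a hypothetical helper for illustration, not a driver function):

```c
/* Number of fixed-size Rx buffers a frame of 'len' bytes consumes.
 * bufs_needed() is a hypothetical helper for illustration only. */
static int bufs_needed(int len, int buf_size)
{
    return (len + buf_size - 1) / buf_size; /* ceiling division */
}
```

For example, a 5056-byte jumbo frame (2048 + 2048 + 960, matching the lengths written back in descriptors 757-759 below) needs 3 buffers.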

When a particular Rx Queue starts dropping packets, I see that

        "qprdc" ="Queue Packet Receive Drop Count" = 422682
        "qrdh" ="Queue Receive Descriptor Head" = 760
        "qrdt" ="Queue Receive Descriptor Tail" = 760
        "qrdntc" ="Queue Receive Descriptor Next to Check" = 762
        "qrdlc" ="Queue Receive Descriptor Last Cleaned" = 761
        "qtda" ="Queue Transmit Descriptor Available" = 274

When I printed the Rx ring descriptor and ixgbe_adv_rx_desc information, I
see that

===================================================
Descriptor 762 status 0
read.pkt_addr 18facc800
read.hdr_addr 0
wb.lower.lo_dword.hs_rss.pkt_info 51200
wb.lower.lo_dword.hs_rss.hdr_info 36780
wb.lower.hi_dword.rss 1
wb.lower.hi_dword.csum_ip.ip_id 1
wb.lower.hi_dword.csum_ip.csum 0
wb.upper.status_error 0
wb.upper.length 0
wb.upper.vlan 0

==================================================
Descriptor 757 status -2147483583
read.pkt_addr a758a75801c00012
read.hdr_addr 80080000041
wb.lower.lo_dword.hs_rss.pkt_info 18
wb.lower.lo_dword.hs_rss.hdr_info 448
wb.lower.hi_dword.rss a758a758
wb.lower.hi_dword.csum_ip.ip_id 42840
wb.lower.hi_dword.csum_ip.csum a758
wb.upper.status_error 2147483713
wb.upper.length 2048
wb.upper.vlan 0
==================================================
Descriptor 758 status -2147483583
read.pkt_addr a758a75801c00012
read.hdr_addr 80080000041
wb.lower.lo_dword.hs_rss.pkt_info 18
wb.lower.lo_dword.hs_rss.hdr_info 448
wb.lower.hi_dword.rss a758a758
wb.lower.hi_dword.csum_ip.ip_id 42840
wb.lower.hi_dword.csum_ip.csum a758
wb.upper.status_error 2147483713
wb.upper.length 2048
wb.upper.vlan 0
==================================================
Descriptor 759 status -2147483581
read.pkt_addr a758a75801c00012
read.hdr_addr 3c080000043
wb.lower.lo_dword.hs_rss.pkt_info 18
wb.lower.lo_dword.hs_rss.hdr_info 448
wb.lower.hi_dword.rss a758a758
wb.lower.hi_dword.csum_ip.ip_id 42840
wb.lower.hi_dword.csum_ip.csum a758
wb.upper.status_error 2147483715
wb.upper.length 960
wb.upper.vlan 0
==================================================
Descriptor 760 status 0
read.pkt_addr 189449000
read.hdr_addr 0
wb.lower.lo_dword.hs_rss.pkt_info 36864
wb.lower.lo_dword.hs_rss.hdr_info 35140
wb.lower.hi_dword.rss 1
wb.lower.hi_dword.csum_ip.ip_id 1
wb.lower.hi_dword.csum_ip.csum 0
wb.upper.status_error 0
wb.upper.length 0
wb.upper.vlan 0
==================================================
Descriptor 761 status 0
read.pkt_addr 18947f000
read.hdr_addr 0
wb.lower.lo_dword.hs_rss.pkt_info 61440
wb.lower.lo_dword.hs_rss.hdr_info 35143
wb.lower.hi_dword.rss 1
wb.lower.hi_dword.csum_ip.ip_id 1
wb.lower.hi_dword.csum_ip.csum 0
wb.upper.status_error 0
wb.upper.length 0
wb.upper.vlan 0
==================================================

Looking at descriptors 757 - 761, these 5 buffers were allocated for a
jumbo packet, but the jumbo packet occupied only 3 of the 5 buffers. This
can be confirmed from the status_error of these descriptors.

Descriptor 757 - status_error = 2147483713 = 0x80000041
= IXGBE_RXDADV_ERR_IPE | IXGBE_RXD_STAT_IPCS | IXGBE_RXD_STAT_DD
Descriptor 758 - status_error = 2147483713 = 0x80000041
=  IXGBE_RXDADV_ERR_IPE | IXGBE_RXD_STAT_IPCS | IXGBE_RXD_STAT_DD
Descriptor 759 - status_error = 2147483715 = 0x80000043
=  IXGBE_RXDADV_ERR_IPE | IXGBE_RXD_STAT_IPCS | IXGBE_RXD_STAT_EOP
| IXGBE_RXD_STAT_DD
Descriptor 760 - status_error = 0
Descriptor 761 - status_error = 0

#define IXGBE_RXDADV_ERR_IPE    0x80000000 /* IP Checksum Error */
#define IXGBE_RXDADV_ERR_FCEOFE         0x80000000 /* FCoEFe/IPE */
#define IXGBE_RXD_STAT_IPCS     0x40    /* IP xsum calculated */
#define IXGBE_RXD_STAT_DD       0x01    /* Descriptor Done */
#define IXGBE_RXD_STAT_EOP      0x02    /* End of Packet */
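As a cross-check on the decoding above, the raw status_error values can be tested against the masks quoted from the driver header. A minimal sketch (the helper names desc_done/desc_eop are mine, not driver functions):

```c
#include <stdint.h>

/* Bit masks as quoted from the ixgbe driver header above. */
#define IXGBE_RXDADV_ERR_IPE 0x80000000u /* IP checksum error */
#define IXGBE_RXD_STAT_IPCS  0x40u       /* IP checksum calculated */
#define IXGBE_RXD_STAT_DD    0x01u       /* Descriptor Done */
#define IXGBE_RXD_STAT_EOP   0x02u       /* End of Packet */

/* Hypothetical helpers (my names) that test the write-back status. */
static int desc_done(uint32_t staterr) { return (staterr & IXGBE_RXD_STAT_DD) != 0; }
static int desc_eop(uint32_t staterr)  { return (staterr & IXGBE_RXD_STAT_EOP) != 0; }
```

0x80000041 (descriptors 757/758) is DD set without EOP; 0x80000043 (descriptor 759) adds EOP, confirming the frame ends there.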


Ideally, when this jumbo packet is received, next_to_check should point to
*Descriptor 760*, because IXGBE_RXD_STAT_EOP is seen at Descriptor 759.
I think this is what happened in the function ixgbe_rxeof_locked:

RDH -> correctly points at 760
next_to_check -> the application read all 5 descriptors and advanced it to 762
last_cleaned -> the application read all 5 descriptors and advanced it to 761
RDT -> updated by the application to 760, since it is only updated when
(rxr->last_cleaned % 8 == 0)

At the end of this function, since RDH == RDT, the hardware thinks the
queue is full, while the application sees no packet at descriptor
next_to_check = 762. It's a stalemate.
The statement *if (staterr & IXGBE_RXD_STAT_EOP)* would not be executed if
*if (!(staterr & IXGBE_RXDADV_ERR_FRAME_ERR_MASK))* returned false, but that
does not appear to be the case here.

#define IXGBE_RXDADV_ERR_FRAME_ERR_MASK ( \
                                      IXGBE_RXDADV_ERR_CE | \
                                      IXGBE_RXDADV_ERR_LE | \
                                      IXGBE_RXDADV_ERR_PE | \
                                      IXGBE_RXDADV_ERR_OSE | \
                                      IXGBE_RXDADV_ERR_USE)
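The deferred tail-write behaviour described above can be modelled in a few lines. This is a simplified sketch of my reading of ixgbe_rxeof_locked (clean_tail is my own name, and ring wrap-around is deliberately ignored for clarity):

```c
/* Model: the application cleans descriptors [first, last], but RDT is
 * written only when the cleaned index is a multiple of 8, mirroring
 * the "if (rxr->last_cleaned % 8 == 0)" condition described above.
 * Ring wrap-around is ignored for clarity; not driver code. */
static int clean_tail(int first, int last, int rdt)
{
    for (int i = first; i <= last; i++) {
        if (i % 8 == 0)
            rdt = i; /* deferred tail write */
    }
    return rdt;
}
```

Cleaning 757..761 leaves RDT at 760 (the only multiple of 8 in that range), which equals RDH, so the hardware concludes the ring is full even though the application has already consumed the packet.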




Regards,
Kaushal

On Thu, Jul 4, 2013 at 10:08 PM, Kaushal Bhandankar <[email protected]> wrote:

> Hi all,
> I have dumped the Rx ring descriptor status when this happens. This is
> what I found
>
>         "qprdc" = "Queue Packet Receive Drop Count" = 1602602
>         "qrdh" = "Queue Receive Descriptor Head" = 1256
>         "qrdt" = "Queue Receive Descriptor Tail" = 1256
>         "qrdntc" = "Queue Receive Descriptor Next to Check" = 1257
>         "qrdlc" = "Queue Receive Descriptor Last Cleaned" = 1256
>
> I checked the status of the Rx ring descriptors from 1257 onwards.
>
> Descriptor 1257 status 0
> Descriptor 1258 status 0
> Descriptor 1259 status 0
>
> ...
> ...
> ...
>
>
> Descriptor 1247 status 0
> Descriptor 1248 status 0
> Descriptor 1249 status -1073741725
> Descriptor 1250 status -1073741725
> Descriptor 1251 status -1073741725
> Descriptor 1252 status -1073741725
> Descriptor 1253 status -1073741725
> Descriptor 1254 status -1073741725
> Descriptor 1255 status -1073741725
> Descriptor 1256 status 0
>
> This means that the application sees that the Rx ring is empty, but
> actually the Rx ring is full and in a bad state. Any idea what may be
> happening?
>
>
> Regards,
> Kaushal
>
>
>
>
>
> On Tue, Jun 18, 2013 at 11:48 PM, Skidmore, Donald C <
> [email protected]> wrote:
>
>> Hey Kaushal,
>>
>> So can I assume you are using DPDK for your driver?  I ask because of the
>> tap_create call you mentioned.  If so I've only had limited exposure to
>> that driver as a different group does its development.  If that is the
>> case I can see if I can find you someone in that group to talk with.
>>
>> Thanks,
>> -Don Skidmore <[email protected]>
>>
>> *From:* Kaushal Bhandankar [mailto:[email protected]]
>> *Sent:* Monday, June 17, 2013 7:32 PM
>> *To:* Skidmore, Donald C
>> *Cc:* [email protected]
>>
>> *Subject:* Re: [E1000-devel] Intel 82599 Driver dropping packets Queue
>> Receive Descriptor Head
>>
>> Thanks for the reply. I'll elaborate on the issue we are seeing.
>>
>> When the affected queue started dropping packets, it had received 50% of
>> the traffic compared to any other queue, which was working fine. After
>> this the traffic was diverted to another box. With no traffic in the box,
>> the affected queue drops 100% of the packets it sees. We do not use an
>> interrupt mechanism in our case. The Niantic hardware NIC calculates the
>> hash and puts the packet in the appropriate queue. The user application
>> reads directly from this queue. There is no driver running in kernel
>> space. This is done to improve the performance of the system. Also, we
>> are creating a tap interface using the ixgbe_tap_create call.
>>
>> =============
>>
>> CLMBOH0001-IPS903# exit
>>
>> On Tue, Jun 18, 2013 at 2:39 AM, Skidmore, Donald C <
>> [email protected]> wrote:
>>
>> Hi Kaushal,
>>
>> Sorry about the troubles you're having with ixgbe.  I was a bit confused
>> by what exactly you're seeing so let me ask you a few questions.  When you
>> say " I see that out of 16 Rx Queues, only *One Queue* is dropping all the
>> packets." is that queue getting the majority of the traffic or the same
>> load as the other queues?  It would be interesting to see how your
>> interrupts were laid out (cat /proc/interrupts | grep <ethX>).
>>
>> As far as why MPC is incrementing, this can happen for two reasons.
>> First, we ran out of space in the descriptor ring for the packet; this
>> doesn't seem to be your case, as you checked to make sure the ring was
>> empty.  Second, we could be running out of bandwidth on the PCIe bus.
>> This might be possible if all your traffic was going to just one queue,
>> but it would be worthwhile to see what the connectivity looks like anyway
>> (lspci -vvv).
>>
>> It is also worth noting that your driver is quite old 2.0.38.3.  I would
>> suggest at least trying the latest driver from source forge (currently
>> 3.15.1).
>> https://sourceforge.net/projects/e1000/files/ixgbe%20stable/
>>
>> I would also like to see if anything is being logged while this occurs
>> (dmesg).
>>
>> Thanks,
>> -Don Skidmore <[email protected]>
>>
>>
>> > -----Original Message-----
>> > From: Kaushal Bhandankar [mailto:[email protected]]
>> > Sent: Sunday, June 16, 2013 12:32 PM
>> > To: [email protected]
>> > Subject: Re: [E1000-devel] Intel 82599 Driver dropping packets Queue
>> > Receive Descriptor Head
>> >
>> > -bash-3.2$ /home/service/ethtool_x64 -i po0_0
>> > driver: tun
>> > version: 1.6
>> > firmware-version: N/A
>> > bus-info: tap
>> > supports-statistics: no
>> > supports-test: no
>> > supports-eeprom-access: no
>> > supports-register-dump: no
>> >
>> >
>> > -bash-3.2# modinfo ixgbe
>> > filename:       /lib/modules/2.6.29.1/kernel/drivers/net/ixgbe/ixgbe.ko
>> > version:        2.0.38.3-NAPI
>> > license:        GPL
>> > description:    Intel(R) 10 Gigabit PCI Express Network Driver
>> > author:         Intel Corporation, <[email protected]>
>> > srcversion:     CF158600678C3ADC41F341A
>> > alias:          pci:v00008086d000010FBsv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010FCsv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010F7sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010DBsv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010F4sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010E1sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010F1sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010ECsv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010DDsv*sd*bc*sc*i*
>> > alias:          pci:v00008086d0000150Bsv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010C8sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010C7sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010C6sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d00001508sv*sd*bc*sc*i*
>> > alias:          pci:v00008086d000010B6sv*sd*bc*sc*i*
>> > depends:
>> > vermagic:       2.6.29.1 SMP mod_unload
>> > parm:           InterruptType:Change Interrupt Mode (0=Legacy, 1=MSI,
>> > 2=MSI-X), default 2 (array of int)
>> > parm:           MQ:Disable or enable Multiple Queues, default 1 (array
>> of
>> > int)
>> > parm:           DCA:Disable or enable Direct Cache Access, 0=disabled,
>> > 1=descriptor only, 2=descriptor and data (array of int)
>> > parm:           RSS:Number of Receive-Side Scaling Descriptor Queues,
>> > default 1=number of cpus (array of int)
>> > parm:           InterruptThrottleRate:Maximum interrupts per second, per
>> > vector, (956-488281), default 8000 (array of int)
>> > parm:           LLIPort:Low Latency Interrupt TCP Port (0-65535) (array
>> of
>> > int)
>> > parm:           LLIPush:Low Latency Interrupt on TCP Push flag (0,1)
>> (array
>> > of int)
>> > parm:           LLISize:Low Latency Interrupt on Packet Size (0-1500)
>> > (array of int)
>> > parm:           LLIEType:Low Latency Interrupt Ethernet Protocol Type
>> > (array of int)
>> > parm:           LLIVLANP:Low Latency Interrupt on VLAN priority
>> threshold
>> > (array of int)
>> > parm:           RxBufferMode:0=1 descriptor per packet,
>> >                         1=use packet split, multiple descriptors per
>> jumbo frame
>> >                         2 (default)=use 1buf mode for 1500 mtu, packet
>> split for jumbo
>> > (array of int)
>> > parm:           FdirMode:Flow Director filtering modes:
>> >                         0 = Filtering off
>> >                         1 = Signature Hashing filters (SW ATR)
>> >                         2 = Perfect Filters (array of int)
>> > parm:           FdirPballoc:Flow Director packet buffer allocation
>> level:
>> >                         0 = 8k hash filters or 2k perfect filters
>> >                         1 = 16k hash filters or 4k perfect filters
>> >                         2 = 32k hash filters or 8k perfect filters
>> (array of int)
>> > parm:           AtrSampleRate:Software ATR Tx packet sample rate (array
>> of
>> > int)
>> > parm:           DoubleVlan:Disable or enable double Vlan support,
>> default 0
>> > (array of int)
>> > parm:           InnerVlanMode:Disable or enable Inner Vlan stripping,
>> > default 0 (array of int)
>> >
>> >
>> >
>> > -bash-3.2# modprobe -l ixgbe
>> > /lib/modules/2.6.29.1/kernel/drivers/net/ixgbe/ixgbe.ko
>> >
>> > -bash-3.2# lsmod
>> > Module                  Size  Used by
>> > cidmodcap               4208  16
>> > cpp_base              845040  8
>> > tipc                  118392  2
>> > rebootkom               2468  0
>> > nf_conntrack_ipv4      13376  1
>> > nf_defrag_ipv4          1976  1 nf_conntrack_ipv4
>> > xt_state                2232  1
>> > nf_conntrack           60000  2 nf_conntrack_ipv4,xt_state
>> > iptable_filter          2888  1
>> > ip_tables              15848  1 iptable_filter
>> > x_tables               18208  2 xt_state,ip_tables
>> > igb                    76876  0
>> > e1000e                113256  0
>> > ixgbe                 168116  0
>> > ihm                     6252  2
>> > cids_shared           579704  0
>> > linux_user_bde         15624  0
>> > linux_kernel_bde       28816  1 linux_user_bde
>> > -bash-3.2#
>> >
>> >
>> > cat /proc/cpuinfo = 16 cores
>> >
>> >
>> >
>> >
>> > On Sun, Jun 16, 2013 at 10:04 AM, Kaushal Bhandankar
>> > <[email protected]>wrote:
>> >
>> > > Hi,
>> > > I am using the Intel 82599 driver in my product. I see that out of 16 Rx
>> > > Queues, only *One Queue* is dropping all the packets. When I did a
>> > > test *to en-queue packets only to the problematic Queue*, I found that
>> > > the
>> > >
>> > > -> Rx Missed Packet Count (mpc) is 0
>> > > -> Good Packet Received Count (gprc) is increased for received packets
>> > > -> L2 filtering passes.
>> > >
>> > > What may be the reason for this behavior ?
>> > >
>> > > For debug, I am also printing the "Queue Receive Descriptor Head" and
>> > > "Queue Receive Descriptor Tail" to get information about how many
>> > > descriptors are in-use. My question is, in further tests, if I find
>> > > that No Descriptor is in-use, however still packets are getting
>> > > dropped -- Can I infer Hardware Failure from it ?
>> > >
>> > > Regards,
>> > > Kaushal
>> > >
>>
>
>
------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit 
http://communities.intel.com/community/wired
