> From: Kyle Larose [mailto:eomereadig at gmail.com] 
> Sent: Wednesday, September 9, 2015 6:43 PM
> To: Tahhan, Maryam
> Cc: Olivier MATZ; Andriy Berestovskyy; dev at dpdk.org
> Subject: Re: [dpdk-dev] ixgbe: account more Rx errors Issue
>
>
> On Mon, Sep 7, 2015 at 7:44 AM, Tahhan, Maryam <maryam.tahhan at intel.com> 
> wrote:
> > From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> > Sent: Monday, September 7, 2015 9:30 AM
> > To: Tahhan, Maryam; Andriy Berestovskyy
> > Cc: dev at dpdk.org
> > Subject: Re: ixgbe: account more Rx errors Issue
> >
> > Hi,
> >
> > On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
> > >> From: Andriy Berestovskyy [mailto:aber at semihalf.com]
> > >> Sent: Friday, September 4, 2015 5:59 PM
> > >> To: Tahhan, Maryam
> > >> Cc: dev at dpdk.org; Olivier MATZ
> > >> Subject: Re: ixgbe: account more Rx errors Issue
> > >>
> > >> Hi Maryam,
> > >> Please see below.
> > >>
> > >>> XEC counts the number of receive IPv4, TCP, UDP or SCTP XSUM errors
> > >>
> > >> Please note that UDP checksum is optional for IPv4, but UDP packets
> > >> with zero checksum hit XEC.
> > >>
> > >
> > > I understand, but this is what the hardware register is picking up, and
> > > what I included previously are the definitions of the registers from the
> > > datasheet.
> > >
> > >>> And the general CRC error counter counts the number of received
> > >>> packets with CRC errors.
> > >>
> > >> Let me explain with an example.
> > >>
> > >> DPDK 2.0 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B
> > >> host B stats: 9M ipackets + 1M ierrors (missed) = 10M
> > >>
> > >> DPDK 2.1 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B
> > >> host B stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
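A small C sketch reproduces the arithmetic of this example. The counter values come straight from the example; the two accounting rules only restate the 2.0 and 2.1 behaviour as described in this thread (they are not the ixgbe PMD code), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the example: host A sends 10M IPv4 UDP packets with a
 * zero UDP checksum; host B delivers 9M and misses 1M. */
static const uint64_t sent     = 10000000ULL;
static const uint64_t ipackets =  9000000ULL;
static const uint64_t missed   =  1000000ULL;
static const uint64_t xec      = 10000000ULL; /* every packet hits XEC */

/* DPDK 2.0 as described: ierrors only reflects missed packets. */
static uint64_t ierrors_2_0(void)
{
    return missed;
}

/* DPDK 2.1 as described: XEC is added on top of missed packets. */
static uint64_t ierrors_2_1(void)
{
    return missed + xec;
}

/* Packets accounted for by ipackets + ierrors; with a consistent
 * definition this equals the number of packets sent. */
static uint64_t accounted(uint64_t ierrors)
{
    assert(ipackets <= sent);
    return ipackets + ierrors;
}
```

With these values `accounted(ierrors_2_0())` is 10M, matching the packets sent, while `accounted(ierrors_2_1())` is 20M because the zero-checksum packets are counted both as received and as errors.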
> > >
> > > Because it's hitting the 2 error registers. If you had packets with
> > > multiple errors that are added up as part of ierrors, you would still be
> > > getting more than 10M errors, which is why I asked for feedback on the 3
> > > suggestions below. What I'm saying is that the number of errors will
> > > exceed the number of received packets if you hit multiple error
> > > registers on the NIC.
> > >
> > >>
> > >>> So our options are:
> > >>> 1. Add only one of these into the error stats.
> > >>> 2. Introduce some cooking of the stats in this scenario, so that we
> > >>>    add only one or the other, depending on whether they are equal or
> > >>>    one is higher than the other.
> > >>> 3. Add them all, which means you can have more errors than the number
> > >>>    of received packets, but TBH this is going to be the case if your
> > >>>    packets have multiple errors anyway.
> > >>
> > >> 4. ierrors should reflect NIC drops only.
> > >
> > > I may have misinterpreted this, but in rte_ethdev.h ierrors is defined
> > > as the "Total number of erroneous received packets".
> > > Maybe we need a clear definition, or a separate drop counter, as I see
> > > uint64_t q_errors defined as "Total number of queue packets received
> > > that are dropped".
> > >
> > >> XEC does not count drops, so IMO it should be removed from ierrors.
> > >
> > > While it's picking up the zero checksum as an error (which it shouldn't
> > > necessarily be doing), removing it could mean missing other valid
> > > L3/L4 checksum errors... Let me experiment some more with L3/L4
> > > checksum errors and crcerrs to see if we can cook the stats around
> > > this register in particular. I would hate to remove it and miss
> > > genuine errors.
> >
> > For me, the definition that looks the most straightforward is:
> >
> >   ipackets = packets successfully received by hardware
> >   imissed = packets dropped by hardware because the software does
> >             not poll fast enough (= queue full)
> >   ierrors = packets dropped by hardware (malformed packets, ...)
> >
> > These 3 stats never count the same packet twice.
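Olivier's definitions amount to a partition: each packet that reaches the port lands in exactly one of the three counters. The sketch below illustrates that rule, assuming a hypothetical per-packet outcome enum and stats struct (neither is DPDK's `rte_eth_stats`). Under these definitions a delivered packet with a bad checksum counts only in ipackets; its checksum error would be reported through xstats instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of one packet arriving at the port. */
enum rx_outcome {
    RX_DELIVERED,   /* received by hardware and handed to software */
    RX_QUEUE_FULL,  /* dropped: software did not poll fast enough */
    RX_MALFORMED    /* dropped by hardware, e.g. bad CRC */
};

/* Hypothetical counters following the proposed definitions. */
struct port_stats {
    uint64_t ipackets;
    uint64_t imissed;
    uint64_t ierrors;
};

/* Each packet increments exactly one counter, so no packet is
 * ever counted twice. */
static void account(struct port_stats *s, enum rx_outcome o)
{
    switch (o) {
    case RX_DELIVERED:
        s->ipackets++;
        break;
    case RX_QUEUE_FULL:
        s->imissed++;
        break;
    case RX_MALFORMED:
        s->ierrors++;
        break;
    default:
        assert(0 && "unknown rx outcome");
    }
}

/* The three counters add up to the number of packets that arrived. */
static uint64_t arrivals(const struct port_stats *s)
{
    return s->ipackets + s->imissed + s->ierrors;
}
```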
> >
> > If we want more statistics, they could go in xstats, for instance a
> > counter for invalid checksums. The definition of these stats would be
> > PMD-specific.
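With PMD-specific xstats, an application would look a counter up by name. In DPDK the query itself goes through rte_eth_xstats_get(); to keep the sketch self-contained it works on a plain array of hypothetical name/value pairs instead of calling DPDK, and any counter name passed to it is PMD-specific, so none is assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name/value pair, standing in for one extended statistic. */
struct xstat {
    const char *name;
    uint64_t value;
};

/* Looks up a PMD-specific counter by name. Returns 0 and stores the
 * value in *value if found, or -1 if the PMD does not expose it. */
static int xstat_lookup(const struct xstat *xs, size_t n,
                        const char *name, uint64_t *value)
{
    assert(name != NULL && value != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(xs[i].name, name) == 0) {
            *value = xs[i].value;
            return 0;
        }
    }
    return -1;
}
```

Returning -1 for an unknown name lets a monitoring tool degrade gracefully on PMDs that do not expose a given counter.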
> >
> > I agree we should clarify and have a consensus on the definitions
> > before going further.
> >
> >
> > Regards,
> > Olivier
> Hi Olivier
> I think it's important to distinguish between errors and drops and provide a 
> statistics API that exposes both. This way people have access to as much 
> information as possible when things do go wrong and nothing is missed in 
> terms of errors.
>
> My suggestion for the high-level registers would be:
> ipackets = Total number of packets successfully received by hardware
> imissed = Total number of packets dropped by hardware because the software
>           does not poll fast enough (= queue full)
> idrops = Total number of packets dropped by hardware (malformed packets, ...),
>          where the # of drops can ONLY be <= the packets received (without
>          overlap between registers).
> ierrors = Total number of erroneous received packets, where the # of errors
>           can be >= the packets received (without overlap between registers),
>           because there may be multiple errors associated with a packet.
>
> This way people can see how many packets were dropped and why at a high level 
> as well as through the extended stats API rather than using one API or the 
> other. What do you think?
>
> Best Regards
> Maryam
> >
> >
> >
> >
> > >>
> > >> Please note that we can still access the XEC using
> > >> rte_eth_xstats_get().
> > >>
> > >>
> > >> Regards,
> > >> Andriy
>
> Hi Maryam,
>
> If we look at the IF-MIB (from http://www.ietf.org/rfc/rfc2233.txt), we can
> see that its definition of ifInErrors aligns more closely with Olivier's.
>
> There they say (emphasis >>> <<< mine):
>
>   ifInErrors OBJECT-TYPE
>       SYNTAX      Counter32
>       MAX-ACCESS  read-only
>       STATUS      current
>       DESCRIPTION
>               "For packet-oriented interfaces, >>> the number of inbound
>               packets that contained errors preventing them from
>               being deliverable to a higher-layer protocol <<<.  For
>               character-oriented or fixed-length interfaces, the
>               number of inbound transmission units that contained
>               errors preventing them from being deliverable to a
>               higher-layer protocol.
>
>               Discontinuities in the value of this counter can occur
>               at re-initialization of the management system, and at
>               other times as indicated by the value of
>               ifCounterDiscontinuityTime."
>       ::= { ifEntry 14 }
>
> They count it as the number of packets, not the number of errors. So, if a 
> packet contains two errors, it is only counted once.
>
> I'm not sure what the intention of the ierrors stat is. Do we intend to use 
> it to feed into MIBs/standards such as the above? Or do we intend to make it 
> something different? If the former, I think we should conform to the meaning
> suggested by RFC 2233.
> Thanks,
>
> Kyle

Hi Kyle

Ok, I can now see that we were approaching error stats from different levels, 
in that I was considering things more from a packet level than an interface 
level. I'm quite happy with the definitions Olivier provided for an interface 
level and agree that this is better from a backwards compatibility perspective 
with existing drivers. I still see a need, though, for exposing errors at a
packet level, so I would propose the following:


ipackets = Total number of packets successfully received by hardware
imissed = Total number of packets dropped by hardware because the software
          does not poll fast enough (= queue full)
ierrors = Total number of packets dropped by hardware (malformed packets, ...),
          where the # of drops can ONLY be <= the packets received (without
          overlap between registers).
Rx_pkt_errors = Total number of erroneous received packets, where the # of
                errors can be >= the packets received (without overlap between
                registers), because there may be multiple errors associated
                with a packet.

The reason I think this is important is fault management of DPDK
interfaces by a higher-level fault management entity. I'm currently developing
a collectd plugin for DPDK statistics with the fault management use case in
mind. For that, it would be a great advantage to expose error statistics
through the generic statistics API as well as through the extended stats API,
to ensure that no error is missed.

In addition, the datasheets of various interfaces distinguish between error
and drop registers, in that they have both. Finally, ifconfig also makes a
high-level distinction between drops and errors:

$ifconfig -a
eth0      Link encap:Ethernet  HWaddr 
          inet addr:  Bcast:  Mask:
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:113307576 errors:0 dropped:0 overruns:0 frame:0
          TX packets:125554856 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:56712860715 (54085.5 Mb)  TX bytes:78332692918 (74703.8 Mb)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:114677128 errors:0 dropped:0 overruns:0 frame:0
          TX packets:114677128 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:87742098789 (83677.3 Mb)  TX bytes:87742098789 (83677.3 Mb)


For me, what it really comes down to is making the interface as intuitive as
possible for a higher-level entity to monitor a DPDK interface without relying
solely on the extended stats API. If we change the definition of ierrors to
include dropped packets only, without also exposing an erroneous-packets
counter, the generic stats will not reflect errors that don't result in drops,
and as such we could miss errors on the NIC.

Is the proposed solution amenable to all parties? I'm happy to provide more
details about the DPDK collectd plugin and the development effort there if
anyone is interested.

All the best
Maryam

