Hi Guys,

Thanks for the replies.

@Bill - I agree, it's unlikely that nobody else would have found such a bug, given how widely used lwIP is. My first assumption is always that I've made an error somewhere :-) but there is no harm in asking the question while I search for the answer. I should have mentioned that I have LWIP_CHECKSUM_ON_COPY = 1 and CHECKSUM_CHECK_TCP = 1, so my build takes the #if TCP_CHECKSUM_ON_COPY path and never calls inet_chksum_pseudo(); see below for more.

@Simon - I'll apply the patch and re-test, but I can see from a debug run that that bit of code is not being executed in my implementation. If you saw my follow-up email, you will have noticed that I identified the code that is causing my problem. It is the statement at line 1146 in tcp_out.c (note that I have LWIP_CHECKSUM_ON_COPY = 1):

    acc += (u16_t)~(seg->chksum);

Here acc is a ones' complement checksum obtained from a call to inet_chksum_pseudo_partial(), and seg->chksum is a checksum of the payload. What happens is that occasionally, during operation, acc ends up with a value of M and seg->chksum has, by coincidence, the same value M. Then M + (~M) always gives 0xFFFF.

Why hasn't it been seen by others before? As I'm sure you are aware (I have just been reading up on it!), some checksum checkers might accept 0xFFFF as a valid checksum, depending on how they validate it: either recalculate the checksum and compare it to the inserted value, OR sum over the data including the inserted checksum and check that the result is 0. On Windows 7, in my application, it seems that the stack recalculates and compares the checksum and expects 0x0000 (Wireshark does too!). This combination of lwIP options and checksum validation method might explain why others have not seen this error before now.

Mathematically speaking, using ones' complement maths, ~(sum(a+b+c+d)) is not the same as [(~sum(a+b)) + (~sum(c+d))] in the special corner case where sum(a+b) = ~sum(c+d). In this special case the answer will be 0xFFFF instead of 0x0000, which is what is happening in my case!

Example (using 4-bit numbers for simplicity, so the two results are 0xF and 0x0 rather than 0xFFFF and 0x0000):

    let a = 1, b = 2, c = 4, d = 8
    checksum = ~sum(a+b+c+d) = ~(0xF) = 0x0
    sum(a+b) = 3
    sum(c+d) = 0xC
    calculated by code = [(~sum(a+b)) + (~sum(c+d))] = [~(3) + ~(0xC)] = [0xC + 3] = 0xF

QED!?

I'm more convinced that this is a coding issue in lwIP that doesn't handle this special corner case, but am happy to be proved wrong!

Regards,
Niall.
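
To make the corner case concrete at 16 bits, here is a minimal sketch in C. It is not lwIP code: fold16(), chksum_whole(), chksum_combined() and checksum_corner_case_demo() are hypothetical helpers written for this illustration, and chksum_combined() only mimics the pattern of adding the complemented payload sum to an already complemented partial checksum.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t acc)
{
    while (acc >> 16) {
        acc = (acc & 0xFFFFu) + (acc >> 16);
    }
    return (uint16_t)acc;
}

/* One-pass checksum: complement of the folded total of both parts. */
static uint16_t chksum_whole(uint16_t hdr_sum, uint16_t payload_sum)
{
    return (uint16_t)~fold16((uint32_t)hdr_sum + payload_sum);
}

/* Two-part checksum: complemented header part plus complemented
 * payload part, then folded (assumed to mirror the pattern
 * acc += (u16_t)~(seg->chksum) discussed above). */
static uint16_t chksum_combined(uint16_t hdr_sum, uint16_t payload_sum)
{
    uint32_t acc = (uint16_t)~hdr_sum;
    acc += (uint16_t)~payload_sum;
    return fold16(acc);
}

/* Checks an ordinary case and the corner case hdr_sum == ~payload_sum. */
void checksum_corner_case_demo(void)
{
    /* Ordinary case: both methods agree. */
    assert(chksum_whole(0x0001, 0x0002) == chksum_combined(0x0001, 0x0002));

    /* Corner case: the two results are the two encodings of zero. */
    assert(chksum_whole(0x1234, 0xEDCB) == 0x0000);
    assert(chksum_combined(0x1234, 0xEDCB) == 0xFFFF);
}
```

With hdr_sum = 0x1234 and payload_sum = 0xEDCB (its complement), the one-pass result is 0x0000 while the two-part result is 0xFFFF. Both are representations of zero in ones' complement arithmetic, so a receiver that sums over the whole segment would accept either, whereas one that recalculates and compares bit for bit would not.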

On 14 May 2014 06:25, Simon Goldschmidt <[email protected]> wrote:

> Bill Auerbach wrote:
>
> From an empirical standpoint, lwIP is used in far too many places for
> there to be this significant of a bug. I’d look for a compiler bug or some
> other issue. I seriously doubt it’s a bug in lwIP. Some of my company’s
> users run our systems 24/7 sending lots of data through lwIP and I’d hear
> about it really fast if there was this kind of a TCP lockup.
>
>
> I'm flattered by your opinion but I fear this does not prevent lwIP from
> having bugs :-)
>
> In this case, I think I fixed a bug in git master not too long ago
> (#36153), here is the change, maybe it fixes things for you:
>
> @@ -658,6 +662,10 @@ tcp_write(struct tcp_pcb *pcb, const void *arg, u16_t len, u8_t apiflags)
>        last_unsent->len += concat_p->tot_len;
>  #if TCP_CHECKSUM_ON_COPY
>        if (concat_chksummed) {
> +        /*if concat checksumm swapped - swap it back */
> +        if (concat_chksum_swapped){
> +          concat_chksum = SWAP_BYTES_IN_WORD(concat_chksum);
> +        }
>          tcp_seg_add_chksum(concat_chksum, concat_chksummed, &last_unsent->chksum,
>            &last_unsent->chksum_swapped);
>          last_unsent->flags |= TF_SEG_DATA_CHECKSUMMED;
>
>
> Simon
>
> _______________________________________________
> lwip-users mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/lwip-users
>
