Kirk Lewis wrote:
> Thanks for the information Hans. I see some others are approaching this
> from a different angle as well. Hopefully they'll come up with something
> but I'd be more comfortable knowing what exactly is causing this and
> fixing it in the code. You certainly pointed out several areas of
> interest. I'll do my best to investigate them.
>
> "1) The code that sets up the DMA arrays and registers is buggy. From the
> tests done until now it looks like that part is OK, so this cause is
> unlikely."
>
> This seems unlikely to me as well. If nothing changes from the failed try
> to the retry (and it shouldn't), then one would think that would rule 1
> out.
>
> "2) Something goes wrong in the queue handling. This was the area where I
> wanted to look into next. Did some partial data end up in a buffer? Was
> some offset modified?"
>
> I'll look at this next. It seems like something odd must be being written
> to the buffer. I also am going to see what happens if I simply don't try
> to redo the DMA.
>
> "3) It's the firmware and it does indeed require extra handling in case
> of a DMA error. Something of a default case if the first two do not pan
> out."
>
> I hope it isn't this either, but it seems likely it is a problem with the
> DMA error handling.
>

Interesting. Actually, the problem with DMA errors, at least from the
decoder perspective, seems to involve the PCI wrap-around that is done for
the chip's SDRAM. I don't fully understand it, but basically the firmware
utilizes this 'feature' and at the same time seems to end up writing
randomly to PCI memory, which causes the random errors on some systems
(some lock up completely; it depends on the hardware, I guess).

An engineer I knew from Conexant explained it to me this way: the Linux
kernel uses PCI and the hardware in such a way that this behavior can
cause total havoc, while Windows does something different, so it isn't
seen there as often. The firmware folks wrap around and still write to the
decoder buffers internally. They thought this was an easy way to do the
wrap-around, but it seems to randomly throw reads and writes into general
system memory somehow, and the read/write pointer in the firmware can
possibly get mixed up.

That's at least the information I got about it. Besides the many Java
processor bugs possibly making things less lucky at times, he said that if
you try to disable the wrap-around, everything breaks for decoding at
least (maybe encoding too), since they wrote the firmware exploiting this
wonderful 'feature'. (This is why the chip has 8 megs of memory instead of
16: they wrapped it, I guess. At least from what I understand, that's the
way it works.)

Thanks,
Chris

>
> On 6/17/06, Hans Verkuil <[EMAIL PROTECTED]> wrote:
>>
>> On Saturday 17 June 2006 03:46, Kirk Lewis wrote:
>> > I would like to help with Trac Ticket 49, as it is currently
>> > affecting me. In an earlier thread Hans said he was very busy, so I'd
>> > like to help in any way I can.
>>
>> Great! I'm still working very hard on getting the driver into the kernel
>> and it looks like it will take still more time than expected.
>>
>> > I'm not extremely experienced with drivers, but I know enough to get
>> > around.
>> > Over the last few days I've been investigating the problem, and I
>> > haven't been able to solve it. So, if anyone has any suggestions,
>> > please let me know. Here is the information I have uncovered so far:
>> >
>> > I am running on an Intel dual-core processor system with a PVR 250
>> > (very old) and a new PVR 150. The error observed is:
>> >
>> > DMA Error 0x0000000b
>> >
>> > This may, but does not always, cause corruption in the video stream.
>> > It may even cause the system to lock (although that could be a result
>> > of reading a badly corrupted video stream). I've found it to be very
>> > easily reproducible, but only by stressing the system to its limit.
>> > To reproduce this bug reliably I have to have 2 transcodings going on
>> > while recording on one tuner and watching TV on the other. It will
>> > occur in other cases... it just takes a lot longer.
>>
>> Correct. As far as I could tell the DMA error itself is not a problem.
>> It can occur on a heavily loaded system, and the driver should simply
>> recover from it gracefully. How often it occurs is very hardware
>> dependent: some chipsets have better DMA handling than others, and the
>> Conexant MPEG encoder chip is known to have a rather finicky (read:
>> buggy) DMA engine.
>>
>> It is the recovery from a DMA error where something goes wrong.
>>
>> > The area of interest in the code is in dma_from_device in ivtv-irq.c.
>> > This is where the error is being printed out from. The error means
>> > there was a write error. The write error is occurring exactly as one
>> > would expect (if there were to be an error). It occurs after the DMA
>> > registers are written to, instructing a write. It takes a bit for the
>> > card to set the DMA error, so the error doesn't occur immediately; it
>> > occurs in the while loop which is normally waiting to see the DMA
>> > started (the DMAXFER bit to flip).
>> >
>> > Here are some things I have tried:
>> >
>> > - Uncommenting the DMA_locks placed around dma_from_device elsewhere
>> >   in ivtv-irq.c. No effect.
>> >
>> > - Moving the DMA_slock spinlock (line 640) up to include the loops
>> >   checking for the appropriate time to start the write. I thought
>> >   this could be a race condition... but it has no effect. I also did
>> >   the same in ivtv-queue.c's dma_to_device.
>> >
>> > - Checking to see if the IVTV_REG_ENCDMAADDR write isn't happening. I
>> >   noticed there is a place where the code double-checks to see if the
>> >   register was actually written to. In my testing I never saw this
>> >   register write fail. So... that's not it.
>> >
>> > - Doing another sanity check to make absolutely sure, at the point
>> >   the DMA registers are written, that the registers hold the expected
>> >   values. I never saw anything unusual.
>> >
>> > - Checking to see if there is some strange pause going on in the
>> >   code. Even with all my printks I never saw more than 2 jiffies from
>> >   the start of the method to the return. This is in the error case
>> >   (so it was retrying the DMA write).
>> >
>> > - I never saw the DMA write fail more than once in a row. The retry
>> >   always succeeded (or at least no error was set).
>> >
>> > - I'm not seeing any ivtv_sleep_timeouts fail.
>> >
>> > So... I'm running out of ideas. From all appearances, there is
>> > nothing different before a successful write and a failed write.
>> > Things look the same from the point of view of the registers. The
>> > retry always succeeds... even though everything is the same. Every
>> > now and then a write appears to randomly fail, and BAM, things can
>> > get really screwed up.
>> >
>> > One question for others. Is it normal to see DMA writes fail? Or
>> > should it pretty much never happen? If the former is the case then
>> > I'll stop trying to figure out what is causing it to fail and focus
>> > on trying to find a way to recover from it.
>>
>> As mentioned above, yes, it is normal to see this.
>>
>> > Is there any strange implicit logic going on that I'm not seeing?
>> > Stuff like reading a register changing its state (or some other
>> > register), or any of the itv/st data structures being changed? I'm
>> > getting really paranoid :*(
>> >
>> > Is there something different that must be done when a DMA error
>> > occurs? Is it not acceptable to just clear the DMA error bits and
>> > retry? Are there other bits that must be reset somewhere?
>> >
>> > Thanks for any advice anyone has.
>>
>> You've pretty much done the same tests I did (and a few more) and I saw
>> the same things.
>>
>> Now in my opinion there are three possible causes:
>>
>> 1) The code that sets up the DMA arrays and registers is buggy. From the
>> tests done until now it looks like that part is OK, so this cause is
>> unlikely.
>>
>> 2) Something goes wrong in the queue handling. This was the area where I
>> wanted to look into next. Did some partial data end up in a buffer? Was
>> some offset modified?
>>
>> 3) It's the firmware and it does indeed require extra handling in case
>> of a DMA error. Something of a default case if the first two do not pan
>> out.
>>
>> I'm hoping it is 1 or 2. In that case it is a driver bug and after
>> fixing it everyone lives happily ever after. If it is 3, then there are
>> three options: first, contact Chris Kennedy to see if he can help. He
>> knows more about it than anyone, even though he is no longer active
>> with driver development. Alternatively, switch to using the mailbox
>> command to start the DMA. This was used in the past. For some reason it
>> has become linked to the pio setting, so turning it on has other side
>> effects. Also, AFAIK the reason for abandoning that approach had to do
>> with bad behavior of that command when multiple streams are DMAing at
>> the same time (e.g. encoder, decoder, OSD).
>>
>> The third option would be to see if it is possible to discover the
>> precise MPEG offset and see if we can compensate for it.
>>
>> Good luck!
>>
>> Hans
>>
>> _______________________________________________
>> ivtv-devel mailing list
>> [email protected]
>> http://ivtvdriver.org/mailman/listinfo/ivtv-devel
