Well, I've been playing around a bit more with the code. One interesting thing I've found (that I am sure Hans already knows) is that the DMA chip appears to increment the IVTV_REG_ENCDMAADDR address by 96 bits (12 bytes, the size of one scatter-gather array element) on both a failed and a successful write. I'm not sure why it is doing this: 96 bits is the size of a scatter-gather array element, but I can't see why it would ever make sense to increment the address by that amount. I don't know if this has anything to do with the problem... I just don't understand what the DMA chip is doing.
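
For reference, the 96-bit figure is just the element size. Assuming each scatter-gather element is three 32-bit fields (src, dst, size, the members used in the code further down), one element is 12 bytes, so anything that steps through the array one element at a time would advance by 12. The sketch below only checks that arithmetic in user space; the struct name and both helpers are made up for illustration and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: three 32-bit fields, mirroring the src/dst/size
 * members referenced in this mail. Not copied from the ivtv source. */
struct sg_element {
    uint32_t src;
    uint32_t dst;
    uint32_t size;
};

/* Size of one scatter-gather element in bits. */
static unsigned sg_element_bits(void)
{
    return (unsigned)(sizeof(struct sg_element) * 8);
}

/* Address of element `index` in an array that starts at `base`,
 * i.e. the value a pointer walking the array would hold. */
static uint32_t sg_element_addr(uint32_t base, unsigned index)
{
    return base + (uint32_t)(index * sizeof(struct sg_element));
}
```

If the register really is a pointer into the scatter-gather array, a step of 12 bytes per transfer would be what this arithmetic predicts, but that is only a guess from the numbers.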

Also, I'd like to make a method to check to see how the buffers are being changed. However, I'm making some stupid mistake in the way I'm doing it. Perhaps someone could help. The kernel really doesn't like it when I do something like this:

((unsigned int*)SGarray[i].dst)[0]

where SGarray is the scatter-gather array. I'm trying to just print out the raw memory locations for the destination of the DMA to see if it is being changed by the failed writes. Does this have something to do with this area of memory being protected for DMA access? Below is the method I was trying to write.

static void print_SG_array(struct ivtv *itv, struct ivtv_SG_element *SGarray, int SG_length)
{
  int i;
  unsigned char* sizePtr = 0;
  int size = 0;
  IVTV_DEBUG_WARN("SGarray start: 0x%08x   length: %d\n", (unsigned int)SGarray, SG_length);
  for(i=0; i<SG_length; ++i)
  {
    sizePtr = ((unsigned char *)&SGarray[i].size);
    size = (sizePtr[1] * 256) + sizePtr[0];
    IVTV_DEBUG_WARN("Element: %d  src: 0x%08x  dst: 0x%08x sizeHex: 0x%08x intSize: %d\n", i, SGarray[i].src, SGarray[i].dst, SGarray[i].size, size);
    IVTV_DEBUG_WARN("SpotCheck offset 0: 0x%08x offset 1020: 0x%08x  offset 16380: 0x%08x offset 32763: 0x%08x\n", ((unsigned int*)SGarray[i].dst)[0], ((unsigned int*)SGarray[i].dst)[1020], ((unsigned int*)SGarray[i].dst)[16380], ((unsigned int*)SGarray[i].dst)[32763]);
  }
}
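
Two things in the spot check above look suspect, independent of the DMA problem. First, dst is most likely the address handed to the card for DMA (a bus address), which is not something the CPU can dereference as a kernel pointer; that would explain why the kernel objects. Second, indexing an unsigned int pointer with [1020] reads 1020 * 4 = 4080 bytes in, not at byte offset 1020, and [32763] lands about 128 KB in. Below is a minimal user-space sketch of a bounds-checked read at a byte offset, assuming a CPU-visible pointer to the same buffer is available. peek32 and its parameters are hypothetical names, not part of ivtv.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of ivtv: read a 32-bit value at a BYTE
 * offset from a CPU-addressable buffer. Returns 0 on success, or -1 if
 * the read would run past the end of the buffer. */
static int peek32(const unsigned char *cpu_buf, size_t buf_len,
                  size_t byte_off, uint32_t *out)
{
    if (byte_off > buf_len || buf_len - byte_off < sizeof(*out))
        return -1;
    /* memcpy avoids an unaligned access for offsets such as 32763 */
    memcpy(out, cpu_buf + byte_off, sizeof(*out));
    return 0;
}
```

In the driver, the pointer passed in would have to be the kernel virtual address of the buffer that the SG element's dst refers to, not dst itself. Where that pointer is stored is not shown in this thread.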



On 6/17/06, Kirk Lewis <[EMAIL PROTECTED] > wrote:
Thanks for the information Hans. I see some others are approaching this from a different angle as well. Hopefully they'll come up with something but I'd be more comfortable knowing what exactly is causing this and fixing it in the code. You certainly pointed out several areas of interest. I'll do my best to investigate them.


"1) The code that sets up the DMA arrays and registers is buggy. From the
tests done until now it looks like that part is OK, so this cause is
unlikely."

This seems unlikely to me as well. If nothing changes from the failed try to the retry (and it shouldn't), then one would think that would rule 1 out.


"2) Something goes wrong in the queue handling. This was the area where I
wanted to look into next. Did some partial data end up in a buffer? Was
some offset modified?"

I'll look at this next. It seems like something odd must be being written to the buffer. I also am going to see what happens if I simply don't try to redo the DMA.



"3) It's the firmware and it does indeed require extra handling in case
of a DMA error. Something of a default case if the first two do not pan
out."
I hope it isn't this either, but it seems likely it is a problem with the DMA error handling.




On 6/17/06, Hans Verkuil <[EMAIL PROTECTED] > wrote:
On Saturday 17 June 2006 03:46, Kirk Lewis wrote:
> I would like to help with Trac Ticket 49, as it is currently
> affecting me. In an earlier thread Hans said he was very busy, so I'd
> like to help in any way I can.

Great! I'm still working very hard on getting the driver into the kernel
and it looks like it will take even more time than expected.

> I'm not extremely experienced with
> drivers, but I know enough to get around. Over the last few days I've
> been investigating the problem, and I haven't been able to solve it.
> So, if anyone has any suggestions, please let me know. Here is the
> information I have uncovered so far:
>
> I am running on an intel dual core processor system with a PVR 250
> (very old) and a new PVR 150. The error observed is:
>
> DMA Error 0x0000000b
>
> This may, but does not always, cause corruption in the video
> stream. It may even cause the system to lock (although that could be
> a result of reading a badly corrupted video stream). I've found it to
> be very easily reproducible, but only by stressing the system to its
> limit. To reproduce this bug reliably I have to have 2 transcodings
> going on while recording on one tuner and watching TV on the other.
> It will occur in other cases... it just takes a lot longer.

Correct. As far as I could tell the DMA error itself is not a problem.
It can occur on a heavily loaded system, and the driver should simply
recover from it gracefully. How often it occurs is very hardware
dependent: some chipsets have better DMA handling than others, and the
Conexant MPEG encoder chip is known to have a rather finicky (read:
buggy) DMA engine.

It is the recovery from a DMA error where something goes wrong.
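
The "clear the error bits and retry" recovery discussed in this thread can be pictured with a small user-space simulation. This is only a sketch: the struct, the bit values, and both function names are hypothetical, the simulated card sets its status immediately instead of being polled, and nothing here reflects whatever extra handling the real firmware may need.

```c
#include <stdint.h>

/* All names and bit values below are hypothetical, for illustration. */
#define SIM_DMA_XFER 0x01u  /* transfer started */
#define SIM_DMA_ERR  0x08u  /* write error reported */

struct sim_card {
    uint32_t status;  /* simulated DMA status register */
    int fail_next;    /* number of upcoming attempts that will fail */
    int starts;       /* how many times a transfer was kicked off */
};

/* Simulated "write the DMA registers": the card reports either an
 * error or a started transfer in its status register. */
static void sim_start_dma(struct sim_card *c)
{
    c->starts++;
    if (c->fail_next > 0) {
        c->fail_next--;
        c->status = SIM_DMA_ERR;
    } else {
        c->status = SIM_DMA_XFER;
    }
}

/* Start a transfer; on error, clear the error bit and try again, up
 * to max_retries extra attempts. Returns the number of retries that
 * were needed, or -1 if every attempt failed. */
static int sim_dma_with_retry(struct sim_card *c, int max_retries)
{
    int retries;

    for (retries = 0; retries <= max_retries; retries++) {
        sim_start_dma(c);
        if (c->status & SIM_DMA_XFER)
            return retries;
        c->status &= ~SIM_DMA_ERR;  /* clear the error before retrying */
    }
    return -1;
}
```

With fail_next set to 1, the call returns 1 after two starts, which matches the reported behaviour that a failed write always succeeds on the first retry. The open question in the thread is what else changes between those two attempts.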

> The area of interest in the code is in dma_from_device in
> ivtv-irq.c. This is where the error is being printed out from. The
> error means there was a write error. The write error is occurring
> exactly as one would expect (if there were to be an error). It occurs
> after the DMA registers are written to instructing a write. It takes
> a bit for the card to set the DMA error, so the error doesn't occur
> immediately; it occurs in the while loop which is normally waiting to
> see the DMA started (the DMAXFER bit to flip).
>
> Here are some things I have tried:
>
> -Uncommenting the DMA_locks placed around dma_from_device elsewhere
> in ivtv-irq.c. No effect.
>
> -Moving the DMA_slock spinlock (Line 640) up to include the loops
> checking for the appropriate time to start the write. I thought this
> could be a race condition... but it has no effect. I also did the
> same in ivtv-queue.c's dma_to_device.
>
> -Checking to see if the IVTV_REG_ENCDMAADDR write isn't happening. I
> noticed there is a place where the code double checks to see if the
> register was actually written to. In my testing I never saw this
> register write fail. So... that's not it.
>
> -Doing another sanity check to make absolutely sure, at the point the
> DMA registers are written, that the registers are of the expected
> value. I never saw anything unusual.
>
> -Checking to see if there is some strange pause going on in the code.
> Even with all my printks I never saw more than 2 jiffies from the
> start of the method, to the return. This is in the error case (so it
> was re-trying the DMA write).
>
> -I never saw the DMA write fail more than once in a row. It always
> succeeded (or at least no error is set).
>
> -I'm not seeing any ivtv_sleep_timeouts fail.
>
> So... I'm running out of ideas. From all appearances, there is
> nothing different before a successful write and a failed write.
> Things look the same from the point of view of the registers. The
> retry always succeeds... even though everything is the same. Every
> now and then a write appears to randomly fail, and BAM things can get
> really screwed up.
>
> One question for others. Is it normal to see DMA writes fail? Or
> should it pretty much never happen? If the former is the case then
> I'll stop trying to figure out what is causing it to fail and focus
> on trying to find a way to recover from it.

As mentioned above, yes it is normal to see this.

> Is there any strange implicit logic going on that I'm not seeing?
> Stuff like reading a register changing its state (or some other
> register), or any of the itv/st data structures being changed? I'm
> getting really paranoid :*(
>
> Is there something different that must be done when a DMA error
> occurs? Is it not acceptable to just clear the DMA error bits and
> retry? Are there other bits that must be reset somewhere?
>
> Thanks for any advice anyone has.

You've pretty much done the same tests I did (and a few more) and I saw
the same things.

Now in my opinion there are three possible causes:

1) The code that sets up the DMA arrays and registers is buggy. From the
tests done until now it looks like that part is OK, so this cause is
unlikely.

2) Something goes wrong in the queue handling. This was the area where I
wanted to look into next. Did some partial data end up in a buffer? Was
some offset modified?

3) It's the firmware and it does indeed require extra handling in case
of a DMA error. Something of a default case if the first two do not pan
out.

I'm hoping it is 1 or 2. In that case it is a driver bug and after
fixing it everyone lives happily ever after. If it is 3, then there are
three options: first, ask Chris Kennedy if he can help. He knows
more about it than anyone, even though he is no longer active with
driver development. Alternatively switch to using the mailbox command
to start the DMA. This was used in the past. For some reason it has
become linked to the pio setting so turning it on has other side
effects. Also, AFAIK the reason for abandoning that approach had to do
with bad behavior of that command when multiple streams are DMAing at
the same time (e.g. encoder, decoder, OSD).

The third option would be to see if it is possible to discover the
precise MPEG offset and see if we can compensate for it.

Good luck!

        Hans

_______________________________________________
ivtv-devel mailing list
[email protected]
http://ivtvdriver.org/mailman/listinfo/ivtv-devel