FWIW, I'm planning on distributing the sources to at least the skeletons in a future version of Codec Engine. You can then see the cache operations in all their glory.

Not sure which version of CE you have, but here is a snippet of the SPHENC skeleton (DSP-side). In red, I've highlighted the cacheWbInv call(s) _right after_ it invokes the codec's process() call. This is from Codec Engine 1.02:

    /* invalidate cache for all input and output buffers */
    for (i = 0; i < inBufs.numBufs; i++) {
        Memory_cacheInv(inBufs.bufs[i], inBufs.bufSizes[i]);
    }
    for (i = 0; i < outBufs.numBufs; i++) {
        Memory_cacheInv(outBufs.bufs[i], outBufs.bufSizes[i]);
    }

    /* unmarshall outArgs based on the "size" of inArgs */
    pOutArgs = (ISPHENC_OutArgs *)((UInt)(&(msg->cmd.process.inArgs)) +
        msg->cmd.process.inArgs.size);

    /* make the process call */
    msg->visa.status = SPHENC_process(handle, &inBufs, &outBufs,
        &(msg->cmd.process.inArgs), pOutArgs);

    /* flush cache for all output buffers */
    for (i = 0; i < outBufs.numBufs; i++) {
        Memory_cacheWbInv(outBufs.bufs[i], outBufs.bufSizes[i]);
    }

So, yes, the Codec Engine framework should be managing your outBufs' cache for you.

Perhaps you can double-check that your outBufs.numBufs and outBufs.bufSizes[] are correct? outBufs.bufSizes[] should be the size of the _buffer_, not necessarily the size of the _contents_ of the buffer. And the codec shouldn't be changing either of these values - they're read-only from the codec's perspective.

It might be interesting to turn on trace as Scott describes below - it should show exactly what cache operations are going on. It'll be dumped into DSP-side memory, and can be 'scooped out' and displayed by the ARM using either TraceUtils (preferred) or, if your app isn't using TraceUtils, Engine_fwriteTrace().

Hope that helps.

Chris
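
A minimal sketch for the sizing and alignment questions that come up later in this thread (what an "integral number of cache lines" means for a 105-byte buffer, and how to check that a buffer starts on a line boundary). It assumes a 128-byte cache line, which is not stated anywhere in the thread, so check the cache documentation for your device. The names CACHE_LINE_SIZE, round_up_to_cache_line and is_cache_line_aligned are hypothetical helpers for illustration, not part of Codec Engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 128-byte cache line. Must be a power of two; verify the
 * real value for your device before relying on it. */
#define CACHE_LINE_SIZE ((size_t)128)

/* Hypothetical helper: round a byte count up to the next multiple of
 * the cache line size, so the buffer covers whole lines only. */
static size_t round_up_to_cache_line(size_t nbytes)
{
    /* Guard against wrap-around when adding the rounding term. */
    assert(nbytes <= SIZE_MAX - (CACHE_LINE_SIZE - 1));
    return (nbytes + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
}

/* Hypothetical helper: return 1 if the address starts on a cache line
 * boundary, 0 otherwise. */
static int is_cache_line_aligned(const void *p)
{
    return ((uintptr_t)p & (CACHE_LINE_SIZE - 1)) == 0;
}
```

Under that assumption, a 105-byte output buffer would be allocated (and reported in bufSizes[]) as 128 bytes, while 1024 bytes is already a whole number of lines. The alignment check can be applied to whatever pointer your allocator returns.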
________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andy Ngo
Sent: Friday, March 02, 2007 2:45 AM
To: Gary, Scott; Adam Dawidziuk
Cc: davinci-linux-open-source @linux.davincidsp.com
Subject: Re: Cache coherency issue?

Nope, that didn't help. I tried allocating 1024 bytes for the output buffer (I only needed 105 bytes as stated in my previous email) and I still have the stale data problem unless I explicitly do a writeback invalidate. So that eliminates the "integral number of cache lines" requirement as the culprit for my cache problem.

I used Memory_contigAlloc so that should take care of the contiguous requirement, how do you ensure the cache-line alignment requirement?

My DSP codec is not writing to the IN buffers and the OUT buffers are not written to by my ARM application so that eliminates Scott's points 2 and 3 as the culprit.

Regards,
Andy

----- Original Message ----
From: Andy Ngo <[EMAIL PROTECTED]>
To: "Gary, Scott" <[EMAIL PROTECTED]>; Adam Dawidziuk <[EMAIL PROTECTED]>
Cc: "davinci-linux-open-source @linux.davincidsp.com" <davinci-linux-open-source@linux.davincidsp.com>
Sent: Thursday, March 1, 2007 10:02:59 PM
Subject: Re: Cache coherency issue?

Scott,

Thanks for your response. On the ARM side, I used Memory_contigAlloc to allocate the input and output buffers; doesn't Memory_contigAlloc automatically handle the contiguous and cache-line alignment requirements?

You say the buffers must be sized as an integral number of cache lines; how do I know what is that "number of cache lines" or where do I find that out? I think the "integral number" may be my problem since I only allocate just enough data that I used for the output buffer; the size of my output buffer is 105 bytes, which I doubt is an "integral number".

Thanks for your advice, I'll try it out soon and let you know.
Regards,
Andy

----- Original Message ----
From: "Gary, Scott" <[EMAIL PROTECTED]>
To: Andy Ngo <[EMAIL PROTECTED]>; Adam Dawidziuk <[EMAIL PROTECTED]>
Cc: "davinci-linux-open-source @linux.davincidsp.com" <davinci-linux-open-source@linux.davincidsp.com>
Sent: Thursday, March 1, 2007 8:15:51 PM
Subject: RE: Cache coherency issue?

Andy,

The framework should indeed be handling the necessary buffer invalidates before, and writebacks after, calling the algorithm's process function.

To verify this you could try turning on trace (as in the archived message), and you should see similar statements indicating the operations on the specific buffers. The mask name for these memory calls is "OM". If you are using TraceUtil, you can specify CE_TRACE="OM=012".

I don't know if they apply, but some general things to be careful of:

- Make sure your buffers are contiguous, cache-line aligned, and sized as an integral number of cache lines.
- DSP should not write to IN buffers.
- OUT buffers should not be used to pass data to DSP.

Hope this helps.

Regards,
Scott

________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andy Ngo
Sent: Thursday, March 01, 2007 1:54 PM
To: Andy Ngo; Adam Dawidziuk
Cc: davinci-linux-open-source @linux.davincidsp.com
Subject: Re: Cache coherency issue?

OK, I got desperate and needed to get my codec to work correctly right away. So I tried forcing a write-back invalidate on the output buffers and now everything seems to work fine; before SPHENC_process returns, I called the following to write-back invalidate the output buffers:

    Memory_cacheWbInv(outBufs->bufs[0], size);

I don't know why the XDAIS framework layer is not doing that automatically for me. Any thoughts, anyone?

Thanks to X. Zhou (http://www.mail-archive.com/[EMAIL PROTECTED] com/msg00960.html) for bringing this up (and still, no one has really given a clear answer).
Regards,
Andy

----- Original Message ----
From: Andy Ngo <[EMAIL PROTECTED]>
To: Adam Dawidziuk <[EMAIL PROTECTED]>
Cc: "davinci-linux-open-source @linux.davincidsp.com" <davinci-linux-open-source@linux.davincidsp.com>
Sent: Wednesday, February 28, 2007 10:46:21 PM
Subject: Re: Cache coherency issue?

Adam,

Thanks for the quick response. I'm not quite sure what you are saying. I basically took the example sphenc_copy codec as a template and customized it to create my own speech enc codec. All the configurations (contents in .tcf and .cfg) pretty much stayed the same. My codec algorithm does not explicitly call any DMA functions so I'm not sure how I'm "accessing the output buffers both by the CPU and DMA". I added a lot of code and data to the codec so the only change I can see is the increase in memory usage of the DDR region (4MB code, stack, static data).

How can I debug this issue to see if the DSP and DMA are accessing the output buffers as you suggested?

Like you said, this seems like a caching problem. I guess I'll try to handle the cache coherency myself, which the Framework is supposed to do for me automatically.

Anyone, any thoughts?

Regards,
Andy

----- Original Message ----
From: Adam Dawidziuk <[EMAIL PROTECTED]>
To: Andy Ngo <[EMAIL PROTECTED]>
Cc: "davinci-linux-open-source @linux.davincidsp.com" <davinci-linux-open-source@linux.davincidsp.com>
Sent: Wednesday, February 28, 2007 9:09:49 PM
Subject: Re: Cache coherency issue?

Andy,

Don't get me wrong, but personally I think you don't exactly follow DMA rules in your algorithm. It seems that for some reason you access the output buffers both by CPU and DMA. Thus some parts of the data are left in cache, and some are in external memory. GT_trace probably accesses all your data by CPU, thus performing an automatic write-back when the cache runs out.

Are you 100% sure your data is coherent upon returning from the process call, presumably all in external memory?

Hope you figure out the way and share this with the community.
I would certainly want to see what's going on, since I had strange cache coherency problems myself...

Best,

Andy Ngo wrote:
> According to a previous post
> (http://www.mail-archive.com/[EMAIL PROTECTED] com/msg00960.html),
> the XDAIS Framework is supposed to handle cache coherency automatically
> (points 1-4 in the post above). Recently, I have
> been adding more and more code and data to my DSP speech codec and I've
> been getting weird problems with the data exchanged
> between the ARM and the DSP. For example, I would always get the same
> exact data on the output buffer from a call
> to SPHENC_process. In an attempt to debug the problem, I put a GT_trace
> call in my DSP speech codec to print out the data
> that was being returned from SPHENC_process so that I can compare it to
> the data I saw being returned on the ARM side.
> Weird thing is that by putting GT_trace in the speech codec, the problem
> went away (the return data is different each time).
> As soon as I comment out the call to GT_trace, the problem came back
> (ARM side sees same data being returned).
>
> Am I doing something wrong? Is there a cache coherency problem here? Why
> does adding a simple GT_trace fix the problem and
> my data looks correct? Instead of GT_trace, I tried putting some hard
> delays in an attempt to affect timing but that wouldn't
> work, only a call to GT_trace worked. I've been on this for several days now.
>
> Please advise. Thanks in advance.
>
> Regards,
> Andy

--
Adam Dawidziuk
Sentivision
_______________________________________________
Davinci-linux-open-source mailing list
Davinci-linux-open-source@linux.davincidsp.com
http://linux.davincidsp.com/mailman/listinfo/davinci-linux-open-source