BTW, I posted a response to Zhou's cache question. I think he got a copy, but no one else did because of the notification below. I've attached my response, modulo the pdf... we'll see if the moderator ultimately approves.
List members, is this 100KB limit "too low"?

Chris

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]
> Sent: Saturday, November 25, 2006 9:41 AM
> To: Ring, Chris
> Subject: Your message to Davinci-linux-open-source awaits
> moderator approval
>
> Your mail to 'Davinci-linux-open-source' with the subject
>
>     RE: cache coherency problem
>
> Is being held until the list moderator can review it for approval.
>
> The reason it is being held:
>
>     Message body is too big: 303472 bytes with a limit of 100 KB
>
> Either the message will get posted to the list, or you will receive
> notification of the moderator's decision. If you would like to cancel
> this posting, please visit the following URL: <chop>
--- Begin Message ---
I think you understand the cache issue correctly. Namely:

* The data buffers must not be cached on the ARM side.

* Buffers acquired through CMEM (or preferably CE's Memory_contigAlloc()) _are_ non-cached on the ARM side - they're mapped into the calling process's space via ioremap_nocache().

* The data buffers _are_ typically cached on the DSP side, and maintenance of cache coherency falls to the Codec Engine skeletons. The skeletons are the "remote" side of the RPC mechanism, and are explained in the Algorithm Creator's Guide, as Jerry Johns previously mentioned.

* For the 8 VISA interfaces for which Codec Engine provides stubs/skeletons, the skeletons perform this cache maintenance. Namely, they invalidate the input buffers before the process() call, and writeback-invalidate the output buffers after the process() call.

And a final, strange constraint... the buffers _should_ be aligned on a cache line boundary. On DaVinci, a cache line is 128 bytes. Cache maintenance is performed on whole cache lines; if a buffer isn't aligned on a line boundary, any other buffer that sits in the same cache line as a working data buffer can be corrupted.

[ The VISA stubs/skeletons _used_ to check for this 128-byte alignment and fail if the constraint wasn't met, but "smart" applications that know about all this cache maintenance can be written correctly (I won't go into that here), so the check was removed. ]
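To make the skeleton's role concrete, here is a rough sketch of what happens around a process() call. It is illustrative only, not the actual Codec Engine skeleton source: BCACHE_inv()/BCACHE_wbInv() are the DSP/BIOS cache calls, while skel_process() and alg_process() are made-up names standing in for the generated skeleton and the codec's real process() function.

    #include <std.h>
    #include <bcache.h>            /* DSP/BIOS cache API */
    #include <ti/xdais/dm/xdm.h>   /* XDM_BufDesc, XDAS_Int32 */

    /* Stand-in for the codec's actual process() implementation. */
    extern XDAS_Int32 alg_process(XDM_BufDesc *inBufs, XDM_BufDesc *outBufs);

    XDAS_Int32 skel_process(XDM_BufDesc *inBufs, XDM_BufDesc *outBufs)
    {
        XDAS_Int32 status;
        Int i;

        /* Before process(): invalidate the DSP's cached view of each
         * input buffer so the algorithm reads what the ARM wrote. */
        for (i = 0; i < inBufs->numBufs; i++) {
            BCACHE_inv(inBufs->bufs[i], inBufs->bufSizes[i], TRUE);
        }

        status = alg_process(inBufs, outBufs);

        /* After process(): write back and invalidate each output buffer
         * so the decoded data reaches memory where the (non-cached)
         * ARM side can see it. */
        for (i = 0; i < outBufs->numBufs; i++) {
            BCACHE_wbInv(outBufs->bufs[i], outBufs->bufSizes[i], TRUE);
        }

        return status;
    }

Both loops assume the buffers honor the 128-byte alignment constraint above; the cache operations are line-granular, so a misaligned buffer lets them spill onto a neighbor.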
Another thing to look out for is whether the _algorithm_ _writes_ into the "input" buffers. This is against the xDM spec, and it can cause strange corruption of subsequent input buffers. Because the input buffers are not cache-maintained _after_ the process() call (per the xDM spec, they don't need to be), any writes to them will sit in cache lines and may be written back "some time later", whenever the DSP cache hardware decides. So a misbehaving codec that writes into its input buffers can corrupt the ARM-side view of the data in strange, "late" ways.

Which codec are you using? Is it a TI codec?

I've attached a snapshot of a TI internal TWiki topic that might help as well. (I hope it goes through - there are rumors of the list rejecting files larger than 100kB... we'll see...)
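For what it's worth, here is a minimal ARM-side sketch of the aligned, contiguous allocation and XDM_BufDesc setup this all implies (your original code is quoted below). It is a sketch under stated assumptions, not drop-in code: loadStream(), BUF_SIZE, and the already-open FILE pointer are placeholders, and CACHE_LINE follows the 128-byte note above.

    #include <stdio.h>
    #include <ti/sdo/ce/osal/Memory.h>   /* Memory_contigAlloc() */
    #include <ti/xdais/dm/xdm.h>         /* XDM_BufDesc, XDAS_Int8/32 */

    #define BUF_SIZE   800000
    #define CACHE_LINE 128   /* DaVinci cache line size; see above */

    /* Hypothetical helper: read one bitstream chunk into a freshly
     * allocated contiguous buffer and wire up the descriptor. */
    static XDAS_Int8 *loadStream(FILE *fp, XDM_BufDesc *inBufs,
                                 XDAS_Int8 *bufArr[1], XDAS_Int32 sizeArr[1])
    {
        /* Physically contiguous, non-cached on the ARM side, and aligned
         * on a cache line so DSP-side maintenance can't clip a neighbor. */
        XDAS_Int8 *streamBuf = Memory_contigAlloc(BUF_SIZE, CACHE_LINE);

        if (streamBuf == NULL) {
            return NULL;    /* allocation failed */
        }

        bufArr[0]  = streamBuf;
        sizeArr[0] = (XDAS_Int32)fread(streamBuf, 1, BUF_SIZE, fp);

        /* XDM_BufDesc.bufs/bufSizes are pointers to caller-owned arrays. */
        inBufs->numBufs  = 1;
        inBufs->bufs     = bufArr;
        inBufs->bufSizes = sizeArr;

        return streamBuf;
    }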
Chris

________________________________
From: X. Zhou [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 23, 2006 9:27 PM
To: Ring, Chris; X. Zhou
Cc: [email protected]
Subject: cache coherency problem

Hi,

This nasty problem is driving me mad!! I am using a DVEVM (DM6446) board and the DVSDK environment to develop an ARM client + DSP video decoder application. I have found that the bitstream buffer transferred from the ARM to the DSP sometimes contains stale data. The details of this case are given here:

(1) The bitstream buffer is pre-allocated on the ARM side via the Memory_contigAlloc() function. [ In my opinion, Memory_contigAlloc() should provide buffers that are not only aligned, but also non-cached and physically contiguous. Isn't that right?? ]

(2) Each time I call the VIDDEC_process() interface on the ARM side, I pass the bitstream buffer pointer via the "XDM_BufDesc inBufs" parameter, e.g.:

    streamBuf = Memory_contigAlloc(800000, -1);
    ................
    actualStreamSize = fread(streamBuf, 1, 800000, fp);
    ................
    inBufs.numBufs = 1;
    inBufAddr[0] = streamBuf;
    inBufSize[0] = actualStreamSize;
    inBufs.bufs = inBufAddr;         /* bufs points at the pointer array */
    inBufs.bufSizes = inBufSize;

    printf("ARM : 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x\n",
           *((unsigned char *)inBufAddr[0]+0),
           *((unsigned char *)inBufAddr[0]+1),
           *((unsigned char *)inBufAddr[0]+2),
           *((unsigned char *)inBufAddr[0]+3),
           *((unsigned char *)inBufAddr[0]+4),
           *((unsigned char *)inBufAddr[0]+5));

    /* Call the process function to decode the nalu buffers */
    status = VIDDEC_process(hdecode, &inBufs, &outBufs,
                            (IVIDDEC_InArgs *)(&decoder_inargs),
                            (IVIDDEC_OutArgs *)(&decoder_outargs));
    ................

(3) Each time the DSP-side IVIDDEC_process() function is called, I invalidate the cache covering the bitstream buffer, e.g.:

    XDAS_Int32 IVIDDEC_process(IVIDDEC_Handle h, XDM_BufDesc *inBufs,
                               XDM_BufDesc *outBufs, IVIDDEC_InArgs *inArgs,
                               IVIDDEC_OutArgs *outArgs)
    {
        int i;
        for (i = 0; i < inBufs->numBufs; i++) {
            GT_6trace(curDecTrace, GT_ENTER,
                      "DSP : before invalidate: 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x\n",
                      *((unsigned char *)inBufs->bufs[i]+0),
                      *((unsigned char *)inBufs->bufs[i]+1),
                      *((unsigned char *)inBufs->bufs[i]+2),
                      *((unsigned char *)inBufs->bufs[i]+3),
                      *((unsigned char *)inBufs->bufs[i]+4),
                      *((unsigned char *)inBufs->bufs[i]+5));

            BCACHE_inv(inBufs->bufs[i], inBufs->bufSizes[i], TRUE); /* invalidate cache */

            GT_6trace(curDecTrace, GT_ENTER,
                      "DSP : after invalidate: 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x\n",
                      *((unsigned char *)inBufs->bufs[i]+0),
                      *((unsigned char *)inBufs->bufs[i]+1),
                      *((unsigned char *)inBufs->bufs[i]+2),
                      *((unsigned char *)inBufs->bufs[i]+3),
                      *((unsigned char *)inBufs->bufs[i]+4),
                      *((unsigned char *)inBufs->bufs[i]+5));
        }
        .............
        decode_one_frame(inBufs, .....);
        ............
    }

However, the experiments show that the data read by the DSP are sometimes inconsistent with the data written by the ARM, and when they are inconsistent, the data the DSP reads appear to be the data the ARM wrote the _previous_ time. How can I fix this? Is the cache on the ARM side going unflushed (if Memory_contigAlloc() actually provides cacheable buffers)? Or is the cache on the DSP side not being invalidated successfully (if BCACHE_inv() is effectively a no-op)? I'm going mad! Help me, please?!!!
--- End Message ---
_______________________________________________
Davinci-linux-open-source mailing list
[email protected]
http://linux.davincidsp.com/mailman/listinfo/davinci-linux-open-source
