>>-----Original Message-----
>>From: [email protected]
>>[mailto:[email protected]] On
>>Behalf Of Zou, Nanhai
>>Sent: June 22, 2011 12:29
>>To: Keith Packard; [email protected]
>>Cc: Anholt, Eric
>>Subject: Re: [Intel-gfx] gem clflush optimization for media encoding
>>
>>
>>
>>>>-----Original Message-----
>>>>From: Keith Packard [mailto:[email protected]]
>>>>Sent: June 22, 2011 12:14
>>>>To: Zou, Nanhai; [email protected]
>>>>Cc: Anholt, Eric
>>>>Subject: Re: [Intel-gfx] gem clflush optimization for media encoding
>>>>
>>>>On Wed, 22 Jun 2011 11:13:09 +0800, "Zou, Nanhai" <[email protected]> wrote:
>>>>
>>>>>   If I upload the input buffer with movnti or movntdq (bypassing the
>>>>>   cache) and issue an sfence (to drain the write-combining buffers) at
>>>>>   the end, clflush should not be needed.
>>>>
>>>>Alas, neither of these will flush existing cached data, so you must
>>>>still use clflush to ensure that the data makes it out to memory. All
>>>>that they do is avoid consuming additional cache lines.
>>>>
>>  As I understand it, with movnti + sfence the data should surely reach
>>memory. The cache should be coherent in this case.
>>
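For reference, the upload path described above might look like the following with SSE2 intrinsics. This is only a sketch: `upload_nt` is a hypothetical helper, the destination is assumed to be 16-byte aligned with a length that is a multiple of 16, and it does not settle whether a clflush is still required afterwards for lines that were already in the cache.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128 (movntdq), _mm_loadu_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of GEM or libdrm): copy len bytes from src
 * to dst using non-temporal stores, then fence.
 * Assumptions: dst is 16-byte aligned and len is a multiple of 16.
 */
static void upload_nt(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert((len & 15) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / 16; i++) {
        /* Unaligned load from the source, non-temporal 16-byte store. */
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
    }

    /* Make the preceding non-temporal stores globally visible
     * before any store that follows. */
    _mm_sfence();
}
```

The source may be unaligned because it is read with `_mm_loadu_si128`; only the destination has the alignment requirement.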
>>>>You want to use a write combining mapping, which should give you full
>>>>bandwidth access to memory without hitting any caches. You can use the GTT
>>>>mapping as the aperture is configured for write combining access, or we
>>>>can figure out how to make PAT work.
>>>>
>>      map_gtt in current GEM is super slow.
>>      I've tried map_gtt, but the speed is unacceptable.
>>
>>>>>   Since it is a CPU read-only surface, clflush is not needed at all.
>>>>
>>>>You'd still have to invalidate cache lines using clflush to avoid using
>>>>stale data in the CPU cache.
>>>>
>>>>--
>>  Yes, you are right, in this case clflush is still needed to invalidate the
>>CPU cache.
>>
>>  The problem is that we do not know how large the coded output buffer is
>>before we do the encoding.
>>  So we have to allocate a large enough GEM object before encoding. In most
>>cases the encoding result is less than 1/10 of the safe buffer size, so 9/10
>>of the buffer is clflushed unnecessarily.
>>
>>  A fast map_gtt implementation could be the best choice here.
>>
        Or can we clflush cache line by cache line while reading, instead of
flushing the entire object?
        This optimization will give a >40% speedup for 1080p encoding.
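
A rough user-space sketch of the idea, flushing only the cache lines that cover the bytes actually read rather than the whole object. `read_flushed` is a hypothetical helper and the 64-byte line size is an assumption; in GEM the flush happens in the kernel, so this only illustrates how much less work the partial flush does.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_clflush, _mm_mfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/*
 * Hypothetical helper: flush only the cache lines covering
 * [src, src + len), then copy those bytes out to dst.
 * Returns the number of cache lines flushed.
 */
static size_t read_flushed(void *dst, const void *src, size_t len)
{
    assert(src != NULL || len == 0);

    /* Round the start address down to a cache line boundary. */
    uintptr_t start = (uintptr_t)src & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)src + len;
    size_t lines = 0;

    _mm_mfence();                       /* order earlier accesses before the flushes */
    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
        _mm_clflush((const void *)p);   /* write back and invalidate this line */
        lines++;
    }
    _mm_mfence();                       /* flushes complete before the reads below */

    memcpy(dst, src, len);
    return lines;
}
```

With a coded result of 400 bytes starting on a line boundary, this flushes 7 lines, however large the allocated buffer is.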

>>Thanks
>>Zou Nanhai
>>
>>>>[email protected]
>>_______________________________________________
>>Intel-gfx mailing list
>>[email protected]
>>http://lists.freedesktop.org/mailman/listinfo/intel-gfx
