On 2020-08-13 1:01, Soft Works wrote:

-----Original Message-----
From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of
Steve Lhomme
Sent: Wednesday, August 12, 2020 2:05 PM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer
copies are done before submitting them

On 2020-08-11 12:43, Steve Lhomme wrote:
Sorry if you seem to know all the answers already, but I don't and
so I have to investigate.

Last year I literally worked this to death. I followed every
slightest hint from countless searches and read through hundreds
of discussions, driven because I was unwilling to believe that
up-/downloading of video textures with D3D11 can't be done as
fast as with D3D9.
(The big picture was the implementation of D3D11 support for
QuickSync, where the slowdown played a much bigger role than with
D3D11VA decoders alone.)
Eventually I landed at an internal Nvidia presentation, some talks
with MS guys and a source code discussion deep inside a 3D game
engine (not a no-name one). It really bugs me that I didn't
properly note the references, but from somewhere in between I was
able to gather solid evidence about what is legal to do and what
is not. Based on that, several iterations followed to find the
optimal way of doing the texture transfer. As I had implemented
D3D11 support for QuickSync, this got pretty complicated: with a
full transcoding pipeline, all parts (decoder, encoder and
filters) can (and usually will) request textures. Only the latest
Intel drivers can work with array textures everywhere (e.g. VPP),
so I also needed to add support for non-array texture allocation.
The patch you've seen is the result of weeks of intensive work (a
small but crucial part of it), even when it may not look like that.
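
To illustrate the distinction (just a sketch, not the actual
patch): the decoder wants one array texture whose slices back the
whole frame pool, while the older Intel VPP paths need one
standalone texture per frame.

#define COBJMACROS
#include <d3d11.h>

/* Allocate an NV12 frame pool either as a single texture array or
 * as nb_frames independent textures. "out" holds 1 entry in the
 * array case, nb_frames entries otherwise. */
static HRESULT alloc_nv12_pool(ID3D11Device *dev, UINT width, UINT height,
                               UINT nb_frames, int use_array,
                               ID3D11Texture2D **out)
{
    D3D11_TEXTURE2D_DESC desc = {
        .Width      = width,
        .Height     = height,
        .MipLevels  = 1,
        .Format     = DXGI_FORMAT_NV12,
        .SampleDesc = { .Count = 1 },
        .Usage      = D3D11_USAGE_DEFAULT,
        .BindFlags  = D3D11_BIND_DECODER,
    };

    if (use_array) {
        /* One texture, nb_frames slices; decoder views index into it. */
        desc.ArraySize = nb_frames;
        return ID3D11Device_CreateTexture2D(dev, &desc, NULL, &out[0]);
    }

    /* Non-array path: one independent texture per frame. */
    desc.ArraySize = 1;
    for (UINT i = 0; i < nb_frames; i++) {
        HRESULT hr = ID3D11Device_CreateTexture2D(dev, &desc, NULL, &out[i]);
        if (FAILED(hr))
            return hr;
    }
    return S_OK;
}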


Sorry if you seem to know all the answers already

Obviously, I don't know all the answers, but the answers I have
given were correct. And when I didn't have an answer, I always
respectfully said that your situation might be different.
Nor did I reply by implying that you had done your work by trial
and error or based it on most likely invalid assumptions or
deductions.


I still don't know how you are actually operating this, and thus
I also cannot tell what might or might not work in your case.
All I can tell is that the procedure I have described (1-2-3-4)
works rock-solid for multi-threaded DX11 texture transfer when
it's done in the same way as I've shown.
And believe it or not, I would still be happy if it were of any
use to you...

Even though the discussion is heated (fitting the weather here),
I don't mind. I learned some things and it pushed me to dig
deeper. But I can't just take your word for it; I need something
solid if I'm going to remove a lock that has helped me so far.

So I'm currently tooling VLC to bring the decoder to its knees
and find out what it can and cannot do safely. So far I can still
see decoding artifacts when I don't use a lock, which would mean
I still need the mutex, for the reasons given in the previous
mail.

A follow-up on this. Using ID3D10Multithread seems to be enough to
make ID3D11Device/ID3D11DeviceContext/etc. mostly thread safe. Even
the decoder, with its odd API, seems to know what to do when it is
submitted different buffers.
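
For reference, this is roughly what that amounts to (a minimal
sketch; error handling trimmed, and the IIDs may need dxguid.lib
or an initguid.h include depending on the toolchain):

#define COBJMACROS
#include <initguid.h>
#include <d3d11.h>
#include <d3d10.h> /* ID3D10Multithread; D3D11 devices implement it too */

/* Ask the runtime/driver to serialize calls made through the
 * ID3D11DeviceContext of this device. */
static HRESULT enable_multithread_protection(ID3D11Device *device)
{
    ID3D10Multithread *mt = NULL;
    HRESULT hr = ID3D11Device_QueryInterface(device, &IID_ID3D10Multithread,
                                             (void **)&mt);
    if (FAILED(hr))
        return hr; /* not exposed: an external lock is still needed */

    ID3D10Multithread_SetMultithreadProtected(mt, TRUE);
    ID3D10Multithread_Release(mt);
    return S_OK;
}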

I did not manage to saturate the GPU, but I got a much higher
decoding speed/throughput with which to check the errors I got
before. Many of them were due to VLC dropping data because of odd
timing.

Now I still have some threading issues. For example, for
deinterlacing we create an ID3D11VideoProcessor to handle the
deinterlacing, and we create it after decoding has started (as
deinterlacing can be enabled/disabled dynamically). Without the
mutex in the decoder it crashes, because
ID3D11VideoDevice::CreateVideoProcessor() and
ID3D11VideoContext::SubmitDecoderBuffers() end up being called
simultaneously. If I add the mutex between the decoder and just
this filter (not the rendering side) it works fine.
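
The shape of the workaround, for what it's worth (a hypothetical
sketch; the names and the SRW lock are illustrative, the only
point is that the decoder and the filter take the same lock
around these two calls):

#define COBJMACROS
#include <windows.h>
#include <d3d11.h>

static SRWLOCK video_api_lock = SRWLOCK_INIT;

/* Decoder thread */
static HRESULT submit_buffers_locked(ID3D11VideoContext *vctx,
                                     ID3D11VideoDecoder *decoder, UINT count,
                                     const D3D11_VIDEO_DECODER_BUFFER_DESC *desc)
{
    AcquireSRWLockExclusive(&video_api_lock);
    HRESULT hr = ID3D11VideoContext_SubmitDecoderBuffers(vctx, decoder,
                                                         count, desc);
    ReleaseSRWLockExclusive(&video_api_lock);
    return hr;
}

/* Filter thread, when deinterlacing gets enabled mid-stream */
static HRESULT create_processor_locked(ID3D11VideoDevice *vdev,
                                       ID3D11VideoProcessorEnumerator *venum,
                                       UINT rate_conv_index,
                                       ID3D11VideoProcessor **out)
{
    AcquireSRWLockExclusive(&video_api_lock);
    HRESULT hr = ID3D11VideoDevice_CreateVideoProcessor(vdev, venum,
                                                        rate_conv_index, out);
    ReleaseSRWLockExclusive(&video_api_lock);
    return hr;
}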

So I guess I'm stuck with the mutex for the time being.

At an earlier stage I had considered adding those video processors
as ffmpeg hardware filters, but due to the vast number of
different use cases, platforms and hw accelerations we support,
I had decided that we do all filtering either on the CPU or in the
hw context of the encoder, but never in the hw context of the
decoder, so I don't have any experience with DX11 video processors.

Maybe a too-obvious idea: how about activating the mutex only for
a short time while the video processor is being added?

This doesn't seem feasible, even with a callback system. You don't know when it's safe to enable/disable it.

By the way, the origin of the mutex was on Windows Phones. It's probably related to the fact that some phones only decode to DXGI_FORMAT_420_OPAQUE, which cannot be used for rendering. The only way to use the decoded surface is to convert it (to NV12) via a VideoProcessor. So in that case the video processor was always used, even for basic decoding.
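
For context, that conversion is just a video processor blit from
the opaque decoded surface into an NV12 target, something along
these lines (a sketch only; the enumerator, processor and views
are assumed to have been created already):

#define COBJMACROS
#include <d3d11.h>

/* Copy/convert one decoded frame into an NV12 texture through the
 * video processor. All objects are assumed to already exist. */
static HRESULT convert_opaque_to_nv12(ID3D11VideoContext *vctx,
                                      ID3D11VideoProcessor *vproc,
                                      ID3D11VideoProcessorInputView *decoded_view,
                                      ID3D11VideoProcessorOutputView *nv12_view)
{
    D3D11_VIDEO_PROCESSOR_STREAM stream = {
        .Enable        = TRUE,
        .pInputSurface = decoded_view, /* view on the opaque surface */
    };
    /* Single progressive stream, output frame index 0. */
    return ID3D11VideoContext_VideoProcessorBlt(vctx, vproc, nv12_view,
                                                0, 1, &stream);
}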