> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: October 29, 2017 5:51 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
> encoding for AMD GPUs based on AMF SDK
>
> On 29/10/17 20:48, Mironov, Mikhail wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
> >> Of Mark Thompson
> >> Sent: October 29, 2017 3:36 PM
> >> To: ffmpeg-devel@ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
> >> encoding for AMD GPUs based on AMF SDK
> >>
> >> On 29/10/17 18:39, Mironov, Mikhail wrote:
> >>>>
> >>>>> +        { AV_PIX_FMT_D3D11, AMF_SURFACE_NV12 },
> >>>>
> >>>> D3D11 surfaces need not be NV12.  The actual format is in
> >>>> AVHWFramesContext.sw_format - if you only support 8-bit then
> >>>> something nasty probably happens if you give it P010 surfaces.
> >>>>
> >>>
> >>> Agreed, but how should I define D3D11 NV12 as input format if I only
> >>> have AV_PIX_FMT_D3D11?
> >>
> >> Check sw_format afterwards.
> >>
> >
> > sw_format is part of AVHWFramesContext in hwcontext.h. But how should I
> > define the pix_fmt array in AVCodec? For example, in nvenc.c it is done
> > via AV_PIX_FMT_CUDA. Is that wrong?
>
> NVENC checks sw_format at init time - see
> <http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/nvenc.c;h=e1d3316de39cfefa41432105aed91f0d4e93a154;hb=HEAD#l1408>.
> I think you want to do the same thing.
>
> (I agree this result isn't ideal with just the pix_fmt array, but there
> isn't currently a way to check sw_format as well from the outside.
> Checking it at init time and failing if it isn't usable is the best we
> can do.)
>
Sure, I will add checking at initialization.

> >>>>> +
> >>>>> +    // try to reuse existing DX device
> >>>>> +
> >>>>> +    if (avctx->hw_frames_ctx) {
> >>>>> +        AVHWFramesContext *device_ctx =
> >>>>> +            (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> >>>>> +        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
> >>>>> +            if (device_ctx->device_ctx->hwctx) {
> >>>>> +                AVD3D11VADeviceContext *device_d3d11 =
> >>>>> +                    (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
> >>>>> +                res = ctx->context->pVtbl->InitDX11(ctx->context,
> >>>>> +                    device_d3d11->device, AMF_DX11_1);
> >>>>> +                if (res == AMF_OK) {
> >>>>> +                    ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
> >>>>> +                } else {
> >>>>> +                    av_log(avctx, AV_LOG_INFO, "amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default");
> >>>>
> >>>> I'm not sure this is going to act sensibly - if the user has D3D11
> >>>> frames input on another device, does it work?
> >>>>
> >>>
> >>> If the device is not AMD's, the code tries to create another device -
> >>> a compatible one. In those cases the submission will be via system
> >>> memory.
> >>
> >> And that works with D3D11 frames as hw_frames_ctx on another device?
> >
> > Why not? AMF will be initialized with a different device; in the
> > submission code this is detected, a surface with system (host) memory
> > is allocated, data is copied using av_image_copy(), and the host
> > memory is submitted to AMF.
>
> It won't be copied externally if it's already in a D3D11 surface for the
> other device.  At the moment it looks like it silently encodes an empty
> frame because the conditions in ff_amf_encode_frame() make it call
> amf_copy_surface(), but av_image_copy() doesn't support copying from
> hardware surfaces and has no way to indicate that failure.
>
> I think the easiest way to solve this is to fail at init time if the
> given hw_frames_ctx is on an unusable device.
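To make the init-time check concrete, here is a standalone toy sketch of what I plan to add (the enum and function name are illustrative stand-ins only - the real code would test enum AVPixelFormat against AVHWFramesContext.sw_format and return AVERROR(EINVAL), in the same style as nvenc):

```c
#include <assert.h>
#include <stdio.h>

/* Minimal stand-in for the pixel-format enum; the real code would use
   enum AVPixelFormat from libavutil/pixfmt.h. */
typedef enum { FMT_NV12, FMT_P010, FMT_YUV420P } sw_format_t;

/* Hypothetical init-time guard in the style of nvenc: if the encoder only
   supports 8-bit NV12 underneath D3D11 surfaces, reject anything else up
   front instead of producing broken output later.  Returns 0 on success,
   -1 on failure (the real code would return AVERROR(EINVAL)). */
static int amf_init_check_sw_format(sw_format_t sw_format)
{
    if (sw_format != FMT_NV12) {
        fprintf(stderr, "unsupported sw_format, only NV12 is accepted\n");
        return -1;
    }
    return 0;
}
```

The check runs once in the encoder's init function, so an unusable hw_frames_ctx fails fast rather than at the first frame.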
> The hw_device_ctx case matters less, because the user input is not going
> to be in D3D11 frames in that case.

I thought that in a multi-GPU situation it is still good to support
encoding. I could force the HW frame to be copied into system memory and
consume it in the encoder: transfer_data_to(). Would that be better than
failing at initialization?

> >>>>> +            } else {
> >>>>> +                surface->pVtbl->Release(surface);
> >>>>> +                surface = NULL;
> >>>>> +                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "SubmitInput() failed with error %d", res);
> >>>>> +                submitted = 1;
> >>>>> +            }
> >>>>> +        }
> >>>>> +        // poll results
> >>>>> +        if (!data) {
> >>>>> +            res = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
> >>>>> +            if (data) {
> >>>>> +                AMFBuffer* buffer;
> >>>>> +                AMFGuid guid = IID_AMFBuffer();
> >>>>> +                data->pVtbl->QueryInterface(data, &guid, (void**)&buffer); // query for buffer interface
> >>>>> +                ret = amf_copy_buffer(avctx, pkt, buffer);
> >>>>> +                if (!ret)
> >>>>> +                    *got_packet = 1;
> >>>>> +                buffer->pVtbl->Release(buffer);
> >>>>> +                data->pVtbl->Release(data);
> >>>>> +                if (ctx->eof) {
> >>>>> +                    submitted = true; // we are in the drain state - no submissions
> >>>>> +                }
> >>>>> +            } else if (res == AMF_EOF) {
> >>>>> +                submitted = true; // drain complete
> >>>>> +            } else {
> >>>>> +                if (!submitted) {
> >>>>> +                    av_usleep(1000); // wait and poll again
> >>>>> +                }
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>
> >>>> I suspect this setup is not actually going to follow the
> >>>> constraints of the deprecated encode2().  Given the API here, I
> >>>> think you would be much better off writing with
> >>>> send_frame()/receive_packet().
> >>>
> >>> I considered this, but without a thread that would call
> >>> receive_packet() the implementation will fall into the same pattern
> >>> as it is now, but with an additional queue of ready outputs.
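Regarding the system-memory fallback I mentioned above, roughly this is the submission-path decision I have in mind, as a standalone toy model (all names here are illustrative; the real code would compare the frame's AVHWFramesContext device against the one InitDX11() succeeded on, and the memcpy stands in for an actual device-to-host download such as av_hwframe_transfer_data()):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins: a "hardware frame" tagged with the device it lives on. */
typedef struct { int device_id; const unsigned char *data; } hw_frame_t;
typedef enum { SUBMIT_DIRECT, SUBMIT_VIA_HOST } submit_path_t;

/* If the frame is on the same device AMF was initialized with, submit the
   surface directly; otherwise fall back to going through host memory. */
static submit_path_t choose_submit_path(int amf_device_id, const hw_frame_t *f)
{
    return f->device_id == amf_device_id ? SUBMIT_DIRECT : SUBMIT_VIA_HOST;
}

/* Returns the buffer to hand to the encoder: the surface itself on the
   direct path, or a host copy on the fallback path.  The memcpy is a
   stand-in for the real device-to-host transfer. */
static const unsigned char *prepare_for_submit(int amf_device_id,
                                               const hw_frame_t *f,
                                               unsigned char *host_buf,
                                               size_t n)
{
    if (choose_submit_path(amf_device_id, f) == SUBMIT_DIRECT)
        return f->data;            /* same device: submit as-is */
    memcpy(host_buf, f->data, n);  /* foreign device: download first */
    return host_buf;
}
```

The point of the sketch is that the foreign-device case takes an explicit copy path instead of silently feeding av_image_copy() a hardware surface.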
> >> See the documentation - you can return EAGAIN to send_packet() when a
> >> frame is available, no external queue is required.
> >
> > I didn't debug it, but from visual code inspection this logic is
> > available for decoders only. For the encoder, the avcodec_send_frame()
> > call inside do_video_out() doesn't check for EAGAIN, and inside
> > avcodec_send_frame() there is no such checking either.
>
> Right, sorry, I mean return EAGAIN to receive_packet().
>
> This does need to have some way to tell whether the SubmitInput() call
> will return AMF_INPUT_FULL.  If you have that, then you can block in
> receive_packet() iff that is true - if not, just return EAGAIN and get
> another frame.  This also maximises the number of frames in-flight for
> the asynchronous encode.

If SubmitInput() returns AMF_INPUT_FULL inside send_frame(), there is no
way I can trick ffmpeg.exe into resubmitting this frame. If I block, then
receive_packet() will not be called. It is impossible to tell ahead of
time that SubmitInput() will return AMF_INPUT_FULL because it is
HW-specific. IMHO the encoder submission should be organized the same way
as decoder submission. Until then I don't see how to implement this. Keep
in mind that the AMF API is designed exactly the same way as the
send_frame() / receive_packet() pair of functions.

> >>>>> +    // Bitrate
> >>>>> +    AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_TARGET_BITRATE, avctx->bit_rate);
> >>>>> +    if (ctx->rate_control_mode == AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR) {
> >>>>> +        AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_PEAK_BITRATE, avctx->bit_rate);
> >>>>> +    } else {
> >>>>> +        int rc_max_rate = avctx->rc_max_rate >= avctx->bit_rate ?
> >>>>> avctx->rc_max_rate : avctx->bit_rate * 13 / 10;
> >>>>
> >>>> Where does 13/10 come from?
> >>>>
> >>>
> >>> For best results rc_max_rate should be bigger than bit_rate for VBR.
> >>> For CBR it is ignored.
> >>> What is the better way to correct an unset parameter?
> >>
> >> Max rate should only be constrained by the HRD buffering in this
> >> case.  Are you sure this isn't handled internally if you don't supply
> >> the parameter at all?  If not, maybe supply some sort of infinity to
> >> avoid it constraining anything inappropriately.
> >
> > My concern is about VBR with peaks. For this mode, max rate defines
> > the height of the peaks. If it is not defined, the encoder will put in
> > something valid, but the idea was to control such things explicitly
> > from FFmpeg.
>
> If it is set by the user then certainly use that value, but if not then
> it's probably better to make up the answer inside the driver rather than
> here (the driver will have better knowledge of how the hardware works,
> after all).
>
> >> Some more random questions:
> >>
> >> * How can we supply colour information to the codecs?  (VUI
> >> colour_primaries/transfer_characteristics/matrix_coefficients/
> >> chroma_sample_loc_type.)
> >
> > There is a very limited set of VUI parameters available: timing
> > (shared with rate control via frame rate), aspect ratio,
> > video_full_range_flag, bit stream restriction, and a few others
> > related to reordering.
>
> Inability to set the colour information will do nasty things to the
> output video in some cases.  Input coming from JPEG or similar with
> centred chroma samples is visibly weird viewed with the default
> middle-left, and HDR colours will come out incorrectly.  I recommend you
> add a way to do this, though it won't block the rest of the patch.

The VUI is generated deep in the driver or in firmware, so you are asking
for a driver feature. I will definitely bring this to the driver team and
make sure they implement it at some point. Such input is truly
appreciated.
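Coming back to the send_frame() / receive_packet() discussion above, here is a minimal standalone model of the flow Mark proposes, with a fixed-capacity queue standing in for the hardware pipeline (everything here is illustrative - the names, the capacity, and the error code are not the AMF or lavc API; the real analogue of the "full" condition is SubmitInput() returning AMF_INPUT_FULL):

```c
#include <assert.h>

/* Toy model: a bounded in-flight counter stands in for the asynchronous
   hardware pipeline.  Capacity and names are illustrative only. */
#define PIPE_CAP  4
#define ERR_AGAIN (-11)   /* stand-in for AVERROR(EAGAIN) */

typedef struct { int in_flight; } toy_encoder_t;

/* Accept a frame unless the pipeline is full; a full pipeline maps to
   SubmitInput() returning AMF_INPUT_FULL, and the caller is expected to
   drain via receive_packet() and then resubmit the same frame. */
static int toy_send_frame(toy_encoder_t *e)
{
    if (e->in_flight >= PIPE_CAP)
        return ERR_AGAIN;
    e->in_flight++;
    return 0;
}

/* Hand out a packet if one is ready; otherwise EAGAIN so the caller
   feeds more input.  Blocking/polling would only be justified when the
   input side is full, which maximises frames in flight. */
static int toy_receive_packet(toy_encoder_t *e)
{
    if (e->in_flight == 0)
        return ERR_AGAIN;
    e->in_flight--;
    return 0;
}
```

In this model, a frame rejected with EAGAIN is simply resubmitted after a successful receive_packet() call, so no external output queue is needed - whether lavc's generic code drives encoders this way is exactly the open question in this thread.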
>
> Thanks,
>
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Thanks,
Mikhail