On Sat, 21 Sep 2019 at 11:39, Johanna Nilson <jnil...@gmail.com> wrote:
>
> Sorry, but I think that the problem is not in profile M10-1B. This article
> (https://support.citrix.com/article/CTX217781) says that we require a profile
> with 1 GB or more to use NVENC, but M10-1B includes 1 GB, so it's OK.
>
> When I use the command:
> ffmpeg -f gdigrab -i desktop -framerate 30 -tune zerolatency -r 30 -c:v hevc_nvenc -f mpegts udp://...
>
> the FFmpeg log clearly says:
> [hevc_nvenc @ 00000000005a6840] dl_fn->cuda_dl->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
>
> There are no CUDA cores available in the profile M10-1B, but they are not
> needed for encoding, so FFmpeg's requirement for them is redundant. I'm
> 100% sure that NVENC is available in configuration M10-1B. I've made a
> solution similar to this:
> https://github.com/bloodelves88/CloudyNvCapture/blob/master/samples/NvFBC/NvFBCDX9NvEnc/NvFBCDX9NvEnc.cpp
> It works fine in the M10-1B configuration. It uses NVIDIA capture and NVIDIA
> NVENC, without using CUDA, to produce an H.264 frame sequence. The problem in
> FFmpeg is that it requires CUDA even when it is not used. So, please, can
> we do anything about this unnecessary requirement for CUDA when we try to use
> hardware encoding on NVIDIA cards?
>
> It's not only my issue. Someone else also has a similar problem
> (https://superuser.com/questions/1482726/is-there-a-way-to-use-nvenc-for-ffmpeg-without-cuda).
> The FFmpeg log is not the same, but the question is the same. Changing the
> M10-1B profile is not an option. It is suitable for using NVENC; the only
> problem is that FFmpeg requires CUDA when it is not needed.
>
> On Fri, 20 Sep 2019 at 18:26, Dennis Mungai <dmng...@gmail.com> wrote:
>
> > On Fri, 20 Sep 2019 at 17:55, Johanna Nilson <jnil...@gmail.com> wrote:
> > >
> > > nvidia-smi -L
> > > GPU 0: GRID M10-1B
> >
> > Seems like a known issue.
> > See https://support.citrix.com/article/CTX217781 and this post in
> > particular:
> > https://gridforums.nvidia.com/default/topic/983/xendesktop/m60-nvenc-xd-vda-7-11-only-1gb-vgpu-profiles-and-above-/post/3478/#3478
> > Please switch to a different vGPU with at least 2 GB of VRAM, such as
> > M10-4A.
> > See this for available profiles:
> > https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html

Ahh, I get it now. In the NVIDIA SDK, an NVENC session can be initialized
via either a shared CUDA context or DirectX.

Which reminds me: a while back, I ran into something similar on VMware ESXi
with NVIDIA's vGPU solution (GRID) based on the Tesla T4. Here is what I
encountered and the workarounds I tried then.

I enabled TCC mode and rebooted:

nvidia-smi -g 0 -fdm 1

Then I ran the command:

ffmpeg.exe -y -thread_queue_size 5120 -use_wallclock_as_timestamps 1 -fflags +genpts -loglevel debug -vsync 1 ^
-f gdigrab -draw_mouse 0 -framerate 60 -i desktop ^
-c:v h264_nvenc -profile:v high -rc:v cbr_ld_hq -r:v 60 -g:v 120 -b:v 8000k -minrate:v 8000k -maxrate:v 8000k -bufsize:v 8000k ^
-an -flush_packets 0 -bsf:v h264_mp4toannexb ^
-muxrate 16000k -pcr_period 20 -mpegts_flags +resend_headers -mpegts_start_pid 0x15 -t 240 -f mpegts "lol.m2ts"

Log:

C:\bin>ffmpeg.exe -y -thread_queue_size 5120 -use_wallclock_as_timestamps 1 -fflags +genpts -loglevel debug -vsync 1 ^
More? -f gdigrab -draw_mouse 0 -framerate 60 -i desktop ^
More? -c:v h264_nvenc -profile:v high -rc:v cbr_ld_hq -r:v 60 -g:v 120 -b:v 8000k -minrate:v 8000k -maxrate:v 8000k -bufsize:v 8000k ^
More? -an -flush_packets 0 -bsf:v h264_mp4toannexb ^
More?
-muxrate 16000k -pcr_period 20 -mpegts_flags +resend_headers -mpegts_start_pid 0x15 -t 240 -f mpegts "lol.m2ts"
ffmpeg version N-94000-g78e1d7f421-ffmpeg-windows-build-helpers Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.0 (GCC)
  configuration: --pkg-config=pkg-config --pkg-config-flags=--static --extra-version=ffmpeg-windows-build-helpers --enable-version3 --disable-debug --disable-w32threads --arch=x86_64 --target-os=mingw32 --cross-prefix=/home/brainiarc7/source.build/ffmpeg-windows-build-helpers/sandbox/cross_compilers/mingw-w64-x86_64/bin/x86_64-w64-mingw32- --enable-libcaca --enable-gray --enable-libtesseract --enable-fontconfig --enable-gmp --enable-gnutls --enable-libass --enable-libbluray --enable-libbs2b --enable-libflite --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopus --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libzimg --enable-libzvbi --enable-libmysofa --enable-libaom --enable-libopenjpeg --enable-libopenh264 --enable-liblensfun --enable-libvmaf --enable-libsrt --enable-demuxer=dash --enable-libxml2 --enable-nvenc --enable-nvdec --extra-libs=-lharfbuzz --extra-libs=-lm --extra-libs=-lpthread --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --extra-cflags=-DCACA_STATIC --enable-amf --enable-libmfx --enable-gpl --enable-avisynth --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libx264 --enable-libx265 --enable-libxvid --enable-libxavs --enable-avresample --extra-cflags='-mtune=generic' --extra-cflags=-O3 --enable-static --disable-shared --prefix=/home/brainiarc7/source.build/ffmpeg-windows-build-helpers/sandbox/cross_compilers/mingw-w64-x86_64/x86_64-w64-mingw32
  libavutil 56. 28.100 / 56. 28.100
  libavcodec 58. 52.102 / 58. 52.102
  libavformat 58. 27.103 / 58. 27.103
  libavdevice 58. 7.100 / 58. 7.100
  libavfilter 7. 55.100 / 7. 55.100
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 4.101 / 5. 4.101
  libswresample 3. 4.100 / 3. 4.100
  libpostproc 55. 4.100 / 55. 4.100
Splitting the commandline.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-thread_queue_size' ... matched as option 'thread_queue_size' (set the maximum number of queued packets from the demuxer) with argument '5120'.
Reading option '-use_wallclock_as_timestamps' ... matched as AVOption 'use_wallclock_as_timestamps' with argument '1'.
Reading option '-fflags' ... matched as AVOption 'fflags' with argument '+genpts'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-vsync' ... matched as option 'vsync' (video sync method) with argument '1'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'gdigrab'.
Reading option '-draw_mouse' ... matched as AVOption 'draw_mouse' with argument '0'.
Reading option '-framerate' ... matched as AVOption 'framerate' with argument '60'.
Reading option '-i' ... matched as input url with argument 'desktop'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'h264_nvenc'.
Reading option '-profile:v' ... matched as option 'profile' (set profile) with argument 'high'.
Reading option '-rc:v' ... matched as AVOption 'rc:v' with argument 'cbr_ld_hq'.
Reading option '-r:v' ... matched as option 'r' (set frame rate (Hz value, fraction or abbreviation)) with argument '60'.
Reading option '-g:v' ... matched as AVOption 'g:v' with argument '120'.
Reading option '-b:v' ... matched as option 'b' (video bitrate (please use -b:v)) with argument '8000k'.
Reading option '-minrate:v' ... matched as AVOption 'minrate:v' with argument '8000k'.
Reading option '-maxrate:v' ... matched as AVOption 'maxrate:v' with argument '8000k'.
Reading option '-bufsize:v' ... matched as AVOption 'bufsize:v' with argument '8000k'.
Reading option '-an' ... matched as option 'an' (disable audio) with argument '1'.
Reading option '-flush_packets' ... matched as AVOption 'flush_packets' with argument '0'.
Reading option '-bsf:v' ... matched as option 'bsf' (A comma-separated list of bitstream filters) with argument 'h264_mp4toannexb'.
Reading option '-muxrate' ... matched as AVOption 'muxrate' with argument '16000k'.
Reading option '-pcr_period' ... matched as AVOption 'pcr_period' with argument '20'.
Reading option '-mpegts_flags' ... matched as AVOption 'mpegts_flags' with argument '+resend_headers'.
Reading option '-mpegts_start_pid' ... matched as AVOption 'mpegts_start_pid' with argument '0x15'.
Reading option '-t' ... matched as option 't' (record or transcode "duration" seconds of audio/video) with argument '240'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'mpegts'.
Reading option 'lol.m2ts' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option y (overwrite output files) with argument 1.
Applying option loglevel (set logging level) with argument debug.
Applying option vsync (video sync method) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url desktop.
Applying option thread_queue_size (set the maximum number of queued packets from the demuxer) with argument 5120.
Applying option f (force format) with argument gdigrab.
Successfully parsed a group of options.
Opening an input file: desktop.
[gdigrab @ 000001a5e9277bc0] Capturing whole desktop as 1920x1080x32 at (0,0)
[gdigrab @ 000001a5e9277bc0] Probe buffer size limit of 5000000 bytes reached
[gdigrab @ 000001a5e9277bc0] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, gdigrab, from 'desktop':
  Duration: N/A, start: 1560869500.651357, bitrate: 3981337 kb/s
    Stream #0:0, 1, 1/1000000: Video: bmp, 1 reference frame, bgra, 1920x1080, 0/1, 3981337 kb/s, 60 fps, 1000k tbr, 1000k tbn, 1000k tbc
Successfully opened the file.
Parsing a group of options: output url lol.m2ts.
Applying option c:v (codec name) with argument h264_nvenc.
Applying option profile:v (set profile) with argument high.
Applying option r:v (set frame rate (Hz value, fraction or abbreviation)) with argument 60.
Applying option b:v (video bitrate (please use -b:v)) with argument 8000k.
Applying option an (disable audio) with argument 1.
Applying option bsf:v (A comma-separated list of bitstream filters) with argument h264_mp4toannexb.
Applying option t (record or transcode "duration" seconds of audio/video) with argument 240.
Applying option f (force format) with argument mpegts.
Successfully parsed a group of options.
Opening an output file: lol.m2ts.
[file @ 000001a5e927edc0] Setting default whitelist 'file,crypto'
Successfully opened the file.
Stream mapping:
  Stream #0:0 -> #0:0 (bmp (native) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
cur_dts is invalid st:0 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
detected 16 logical cores
[graph 0 input from stream 0:0 @ 000001a5e981f640] Setting 'video_size' to value '1920x1080'
[graph 0 input from stream 0:0 @ 000001a5e981f640] Setting 'pix_fmt' to value '28'
[graph 0 input from stream 0:0 @ 000001a5e981f640] Setting 'time_base' to value '1/1000000'
[graph 0 input from stream 0:0 @ 000001a5e981f640] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:0 @ 000001a5e981f640] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 000001a5e981f640] Setting 'frame_rate' to value '60/1'
[graph 0 input from stream 0:0 @ 000001a5e981f640] w:1920 h:1080 pixfmt:bgra tb:1/1000000 fr:60/1 sar:0/1 sws_param:flags=2
[format @ 000001a5e981c3c0] Setting 'pix_fmts' to value 'yuv420p|nv12|p010le|yuv444p|p016le|yuv444p16le|bgr0|rgb0|cuda|d3d11'
[auto_scaler_0 @ 000001a5e981a940] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 000001a5e981a940] w:iw h:ih flags:'bicubic' interl:0
[format @ 000001a5e981c3c0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_null_0' and the filter 'format'
[AVFilterGraph @ 000001a5e927dec0] query_formats: 5 queried, 3 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 000001a5e981a940] picking rgb0 out of 8 ref:bgra alpha:1
[swscaler @ 000001a5eb0100c0] Forcing full internal H chroma due to input having non subsampled chroma
[auto_scaler_0 @ 000001a5e981a940] w:1920 h:1080 fmt:bgra sar:0/1 -> w:1920 h:1080 fmt:rgb0 sar:0/1 flags:0x4
[h264_nvenc @ 000001a5e927c1c0] Loaded lib: nvcuda.dll
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuInit
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDeviceGetCount
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDeviceGet
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDeviceGetAttribute
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDeviceGetName
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDeviceComputeCapability
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuCtxCreate_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuCtxSetLimit
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuCtxPushCurrent_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuCtxPopCurrent_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuCtxDestroy_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMemAlloc_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMemAllocPitch_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMemsetD8Async
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMemFree_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMemcpy2D_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMemcpy2DAsync_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGetErrorName
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGetErrorString
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuStreamCreate
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuStreamQuery
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuStreamSynchronize
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuStreamDestroy_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuStreamAddCallback
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuEventCreate
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuEventDestroy_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuEventSynchronize
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuEventQuery
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuEventRecord
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuLaunchKernel
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuModuleLoadData
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuModuleUnload
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuModuleGetFunction
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuTexObjectCreate
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuTexObjectDestroy
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGLGetDevices_v2
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGraphicsGLRegisterImage
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGraphicsUnregisterResource
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGraphicsMapResources
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGraphicsUnmapResources
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuGraphicsSubResourceGetMappedArray
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDeviceGetUuid
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuImportExternalMemory
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDestroyExternalMemory
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuExternalMemoryGetMappedBuffer
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMipmappedArrayGetLevel
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuMipmappedArrayDestroy
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuImportExternalSemaphore
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuDestroyExternalSemaphore
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuSignalExternalSemaphoresAsync
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: cuWaitExternalSemaphoresAsync
[h264_nvenc @ 000001a5e927c1c0] Loaded lib: nvEncodeAPI64.dll
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: NvEncodeAPICreateInstance
[h264_nvenc @ 000001a5e927c1c0] Loaded sym: NvEncodeAPIGetMaxSupportedVersion
[h264_nvenc @ 000001a5e927c1c0] Loaded Nvenc version 9.0
[h264_nvenc @ 000001a5e927c1c0] Nvenc initialized successfully
[h264_nvenc @ 000001a5e927c1c0] 1 CUDA capable devices found
[h264_nvenc @ 000001a5e927c1c0] [ GPU #0 - < GRID T4-2B > has Compute SM 7.5 ]
[h264_nvenc @ 000001a5e927c1c0] dl_fn->cuda_dl->cuCtxCreate(&ctx->cu_context_internal, 0, cu_device) failed -> CUDA_ERROR_UNKNOWN: unknown error
[h264_nvenc @ 000001a5e927c1c0] No NVENC capable devices found
[h264_nvenc @ 000001a5e927c1c0] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
[AVIOContext @ 000001a5e9819840] Statistics: 0 seeks, 0 writeouts
Conversion failed!
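
As an aside, the two failures in this thread happen at different CUDA driver calls (cuInit on the M10-1B, cuCtxCreate on the T4-2B), and that is easy to miss in a long debug log. Below is a rough Python sketch for pulling those lines out of a saved log. It is not part of FFmpeg: the names CudaFailure and find_cuda_failures are made up for illustration, and the regex only assumes the "... failed -> CUDA_ERROR_...: ..." line format seen in the two logs quoted here, with one log message per line.

```python
import re
from typing import List, NamedTuple, Optional


class CudaFailure(NamedTuple):
    """One failed CUDA driver call reported by an *_nvenc encoder (hypothetical type)."""
    encoder: str   # e.g. "h264_nvenc"
    call: str      # the call expression exactly as ffmpeg printed it
    error: str     # e.g. "CUDA_ERROR_UNKNOWN"
    message: str   # text following the error name

    @property
    def function(self) -> Optional[str]:
        # Extract the bare driver function name (cuInit, cuCtxCreate, ...).
        m = re.search(r"(cu[A-Z]\w*)\s*\(", self.call)
        return m.group(1) if m else None


# Assumed line shape, taken from the logs in this thread:
# [h264_nvenc @ <hex address>] <call> failed -> CUDA_ERROR_<NAME>: <message>
_PATTERN = re.compile(
    r"\[(?P<encoder>\w+_nvenc) @ [0-9a-fA-F]+\]\s+"
    r"(?P<call>.+?) failed -> (?P<error>CUDA_ERROR_\w+): (?P<message>.*)"
)


def find_cuda_failures(log_text: str) -> List[CudaFailure]:
    """Return every nvenc CUDA failure line found in the given log text."""
    failures = []
    for line in log_text.splitlines():
        m = _PATTERN.search(line)
        if m:
            failures.append(
                CudaFailure(
                    m.group("encoder"),
                    m.group("call").strip(),
                    m.group("error"),
                    m.group("message").strip(),
                )
            )
    return failures


# The two failure lines quoted in this thread, plus one unrelated line.
SAMPLE = """\
[hevc_nvenc @ 00000000005a6840] dl_fn->cuda_dl->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[h264_nvenc @ 000001a5e927c1c0] 1 CUDA capable devices found
[h264_nvenc @ 000001a5e927c1c0] dl_fn->cuda_dl->cuCtxCreate(&ctx->cu_context_internal, 0, cu_device) failed -> CUDA_ERROR_UNKNOWN: unknown error
"""

if __name__ == "__main__":
    for failure in find_cuda_failures(SAMPLE):
        print(f"{failure.encoder}: {failure.function} -> {failure.error} ({failure.message})")
```

To use it on a real run, redirect ffmpeg's stderr to a file and pass the file contents to find_cuda_failures. A cuInit failure means the driver exposes no CUDA device at all, while a cuCtxCreate failure means a device was found but a context could not be created on it.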

I then disabled TCC mode and rebooted:

nvidia-smi -g 0 -dm 0

Then I ran the command:

ffmpeg.exe -y -thread_queue_size 5120 -use_wallclock_as_timestamps 1 -fflags +genpts -loglevel debug -vsync 1 ^
-f gdigrab -draw_mouse 0 -framerate 60 -i desktop ^
-c:v h264_nvenc -profile:v high -rc:v cbr_ld_hq -r:v 60 -g:v 120 -b:v 8000k -minrate:v 8000k -maxrate:v 8000k -bufsize:v 8000k ^
-an -flush_packets 0 -bsf:v h264_mp4toannexb ^
-muxrate 16000k -pcr_period 20 -mpegts_flags +resend_headers -mpegts_start_pid 0x15 -t 240 -f mpegts "lol.m2ts"

And ran into the same issue.

This limitation was overcome by switching to a larger vGPU config, so I
basically gave up on the case and never looked into it again.

With that in mind, perhaps you could try running the same command but with
the dxva2 hwaccel instead? Perhaps (and I could be wrong) switching to dxva2
as the hwaccel will switch the NVENC device type to DirectX instead of CUDA.
That interop seems to be implemented, see
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/nvenc.c#L52

Here is an example of such a command with your parameters:

ffmpeg.exe -y -thread_queue_size 5120 ^
-fflags +genpts -loglevel debug -vsync 1 -hwaccel dxva2 -hwaccel_device 0 ^
-f gdigrab -draw_mouse 0 -framerate 60 -i desktop ^
-c:v h264_nvenc -profile:v high -preset:v llhq -rc:v cbr_ld_hq -r:v 60 -g:v 120 -b:v 8000k -minrate:v 8000k -maxrate:v 8000k -bufsize:v 8000k -gpu:v 0 ^
-an -flush_packets 0 -bsf:v h264_mp4toannexb ^
-muxrate 16000k -pcr_period 20 -mpegts_flags +resend_headers -f mpegts "udp://..."

Let me know how that goes.