Philip Langdale <philipl <at> overt.org> writes:

>
> On 2015-03-26 04:30, Ali KIZIL wrote:
> >
> > It works fine now Phil. One more comment:
> >
> > I have a GTX 980. It can encode up to 30-33 fps for a 4K 60fps raw YUV
> > input file using the nvenc_h265 avcodec with FFmpeg. At first, it
> > looked to me like a lack of performance of the card. However, after I
> > split the video in two with the crop filter:
> >
> > /opt/ffmpeghw/bin/ffmpeg -video_size 3840x2160 -framerate 50 \
> >   -i /Projects/YUV/soccer.yuv -vcodec nvenc_h265 -an \
> >   -filter:v "crop=in_w:in_h/2:0:0" -r 50 -g 50 -preset hp \
> >   -f hevc top.hevc
> >
> > /opt/ffmpeghw/bin/ffmpeg -video_size 3840x2160 -framerate 50 \
> >   -i /Projects/YUV/soccer.yuv -vcodec nvenc_h265 -an \
> >   -filter:v "crop=in_w:in_h/2:0:in_h/2" -r 50 -g 50 -preset hp \
> >   -f hevc bottom.hevc
> >
> > When I run them at the same time, both can be encoded at 50 fps. I
> > tried to join the output files with padding, but FFmpeg needs
> > re-encoding for that, which makes no sense.
> >
> > Do you have any comment or idea on how to use the full performance of
> > the card from a single ffmpeg nvenc_h265 instance?
> >
> > Additional note: GTX cards can support up to 2 simultaneous HEVC
> > encodes (as a limitation).
>
> I honestly don't know. The hardware performance may not scale linearly
> with frame size, so you might see a disproportionate slowdown past a
> certain size, perhaps reflecting the need to use multiple buffers, etc.
>
> Do you see any evidence that you're CPU bound? That might happen if our
> buffer management is too inefficient, but I'd be surprised.
>
> --phil
>
CPU is fine. The server has 2 x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz,
with MemTotal: 49413456 kB and MemFree: 32030320 kB, so memory is not an
issue either.

Here is the top output during the run:

top - 23:39:18 up 1 day, 21 min,  2 users,  load average: 0.08, 0.03, 0.05
Tasks: 371 total,   3 running, 368 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 : 29.1 us, 20.3 sy,  0.0 ni, 50.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  49413456 total, 19607392 used, 29806064 free,   106188 buffers
KiB Swap: 50282492 total,        0 used, 50282492 free. 16826488 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9563 root      20   0 70.432g 2.003g 1.948g R  49.2  4.3   0:08.02 ffmpeg
  735 root      20   0       0      0      0 S   0.3  0.0   5:49.83 blackmagic
 9600 root      20   0   22240   1844   1112 R   0.3  0.0   0:00.02 top
    1 root      20   0   33696   2960   1472 S   0.0  0.0   0:08.37 init

FFmpeg output is:

ffmpeg version N-71096-g2139e58 Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
  configuration: --prefix=/opt/ffmpeghw --extra-cflags=-I/opt/ffmpeghw/include --extra-ldflags=-L/opt/ffmpeghw/lib --bindir=/opt/ffmpeghw/bin --extra-libs=-ldl --enable-libx264 --enable-libx265 --enable-libvpx --enable-libfdk-aac --enable-nonfree --enable-gpl --enable-nvenc
  libavutil      54. 20.101 / 54. 20.101
  libavcodec     56. 30.100 / 56. 30.100
  libavformat    56. 26.101 / 56. 26.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 13.101 /  5. 13.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
[rawvideo @ 0x38051a0] Estimating duration from bitrate, this may be inaccurate
Input #0, rawvideo, from '/Projects/YUV/soccer.yuv':
  Duration: 00:01:53.74, start: 0.000000, bitrate: 681695 kb/s
    Stream #0:0: Video: rawvideo, 1 reference frame (I420 / 0x30323449), yuv420p, 3840x2160, 681672 kb/s, 50 tbr, 50 tbn, 50 tbc
[graph 0 input from stream 0:0 @ 0x38050e0] w:3840 h:2160 pixfmt:yuv420p tb:1/50 fr:50/1 sar:0/1 sws_param:flags=2
[auto-inserted scaler 0 @ 0x37f1160] w:iw h:ih flags:'0x4' interl:0
[format @ 0x37fa9c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'Parsed_null_0' and the filter 'format'
[auto-inserted scaler 0 @ 0x37f1160] w:3840 h:2160 fmt:yuv420p sar:0/1 -> w:3840 h:2160 fmt:nv12 sar:0/1 flags:0x4
[nvenc_h265 @ 0x3807900] 1 CUDA capable devices found
[nvenc_h265 @ 0x3807900] [ GPU #0 - < GeForce GTX 980 > has Compute SM 5.2, NVENC Available ]
[nvenc_h265 @ 0x3807900] Nvenc initialized successfully

SOME ADDITIONAL CODE FOR DEBUGGING
ctx->init_encode_params.version = -804976880
ctx->init_encode_params.encodeWidth = 3840
ctx->init_encode_params.encodeHeight = 2160
ctx->init_encode_params.darWidth = 3840
ctx->init_encode_params.darHeight = 2160
ctx->init_encode_params.frameRateNum = 50
ctx->init_encode_params.frameRateDen = 1
ctx->init_encode_params.enableEncodeAsync = 0
ctx->init_encode_params.enablePTD = 1
ctx->init_encode_params.reportSliceOffsets = 0
ctx->init_encode_params.enableSubFrameWrite = 0
ctx->init_encode_params.enableExternalMEHints = 0
ctx->init_encode_params.privDataSize = 0
ctx->init_encode_params.enableExternalMEHints = 0
ctx->init_encode_params.maxEncodeWidth = 3840
ctx->init_encode_params.maxEncodeHeight = 2160
ctx->init_encode_params.gopLength = 12
ctx->init_encode_params.frameIntervalP = 1
ctx->init_encode_params.monoChromeEncoding = 0
ctx->init_encode_params.frameFieldMode = 1
ctx->init_encode_params.mvPrecision = 3
encodeConfig.level = 0
encodeConfig.tier = 0
encodeConfig.minCUSize = 2
encodeConfig.maxCUSize = 3
encodeConfig.useConstrainedIntraPred = 0
encodeConfig.disableDeblockAcrossSliceBoundary = 0
encodeConfig.outputBufferingPeriodSEI = 0
encodeConfig.outputPictureTimingSEI = 0
encodeConfig.outputAUD = 0
encodeConfig.enableLTR = 0
encodeConfig.disableSPSPPS = 0
encodeConfig.repeatSPSPPS = 1
encodeConfig.enableIntraRefresh = 0
encodeConfig.idrPeriod = 12
encodeConfig.intraRefreshPeriod = 0
encodeConfig.intraRefreshCnt = 0
encodeConfig.maxNumRefFramesInDPB = 1
encodeConfig.ltrNumFrames = 0
encodeConfig.vpsId = 0
encodeConfig.spsId = 0
encodeConfig.ppsId = 0
encodeConfig.sliceMode = 0
encodeConfig.sliceModeData = 0
encodeConfig.maxTemporalLayersMinus1 = 0
rc_param.constQP = 28
rc_param.averageBitRate = 0
rc_param.maxBitRate = 0
rc_param.vbvBufferSize = 0

[mpegts @ 0x3806820] muxrate VBR, pcr every 5 pkts, sdt every 200, pat/pmt every 40 pkts
Output #0, mpegts, to 'out.ts':
  Metadata:
    encoder         : Lavf56.26.101
    Stream #0:0: Video: hevc (nvenc_h265), 1 reference frame, nv12, 3840x2160, q=-1--1, 50 fps, 90k tbn, 50 tbc
    Metadata:
      encoder         : Lavc56.30.100 nvenc_h265
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> hevc (nvenc_h265))
Press [q] to stop, [?] for help
frame=  765 fps= 29 q=0.0 Lsize=  137360kB time=00:00:15.30 bitrate=73545.9kbits/s
video:127337kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 7.871264%
Input file #0 (/Projects/YUV/soccer.yuv):
  Input stream #0:0 (video): 765 packets read (9517824000 bytes); 765 frames decoded;
  Total: 765 packets (9517824000 bytes) demuxed
Output file #0 (out.ts):
  Output stream #0:0 (video): 765 frames encoded; 765 packets muxed (130392951 bytes);
  Total: 765 packets (130392951 bytes) muxed
[nvenc_h265 @ 0x3807900] Nvenc unloaded

I think you are right: performance does not scale linearly with frame rate
and video size. In a few days I will be able to test with a higher-end
GM2xx card. I will let you know.

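To put rough numbers on the non-linear scaling, here is a minimal back-of-the-envelope sketch in Python. It only uses figures already reported in this thread: 3840x2160 input, 29 fps measured for the single full-frame instance, and 50 fps for each of the two half-height instances. The function and variable names are hypothetical, and the result says nothing about where inside the card or the driver the time actually goes.

```python
# Pixel throughput implied by the numbers reported in this thread.
# All names below are illustrative, not part of FFmpeg or NVENC.

WIDTH, HEIGHT = 3840, 2160   # from -video_size 3840x2160
HALF_FPS = 50                # each cropped half-height encode reportedly kept 50 fps
SINGLE_FPS = 29              # "fps= 29" in the single-instance log above


def pixels_per_second(width, height, fps):
    """Luma pixels pushed through the encoder per second."""
    return width * height * fps


# One full-frame instance versus two concurrent half-height instances.
single_full = pixels_per_second(WIDTH, HEIGHT, SINGLE_FPS)
two_halves = 2 * pixels_per_second(WIDTH, HEIGHT // 2, HALF_FPS)
ratio = two_halves / single_full

# Cross-check against the log: yuv420p stores 1.5 bytes per pixel,
# so the demuxed byte count should be a whole number of frames.
frame_bytes = WIDTH * HEIGHT * 3 // 2
frames_in_log, leftover = divmod(9517824000, frame_bytes)

print(f"single full-frame instance : {single_full / 1e6:.1f} Mpixel/s")
print(f"two half-height instances  : {two_halves / 1e6:.1f} Mpixel/s")
print(f"split / single throughput  : {ratio:.2f}x")
print(f"frames implied by byte count: {frames_in_log} (leftover {leftover} bytes)")
```

Under these assumptions the two cropped sessions together move about 1.7 times as many pixels per second as the single 3840x2160 session, which is consistent with a slowdown tied to frame size rather than a hard limit on total pixel rate.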
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel