I am using a HiSilicon 3516A board. It has no V4L device; the encoder is
accessed through ioctl calls on /dev/venc and the MPP API instead.
I am able to open the encoder and read raw H264 packets, but it does not
work properly: read_packet seems to be called about 10 times slower than
it should be.
If I move the code outside of ffmpeg and run it as:
dump_stream | ffmpeg -i - -vcodec copy out.h264
it works at good speed.

Why is libav not calling read_packet as fast as it should?
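
One way to put a number on "about 10 times slower" is to count calls inside
the callback and compare the measured rate with the expected frame rate. The
sketch below is a minimal helper for that and uses only libc. ReadStats,
now_sec, read_stats_record and read_stats_rate are hypothetical names, not
part of libav; the idea is to call read_stats_record(&stats, now_sec(), ret)
at the top of read_packet and print read_stats_rate(&stats) every so often.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not a libav type): tracks how often a callback fires. */
typedef struct {
    uint64_t calls;     /* number of recorded calls */
    uint64_t bytes;     /* total payload bytes reported by those calls */
    double   first_sec; /* timestamp of the first recorded call */
    double   last_sec;  /* timestamp of the most recent recorded call */
} ReadStats;

/* Monotonic clock in seconds, unaffected by wall-clock adjustments. */
double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Record one call made at time t that returned nbytes (<= 0 means no data). */
void read_stats_record(ReadStats *s, double t, int nbytes)
{
    if (s->calls == 0)
        s->first_sec = t;
    s->last_sec = t;
    s->calls++;
    if (nbytes > 0)
        s->bytes += (uint64_t)nbytes;
}

/* Average calls per second between the first and last recorded call.
 * Returns 0 until at least two calls with distinct timestamps are seen. */
double read_stats_rate(const ReadStats *s)
{
    double span = s->last_sec - s->first_sec;
    if (s->calls < 2 || span <= 0.0)
        return 0.0;
    return (double)(s->calls - 1) / span;
}
```

If the encoder is configured for 25 fps and the measured rate is close to
2.5 calls per second, the slowdown is on the calling side; if the rate is
close to 25 but each call blocks for a long time, the time is being spent
inside the callback itself.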

What board are you using? If there is already V4L2 support in the kernel,
it should just be a matter of calling ffmpeg with the right encoder.
Below is an example that encodes a raw file in NV12 YUV format:

$ ffmpeg -f rawvideo -pix_fmt nv12 -s:v 1280:720 -r 25 \
    -i ~/Videos/raw/freeway.yuv -c:v h264_v4l2m2m out/out.h264.mp4
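
A raw .yuv file has no header, so the pixel format and size given on the
command line must match the file exactly. A quick sanity check is that the
file size is a whole multiple of the frame size. The sketch below works that
out for NV12, which stores a full-resolution Y plane followed by one
interleaved UV plane at half resolution in each direction (12 bits per
pixel). The function names are hypothetical, and even width and height are
assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one NV12 frame: Y plane (w*h) plus interleaved UV plane (w*h/2).
 * Assumes width and height are both even. */
size_t nv12_frame_bytes(size_t width, size_t height)
{
    size_t luma = width * height;
    size_t chroma = luma / 2;
    return luma + chroma;
}

/* Number of whole frames in a raw file of file_bytes bytes.
 * *leftover receives the trailing bytes that do not fill a frame;
 * a non-zero value suggests the size or pixel format is wrong. */
size_t nv12_frame_count(size_t file_bytes, size_t width, size_t height,
                        size_t *leftover)
{
    size_t frame = nv12_frame_bytes(width, height);
    if (frame == 0) {
        *leftover = file_bytes;
        return 0;
    }
    *leftover = file_bytes % frame;
    return file_bytes / frame;
}
```

For the 1280 by 720 example above, one frame is 1382400 bytes, so one second
of input at 25 fps is 34560000 bytes.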

