On Fri, Dec 6, 2013 at 7:36 PM, Benjamin Morris <[email protected]> wrote:
> I've gathered a few hints regarding H264 video decoding on our hardware.  
> Hopefully some of them will be useful.

Very useful!

>
> First off, regarding naming in general.  Our internal names for our video 
> engines differ from the names you've been using.  Below is a translation map 
> between the names on http://nouveau.freedesktop.org/wiki/VideoAcceleration/ 
> and our internal names.  This is more of an FYI than anything else, to help 
> translation; I don't expect it to help with this particular H264 hang.
>
> VP2 (same)
> VP3   -> MSDEC
> VP4.0 -> MSDEC2
> VP4.2 -> MSDEC3
> VP5   -> MSDEC4
>
> Looking at your code, it seems that you're instantiating all 3 engines (VLD, 
> PDEC, PPP) on the same channel.  This probably isn't causing the hang, but 
> it's bad practice in general, as it prevents the engines from running in 
> parallel.  It's also impossible to use multiple engines on the same channel 
> like this on MSDEC4 (VP5) GPUs, so the same separate channel usage that you 
> need to have for MSDEC4 should also be used for everything down to G84.

First off, thank you so much for diving into our code! Hope it wasn't
_too_ dirty :)

Yeah, for Kepler, we put things on separate channels. When I did the
VP2 implementation, I also put them on separate channels since it
seemed like that was what was being done from the traces (and I knew
much less about all these things back then). But when I was doing the
initial pass to get MSDEC[12] working based on existing MSDEC[34]
code, I just left it alone, and it had them on the same channel for
pre-kepler. There's some limitation in nouveau that prevents multiple
channels from being active anyways (or something like that, it was
explained to me, but I don't quite remember right now), so it won't
matter either way for now. But in the future the plan is definitely to
move to separate channels.

>
> Regarding "non-obvious" values for H264 decoding, looking at 
> nouveau_vp3_video_vp.c, it looks like there are several unknown values in the 
> H264 picture parameter structure, especially for the DPB reference table.  
> This seems like a potential cause for your MSDEC[12]-specific hangs; 
> incorrect DPB state can be difficult to figure out from picture parameter 
> dumps, and the PDEC response to incorrect DPB state is generally to simply 
> hang.  There are no significant differences between MSDEC[12] (your 
> VP3/VP4.0) and MSDEC3 (your VP4.2) regarding DPB state, but improvements in 
> error resilience/concealment may simply be masking the problem on MSDEC3.  
> Below I've filled in our names for unnamed fields in that structure.  
> Hopefully this allows you to make some quick progress; you can apply the same 
> logic you already have for G84 to your G98 code path.

Great. There are few acronyms in there that I'm not familiar with, but
I suspect sufficient documentation-reading will fix the problem.

One additional question is whether you have any comments on our
inter-engine buffer sizing/usage (inter_data). Specifically the 0x720
method for VP (PDEC?) and 0x414 method on BSP (VLD?). Right now we set
that to the same address/size, but I noticed that you offset them by
0x2100 (iirc). However when I did that, it just caused the engine to
hang faster. (But then I noticed that on 331.20, which is what I used
for my latest traces, the "kernel" fuc code had been updated from the
one that we're extracting with my fw-cutter script, so perhaps the ABI
has changed. Or perhaps the fw-cutter has an insufficiently-precise
signature... I'll check it out later.)

I will look at your comments on our picparm data structure and will
adjust our code. Hopefully that's all that's needed.

Thanks again!

  -ilia

>
> Thanks,
> Ben
>
> struct h264_picparm_vp { // 700..a00
>         uint16_t width, height;
>         uint32_t stride1, stride2; // 04 08
>         uint32_t ofs[6]; // 0c..24 in-image offset
>
>         uint32_t u24; // nfi ac8 ? -> ColocBufferSize
>         uint32_t bucket_size; // 28 bucket size
>         uint32_t inter_ring_data_size; // 2c
>
>         unsigned f0 : 1; // 0 0x01: into 640 shifted by 3, 540 shifted by 5, 
> half size something? -> MbaffFrameFlag
>         unsigned f1 : 1; // 1 0x02: into vuc ofs 56 -> 
> direct_8x8_inference_flag
>         unsigned weighted_pred_flag : 1; // 2 0x04
>         unsigned f3 : 1; // 3 0x08: into vuc ofs 68 -> 
> constrained_intra_pred_flag
>         unsigned is_reference : 1; // 4
>         unsigned interlace : 1; // 5 field_pic_flag
>         unsigned bottom_field_flag : 1; // 6
>         unsigned f7 : 1; // 7 0x80: nfi yet -> second_field (second field of 
> complementary reference field)
>
>         signed log2_max_frame_num_minus4 : 4; // 31 0..3
>         unsigned u31_45 : 2; // 31 4..5 -> chroma_format_idc
>         unsigned pic_order_cnt_type : 2; // 31 6..7
>         signed pic_init_qp_minus26 : 6; // 32 0..5
>         signed chroma_qp_index_offset : 5; // 32 6..10
>         signed second_chroma_qp_index_offset : 5; // 32 11..15
>
>         unsigned weighted_bipred_idc : 2; // 34 0..1
>         unsigned fifo_dec_index : 7; // 34 2..8
>         unsigned tmp_idx : 5; // 34 9..13 -> CurrColIdx (index of associated 
> co-located motion data buffer)
>         unsigned frame_number : 16; // 34 14..29
>         unsigned u34_3030 : 1; // 34 30..30 pp.u34[30:30]
>         unsigned u34_3131 : 1; // 34 31..31 pad?
>
>         uint32_t field_order_cnt[2]; // 38, 3c
>
>         struct { // 40
>                 // 0x00223102
>                 // nfi (needs: top_is_reference, bottom_is_reference, 
> is_long_term, maybe some other state that was saved..
>                 unsigned fifo_idx : 7; // 00 0..6 -> buffer id
>                 // tmp_idx is the index of the associated co-located motion 
> data buffer
>                 // for simplest management, ensure that this is always equal 
> to the buffer id
>                 unsigned tmp_idx : 5; // 00 7..11
>                 unsigned unk12 : 1; // 00 12 not seen yet, but set, maybe 
> top_is_reference -> top_is_reference
>                 unsigned unk13 : 1; // 00 13 not seen yet, but set, maybe 
> bottom_is_reference? -> bottom_is_reference
>                 unsigned unk14 : 1; // 00 14 skipped? -> is_long_term
>                 unsigned notseenyet : 1; // 00 15 pad? -> not_existing
>                 unsigned unk16 : 1; // 00 16 -> is_field_pair
>                 unsigned unk17 : 4; // 00 17..20 -> top_field_marking 
> (top_is_reference ? 1+is_long_term : 0)
>                 unsigned unk21 : 4; // 00 21..24 -> bottom_field_marking
>                 unsigned pad : 7; // 00 d25..31
>
>                 uint32_t field_order_cnt[2]; // 04,08
>                 uint32_t frame_idx; // 0c
>         } refs[0x10];
>
>         uint8_t m4x4[6][16]; // 140
>         uint8_t m8x8[2][64]; // 1a0
>         // most of the remaining is MVC or SVC setup info, filled zero if not 
> MVC or SVC
>         uint32_t u220; // 220 number of extra reorder_list to append?
>         uint8_t u224[0x20]; // 224..244 reorder_list append ?
>         uint8_t nfi244[0xb0]; // add some pad to make sure nulls are read
> };
_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau

Reply via email to