On Jun 12, 2014, at 1:21 PM, Laurent Birtz <[email protected]> wrote:
> Hi, > > I have an idea for caching the motion vectors found in the inter search, > and reusing them as initial candidates in the fullpel search of other > blocks. > > The central component is the motion vector field (MVF). The MVF keeps > track of the best motion vector found for each reference list and > reference index in a block (f265_mv mvf[2][8]). To save space and to > cater for the spec limitations, we allow up to 8 reference indices per > reference list. > > We define the following MVFs (832 bytes in 13 cache lines): > - f265_mv mvf_16x16[4][2][8]; > - f265_mv mvf_32x32[4][2][8]; > - f265_mv mvf_64x64[4][2][8]; > - f265_mv mvf_part[2][8]; > > The 16x16 MVF contains a MVF for each 8x8 block in a 16x16 CB. The > 32x32/64x64 MVF contains a MVF for each 16x16/32x32 block in a > 32x32/64x64 CB. There is only one 16x16 MVF instance per CTB. It is > shared by all 16x16 blocks in the CTB (same for the 32x32 MVF). > > The partition MVF contains the motion vectors found in the current inter > partition being analyzed. If an inter block contains two partitions > (e.g. 2 16x8 partitions in a 16x16 inter block), then the MVF is shared > by these partitions. The analysis of the second partition trashes the > information found in the first partition. However, since the processing > is symmetrical for both partitions, the second partition can safely read > the MVF data of the first partition before it clobbers it in place. > > The MVFs are zeroed prior to the analysis of a CB. Hence, if a reference > index is not tested due to early termination, the corresponding MV in > the MVF will be zero and it won't be used as a candidate in the > following inter blocks. > > The CB initialization function obtains the 6 MVFs corresponding to the > current CB. The pointers to these MVFs are stored in an array, as follow. > <part> <parent> <sub0> <sub1> <sub2> <sub3> > > "part" is the partition MVF. > > For 8x8 CBs and 16x16 CBs, "sub0-3" are the 4 slots of the 16x16 MVF, > and "parent" is the slot of the 32x32 MVF that corresponds to the > location of the 16x16 CB in the 32x32 CB. > > For 32x32 CBs, "sub0-3" are the 4 slots of the 32x32 MVF, and "parent" > is the slot of the 64x64 MVF that corresponds to the location of the > 32x32 CB in the 64x64 CB. > > For the 64x64 CB, "sub0-3" are the 4 slots of the 64x64 MVF and "parent" > is null. > > Initially, the MVFs are empty. The 16x16/8x8 multi-level search > initializes the parent slot with the 16x16 data and the 8x8 slots with > the 8x8 data. Since the ML search is final for the 16x16 partitions, the > 16x16 partitions do not use the MVF candidates at all. Otherwise, the > analysis of the current inter partition uses the MV from the MVFs as > follow. > > For a 8x8 partition, use the MV from the 16x16 level, the MVs from the > left/top 8x8 blocks if present, and the MV from the current 8x8 block > since it was prepopulated by the ML search. > > For a 8x4/4x8 partition, use the MV from the current 8x8 block, the MV > of the first partition if present, and perhaps the MVs from the two > neighbour 8x8 CBs if present and if it actually helps in practice. > > For the other hybrid partitions (16x8/8x16, 32x16/16x32, 64x32/32x64, > and their AMP equilvalents), use the MVs from the parent level, the MVs > from the two underlying sub blocks, and the MVs of the first partition > if present. > > For the 32x32 and 64x64 partitions, use the MVs from the four sub slots. > > All that logic is implemented efficiently with precomputed tables and an > assembly function. To simplify, there are always four candidates per > partition. In the case where the logic calls for less, the entries are > padded with duplicate partition MVF entries (which was zeroed for the > first partition). Hence, the same MVFs are used for both partitions of > an inter block. > > The pointers to the four chosen MVFs are stored in the mvf_array at the > beginning of the analysis of a partition mode. This is done by looking > up the four offsets in the table corresponding to the block size and the > partition mode. This can be a two-level lookup to save space. > > The assembly function takes as input the two PMVs, the mvf_array, the > MVF offset corresponding to the current list and partition index, and > the clipping bounds. The output is two clipped PMVs, the unique > candidate count, and up to 5 clipped candidate MVs. The function adds > the (0,0) candidate to the four candidates passed as parameter and it > eliminates all candidates that are duplicate of another candidate or PMV > (vectorized branch-free exhaustive comparisons). > > When the 8x8/16x16/32x32 partition mode has been analyzed, the partition > MVs are copied in the corresponding 16x16/32x32/64x64 MVF slot (64 byte > copy). The branch can be avoided by choosing the partition MVF as the > destination for a non-UN mode (i.e. always store by copying 2 ymm > somewhere). > > The only unpredictable branch in the implementation is the loop over the > unique candidates (a variable number) to get the best initial fullpel > position. > > Laurent > -- > To unsubscribe visit http://f265.org > or send a mail to [email protected]. > Hi, I think the idea is good, but I want to make sure of a few points. 1) From what I understand, the search starts at the 16x16 level. That’s the only way for the 32x32 CB to have access to the 16x16 results. The same logic applies for the 64x64 CBs. This isn’t really a question, but can you confirm? 2) Do the MVF solely hold unsplit results? AFAICT, each MVF has 4 slots (quad tree children). Each slot has 2 reference lists, 8 results per list, hence the mvf_2Nx2N[4][2][8] arrays. So if the best result is 16x8, there is no way to store each partition’s winning MV. 3) After the assembly function was called with all the prediction information, is the result directly written to the CB? If not, then at what stage is the motion information transferred to it? I’m assuming the CB’s mv[2][2] and ref_idx[2][2] placeholders are still where we are going to store the values to be used for reconstruction and entropy coding. 4) When you refer to the “unique candidate count”, are you referring to the number of MV we are actually going to test. For instance if we have 5 initial candidates, two of which are (1,1), then we do not have 5 candidates, but 4. The returned list should then contain the 4 candidates to test correct? Otherwise, good work! François -- To unsubscribe visit http://f265.org or send a mail to [email protected].
