Re: [libav-devel] [PATCH 2/2] Add a WebP lossless decoder

Kostya Shishkov Sat, 14 Sep 2013 08:28:35 -0700

On Sat, Sep 14, 2013 at 09:59:20AM -0400, Justin Ruggles wrote:
> On 09/14/2013 03:46 AM, Kostya Shishkov wrote:
> > On Fri, Sep 13, 2013 at 04:31:00PM -0400, Justin Ruggles wrote:
> >> ---
> >>  doc/general.texi  |    2 +-
> >>  libavcodec/webp.c | 1173 
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 1164 insertions(+), 11 deletions(-)
> > 
> > still no Changelog entry?
> 
> Will fix.
> 
[...]
> >> +
> >> +enum ImageRole {
> >> +    /* Stores the actual pixels of the image. */
> >> +    IMAGE_ROLE_ARGB,
> >> +
> >> +    /* Stores the meta Huffman codes. The red and green components of a 
> >> pixel
> >> +       define the meta Huffman code used in a particular block of the ARGB
> >> +       image. */
> >> +    IMAGE_ROLE_ENTROPY,
> >> +
> >> +    /* Stores the metadata for Predictor Transform. The green component 
> >> of a
> >> +       pixel defines which of the 14 predictors is used within a 
> >> particular
> >> +       block of the ARGB image. */
> >> +    IMAGE_ROLE_PREDICTOR,
> >> +
> >> +    /* It is created by ColorTransformElement values for different blocks 
> >> of
> >> +       the image. Each ColorTransformElement is treated as a pixel whose 
> >> alpha
> >> +       component is 255, red component is cte.red_to_blue, green 
> >> component is
> >> +       cte.green_to_blue and blue component is cte.green_to_red. */
> >> +    IMAGE_ROLE_COLOR_TRANSFORM,
> >> +
> >> +    /* Color indexing image: An array of of size color_table_size (up to 
> >> 256
> >> +       ARGB values) storing the metadata for the Color Indexing 
> >> Transform. This
> >> +       is stored as an image of width color_table_size and height 1. */
> >> +    IMAGE_ROLE_COLOR_INDEXING,
> >> +
> >> +    IMAGE_ROLE_NB,
> >> +};
> > 
> > WTF, simply WTF
> > Even with comments it's impossible to comprehend what is it for.
> > Was it borrowed from the original specification as is?
> 
> Yes, that's from the spec. Basically the information for each of those
> is coded as an "entropy coded image". All are essentially coded the same
> way and are treated as an ARGB image (though not all color components
> are always used for real information). I do admit the terminology from
> the spec is not particularly good... I'll try to rewrite them in a less
> confusing way.


All frame's a stage,
And all images are merely players:
They have their lifecycle ends and starts,
Its act being some roles. At first, the entropy image,
...
etc, etc.

> >> +
> >> +typedef struct HuffReader {
> >> +    VLC vlc;
> >> +    int simple;
> > 
> > I'd rather call it binary, not simple.
> 
> That term is from the spec. And yeah, it's either no code (typically
> used for all 0's) or binary.

Okay, keep it then (let people curse the spec).

> >> +    int nb_symbols;
> >> +    uint16_t simple_symbols[2];
> >> +} HuffReader;
> >> +
> >> +typedef struct HuffmanGroup {
> >> +    HuffReader huffman_code[HUFFMAN_CODES_PER_META_CODE];
> >> +} HuffmanGroup;
> >> +
> >> +typedef struct ImageContext {
> >> +    enum ImageRole role;
> >> +    AVFrame *frame;
> >> +    int color_cache_bits;
> >> +    uint8_t *color_cache;
> >> +    int num_huffman_groups;
> >> +    HuffmanGroup *huffman_groups;
> >> +} ImageContext;
> >> +
> >> +typedef union {
> >> +    uint8_t u8;
> >> +    int8_t  s8;
> >> +} av_alias alias8;
> > 
> > looks unneeded - we have such thing in lavu for sure
> 
> Will someone please point this out? I obviously looked there first
> before putting it here and didn't find anything.

You just need some functions like int8_t u8tos8(uint8_t val) in lavc/mathops.h
(patchisrequired) and in the most cases you don't need such type at all.
 
> >>  typedef struct WebPContext {
> >>      VP8Context v;
> >> +    AVCodecContext *avctx;
> >>      int initialized;
> >> +    GetBitContext gb;
> >>      GetByteContext alpha_gb;
> >>      uint8_t has_alpha;
> >>      int width;
> >>      int height;
> >> +
> >> +    int lossless;
> >> +    ImageContext image[IMAGE_ROLE_NB];
> >> +    enum TransformType transforms[4];
> >> +    int nb_transforms;
> >> +    int subtract_green;
> >> +    int predictor_block_size;
> >> +    int color_block_size;
> >> +    int color_index_width_bits;
> >> +    int reduced_width;
> >> +    int use_entropy_image;
> >> +    int entropy_size;
> >>  } WebPContext;
> >>  
> >> +static av_always_inline uint8_t *get_pixel(AVFrame *frame, int x, int y)
> >> +{
> >> +    if (x < 0 || y < 0)
> >> +        return NULL;
> >> +    if (x >= frame->width) {
> >> +        x = 0;
> >> +        y++;
> >> +    }
> >> +    if (y >= frame->height)
> >> +        return NULL;
> >> +
> >> +    return frame->data[0] + y * frame->linesize[0] + 4 * x;
> >> +}
> > 
> > looks unneeded
> 
> The part that is useful is selecting the first pixel on the next row
> when getting the top-right pixel from a pixel on the far right column.
> Plus it simplifies the code. The checks are probably unnecessary though.

without the checks it boils down to the last line
 
> >> +static av_always_inline uint8_t add_comp(int a, int b)
> >> +{
> >> +    return (a + b) & 0xFF;
> >> +}
> > 
> > looks unneeded (especially masking) and you seem to use it on 
> > uint8_t->uint8_t
> > anyway
> 
> Ok, I'll go through and make sure everything still works without it.
> 
> >> +
> >> +static av_always_inline uint8_t avg2_comp(int a, int b)
> >> +{
> >> +    return (a + b) / 2;
> >> +}
> >> +
> >> +static av_always_inline uint8_t clamp_add_subtract_full(int a, int b, int 
> >> c)
> >> +{
> >> +    return av_clip_uint8(a + b - c);
> >> +}
> >> +
> >> +static av_always_inline uint8_t clamp_add_subtract_half(int a, int b, int 
> >> c)
> >> +{
> >> +    int d = avg2_comp(a, b);
> >> +    return av_clip_uint8(d + (d - c) / 2);
> >> +}
> > 
> > nit: this one looks like it can be simplified a bit
> 
> In what way?

(3 * a + 3 * b - 2 * c) >> 2

just saying

[...]
> >> +    }
> >> +    memset(img, 0, sizeof(*img));
> >> +}
> >> +
> >> +
> >> +/* Differs from get_vlc2() in the following ways:
> >> + *   - codes are bit-reversed
> >> + *   - assumes 8-bit table to make reversal simpler
> >> + *   - assumes max depth of 2 since the max code length for WebP is 15
> >> + */
> >> +static av_always_inline int webp_get_vlc(GetBitContext *gb, VLC_TYPE 
> >> (*table)[2])
> >> +{
> >> +    int n, nb_bits;
> >> +    unsigned int index;
> >> +    int code;
> >> +
> >> +    OPEN_READER(re, gb);
> >> +    UPDATE_CACHE(re, gb);
> >> +
> >> +    index = SHOW_UBITS(re, gb, 8);
> >> +    index = ff_reverse[index];
> >> +    code  = table[index][0];
> >> +    n     = table[index][1];
> >> +
> >> +    if (n < 0) {
> >> +        LAST_SKIP_BITS(re, gb, 8);
> >> +        UPDATE_CACHE(re, gb);
> >> +
> >> +        nb_bits = -n;
> >> +
> >> +        index = SHOW_UBITS(re, gb, nb_bits);
> >> +        index = (ff_reverse[index] >> (8 - nb_bits)) + code;
> >> +        code  = table[index][0];
> >> +        n     = table[index][1];
> >> +    }
> >> +    SKIP_BITS(re, gb, n);
> >> +
> >> +    CLOSE_READER(re, gb);
> >> +
> >> +    return code;
> >> +}
> > 
> > is it possible just to swap/reverse the whole bitstream or there are reasons
> > preventing this (e.g. mix of bit fields in normal order and those VLCs)?
> 
> That is exactly the case. It is a mix.

Fine, but I like the format design less and less...
 
[...]
> >> +#define PARSE_BLOCK_SIZE(w, h) do {                                       
> >>   \
> >> +    block_bits = get_bits(&s->gb, 3) + 2;                                 
> >>   \
> >> +    block_size = 1 << block_bits;                                         
> >>   \
> >> +    blocks_w   = FFALIGN((w), block_size) / block_size;                   
> >>   \
> >> +    blocks_h   = FFALIGN((h), block_size) / block_size;                   
> >>   \
> > 
> > FFALIGN() >> block_bits?
> 
> will fix

both are right, I just dislike divisions

> >> +} while (0)
> >> +
> >> +static int decode_entropy_image(WebPContext *s, ImageContext *rgba_img)
> >> +{
> >> +    int ret, block_bits, block_size, width, blocks_w, blocks_h;
> >> +
> >> +    width = s->width;
> >> +    if (s->color_index_width_bits > 0)
> >> +        width = s->reduced_width;
> >> +
> >> +    PARSE_BLOCK_SIZE(width, s->height);
> >> +
> >> +    ret = decode_image(s, IMAGE_ROLE_ENTROPY, blocks_w, blocks_h);
> >> +    if (ret < 0)
> >> +        return ret;
> > 
> > Please select better names, because here it looked like a loop (decode_image
> > calls decode_entropy_coded_image and not decode_entropy_image).
> 
> Yeah, the names suck a lot. It's partly because the terminology in the
> spec sucks a lot and I can't think of anything better. I should probably
> write up an overview at the top of the file that describes how this
> crazy format is structured. In this case, decode_image() is mainly a
> wrapper around decode_entropy_coded_image() that just sets up the
> AVFrame stuff, so I'll merge those two.

Good.

> >> +
> >> +    s->use_entropy_image = 1;
> >> +    s->entropy_size      = block_size;
> >> +
> >> +    /* the number of huffman groups is determined by the maximum group 
> >> number
> >> +       coded in the entropy image */
> >> +    {
> >> +        ImageContext *img;
> >> +        int x, y, max;
> >> +        uint8_t *p;
> >> +
> >> +        img = &s->image[IMAGE_ROLE_ENTROPY];
> >> +        max = 0;
> >> +        for (y = 0; y < img->frame->height; y++) {
> >> +            for (x = 0; x < img->frame->width; x++) {
> >> +                p = get_pixel(img->frame, x, y);
> >> +                max = FFMAX(max, p[2]);
> >> +            }
> >> +        }
> >> +        rgba_img->num_huffman_groups = max + 1;
> >> +    }
> > 
> > why such block?
> 
> no particular reason. I'll change it.

please note that I've pointed only to two of them, there might be more
 
[...]
> >> +/* PRED_MODE_AVG_AVG_L_TL_AVG_T_TR */
> >> +static void inv_predict_10(uint8_t *p, const uint8_t *p_l, const uint8_t 
> >> *p_tl,
> >> +                           const uint8_t *p_t, const uint8_t *p_tr)
> >> +{
> >> +    p[0] = avg2_comp(avg2_comp(p_l[0], p_tl[0]), avg2_comp(p_t[0], 
> >> p_tr[0]));
> >> +    p[1] = avg2_comp(avg2_comp(p_l[1], p_tl[1]), avg2_comp(p_t[1], 
> >> p_tr[1]));
> >> +    p[2] = avg2_comp(avg2_comp(p_l[2], p_tl[2]), avg2_comp(p_t[2], 
> >> p_tr[2]));
> >> +    p[3] = avg2_comp(avg2_comp(p_l[3], p_tl[3]), avg2_comp(p_t[3], 
> >> p_tr[3]));
> > 
> > p_l[0] + p_tl[0] + p_t[0] + p_tr[0] >> 2 ?
> 
> That's not the same due to rounding.
> Example: 2, 3, 7, 12
> avg(2,3)=2; avg(7,12)=9; avg(2,9)=5
> 2 + 3 + 7 + 12 >> 2 = 6

indeed, disregard it then

If I were Niedermayer I'd suggest usign SWAR

unsigned PL = AV_RN32(p_l), PTL = AV_RN32(p_tl), ....;

A0 = (PL & PTL) + ((PL ^ PTL) >> 1);
A1 = (PT & PTR) + ((PT ^ PTR) >> 1);
AV_WN32(p, (A0 & A1) + ((A0 ^ A1) >> 1));

but since I'm not I don't.

> > honestly, those avg2_comp() everywhere do not improve anything
> 
> Ok, I can expand it all out if you think it's clearer.

IMO having (a + b) >> 1 inplace is clearer.
 
[...]
> >> +static int apply_color_transform(WebPContext *s)
> >> +{
> >> +    ImageContext *img, *cimg;
> >> +    int x, y, cx, cy;
> >> +    uint8_t *p, *cp;
> >> +    int block_size = s->color_block_size;
> >> +
> >> +    img   = &s->image[IMAGE_ROLE_ARGB];
> >> +    cimg  = &s->image[IMAGE_ROLE_COLOR_TRANSFORM];
> >> +
> >> +    for (y = 0; y < img->frame->height; y++) {
> >> +        for (x = 0; x < img->frame->width; x++) {
> >> +            alias8 green_to_red, green_to_blue, red_to_blue;
> >> +            alias8 green, red;
> >> +            int blue;
> >> +
> >> +            cx = x / block_size;
> >> +            cy = y / block_size;
> >> +            cp = get_pixel(cimg->frame, cx, cy);
> >> +            red_to_blue.u8   = cp[1];
> >> +            green_to_blue.u8 = cp[2];
> >> +            green_to_red.u8  = cp[3];
> >> +
> >> +            p = get_pixel(img->frame, x, y);
> >> +            red.u8   = p[1];
> >> +            green.u8 = p[2];
> >> +            blue     = p[3];
> >> +
> >> +            red.u8 = add_comp(red.u8, 
> >> color_transform_delta(green_to_red.s8,  green.s8));
> >> +            blue  +=                  
> >> color_transform_delta(green_to_blue.s8, green.s8);
> >> +            blue   = add_comp(blue,   
> >> color_transform_delta(red_to_blue.s8,   red.s8));
> > 
> > hmm?
> 
> Yeah, it's pretty weird. I'll see if I can break it down further to
> simplify it without breaking the math.

I was mostly pointing to the fact you can use += on all three lines instead of
one. And that you need make_signed() function essentially for
color_transform_delta() arguments instead of alias8 (or even hide the
conversion inside that function since it's not needed anywhere else).

> >> +
> >> +            p[1]  = red.u8;
> >> +            p[3]  = blue;
> >> +        }
> >> +    }
> >> +    return 0;
> >> +}
> >> +
[...]
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Re: [libav-devel] [PATCH 2/2] Add a WebP lossless decoder

Reply via email to