On November 13, 2015 2:35:41 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selva...@atmel.com> wrote: > >On Thu, Nov 12, 2015 at 08:37:02PM +0100, Richard Biener wrote: >> On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj ><senthil_kumar.selva...@atmel.com> wrote: >> >Hi, >> > >> > When analyzing code size differences between a 4.9.x compiler and >> > trunk for the AVR target, I found quite a few cases where extra >> > registers were being used to hold smaller types (two 8 bit >registers >> > for a uint8_t, for example). >> > >> >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was >the >> >point >> > at which the dumps start to diverge. >> > >> > For code like this >> > >> >#include <stdint.h> >> > >> >uint16_t get(const uint16_t wvalue) >> >{ >> > const uint8_t type = (wvalue >> 8); >> > uint16_t size = 0; >> > >> > if (type == 1) >> > { >> > size = 20; >> > } >> > else if (type == 2) >> > { >> > size = 32; >> > } >> > return size; >> >} >> > >> > VRP substitutes wvalue for the type variable in the conditionals >> > (simplify_cond_using_ranges:9506), as it figures out that the >range >> > of wvalue is the same as a uint8_t). That is, code goes from >> > >> ><bb 2>: >> >_3 = wvalue_2(D) >> 8; >> >type_4 = (const uint8_t) _3; >> >if (type_4 == 1) >> > goto <bb 5>; >> >else >> > goto <bb 3>; >> > >> >to >> > >> ><bb 2>: >> >_3 = wvalue_2(D) >> 8; >> >type_4 = (const uint8_t) _3; >> >if (_3 == 1) >> > goto <bb 5>; >> >else >> > goto <bb 3>; >> > >> > This "widening" causes later passes to allocate extra registers >> > (holding zeros for the extra bits) and generate extra comparisons >> > for the AVR target. >> > >> > I found that in the 4.9.2 compiler I was benchmarking against, VRP >> > didn't figure out the range for wvalue and therefore the folding >> > didn't happen, which was why the code was better. >> > >> > With the native compiler on my machine (gcc 5.2 x86_64) - >replacing >> > uint8_t by uint32_t and uint16_t by uint64_t, and shifting right >by >> > 32 bits instead of 8 shows the same effect - the generated code >uses >> > rdi instead of just di to hold the type variable. >> > >> > Is this a bug? Should the folding happen only if the type >> > conversion was from a smaller type to a bigger one? Or is the >backend >> > supposed to detect this pattern and deal with it? >> >> We should probably avoid widening beyond word_mode (or sth else if >there is sth suitable). >> > >Hmm, that does fix this problem. The below patch allows folding only if >the operand size is smaller or equal to a word. > >diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c >index e2393e4..3f5668c 100644 >--- a/gcc/tree-vrp.c >+++ b/gcc/tree-vrp.c >@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt) > innerop = gimple_assign_rhs1 (def_stmt); > > if (TREE_CODE (innerop) == SSA_NAME >+ && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop))) >+ <= GET_MODE_SIZE(word_mode)) > && !POINTER_TYPE_P (TREE_TYPE (innerop))) > { > value_range *vr = get_value_range (innerop); > >However, if the type conversion was from a smaller type to a larger >type, and the >smaller type was bigger than a word, this patch results in worse code. >For example. > >#include <stdint.h> > >uint16_t get(const uint16_t wvalue) >{ > const uint32_t type = (wvalue >> 8); > uint16_t size = 0; > > if (type == 1) > { > size = 20; > } > else if (type == 2) > { > size = 32; > } > return size; >} > >Before the patch, the folding done caused type to be substituted with >wvalue, and the >smaller size resulted in better code. After the patch, since wvalue is >bigger than a >word (for AVR), the folding doesn't happen and registers are allocated >to hold all 32 >bits. > >How does the below patch look? It does the folding only if it is >beneficial i.e., the >value being substituted (innerop) is smaller than or equal the current >one (op0)?
I think widenings up to word-mode are OK. >diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c >index e2393e4..3f5668c 100644 >--- a/gcc/tree-vrp.c >+++ b/gcc/tree-vrp.c >@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt) > innerop = gimple_assign_rhs1 (def_stmt); > > if (TREE_CODE (innerop) == SSA_NAME >+ && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop))) >+ <= GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(op0)))) > && !POINTER_TYPE_P (TREE_TYPE (innerop))) > { > value_range *vr = get_value_range (innerop); > > >Regards >Senthil > >> Richard. >> >> >Regards >> >Senthil >> > >> > >> >details-raw vrp1 dump >> > >> >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0, >> >symbol_order=0) >> > >> >;; 1 loops found >> >;; >> >;; Loop 0 >> >;; header 0, latch 1 >> >;; depth 0, outer -1 >> >;; nodes: 0 1 2 3 4 5 >> >;; 2 succs { 5 3 } >> >;; 3 succs { 4 5 } >> >;; 4 succs { 5 } >> >;; 5 succs { 1 } >> > >> >ASSERT_EXPRs to be inserted >> > >> >Assertions to be inserted for type_4 >> > if (type_4 == 1) >> > >> > BB #3 >> > EDGE 2->3 2 [55.9%] (FALSE_VALUE,EXECUTABLE) >> > PREDICATE: type_4 ne_expr 1 >> > >> > >> > >> > >> >Updating SSA: >> >Registering new PHI nodes in block #2 >> >Updating SSA information for statement type_4 = (const uint8_t) _3; >> >Updating SSA information for statement if (type_4 == 1) >> >Registering new PHI nodes in block #3 >> >Updating SSA information for statement type_6 = ASSERT_EXPR <type_4, >> >type_4 != 1>; >> >Updating SSA information for statement if (type_4 == 2) >> >Registering new PHI nodes in block #4 >> >Registering new PHI nodes in block #5 >> > >> >SSA replacement table >> >N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j >> > >> >type_6 -> { type_4 } >> >Incremental SSA update started at block: 2 >> >Number of blocks in CFG: 6 >> >Number of blocks to update: 2 ( 33%) >> >Affected blocks: 2 3 >> > >> > >> > >> >SSA form after inserting ASSERT_EXPRs >> >get (const uint16_t wvalue, const uint8_t windex, const void * * >const >> >address) >> >{ >> > uint16_t size; >> > const uint8_t type; >> > unsigned int _3; >> > >> > <bb 2>: >> > _3 = wvalue_2(D) >> 8; >> > type_4 = (const uint8_t) _3; >> > if (type_4 == 1) >> > goto <bb 5>; >> > else >> > goto <bb 3>; >> > >> > <bb 3>: >> > type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >> > if (type_6 == 2) >> > goto <bb 4>; >> > else >> > goto <bb 5>; >> > >> > <bb 4>: >> > >> > <bb 5>: >> > # size_1 = PHI <20(2), 0(3), 32(4)> >> > return size_1; >> > >> >} >> > >> > >> >Immediate_uses: >> > >> >size_1 : --> single use. >> >return size_1; >> > >> >wvalue_2(D) : --> single use. >> >_3 = wvalue_2(D) >> 8; >> > >> >_3 : --> single use. >> >type_4 = (const uint8_t) _3; >> > >> >type_4 : -->3 uses. >> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >> >if (type_4 == 1) >> > >> >.MEM_5(D) : --> single use. >> ># VUSE <.MEM_5(D)> >> >return size_1; >> > >> >type_6 : --> single use. >> >if (type_6 == 2) >> > >> >Adding destination of edge (0 -> 2) to worklist >> > >> >Simulating block 2 >> > >> >Visiting statement: >> >_3 = wvalue_2(D) >> 8; >> >Intersecting >> > [0, 255] >> >and >> > [0, +INF] >> >to >> > [0, 255] >> >Found new range for _3: [0, 255] >> >interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) >_3; >> >marking stmt to be not simulated again >> > >> >Visiting statement: >> >type_4 = (const uint8_t) _3; >> >Found new range for type_4: [0, +INF] >> >interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR ><type_4, >> >type_4 != 1>; >> >interesting_ssa_edges: adding SSA use in if (type_4 == 1) >> >marking stmt to be not simulated again >> > >> >Visiting statement: >> >if (type_4 == 1) >> > >> >Visiting conditional with predicate: if (type_4 == 1) >> > >> >With known ranges >> > type_4: [0, +INF] >> > >> >Predicate evaluates to: DON'T KNOW >> >Adding destination of edge (2 -> 5) to worklist >> >Adding destination of edge (2 -> 3) to worklist >> > >> >Simulating block 3 >> > >> >Visiting statement: >> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>; >> >Intersecting >> > ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) >> >and >> > [0, +INF] >> >to >> > ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) >> >Found new range for type_6: ~[1, 1] >> >interesting_ssa_edges: adding SSA use in if (type_6 == 2) >> >marking stmt to be not simulated again >> > >> >Visiting statement: >> >if (type_6 == 2) >> > >> >Visiting conditional with predicate: if (type_6 == 2) >> > >> >With known ranges >> > type_6: ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) >> > >> >Predicate evaluates to: DON'T KNOW >> >Adding destination of edge (3 -> 4) to worklist >> > >> >Simulating block 4 >> > >> >Simulating block 5 >> > >> >Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)> >> > Argument #0 (2 -> 5 executable) >> > 20: [20, 20] >> > Argument #1 (3 -> 5 executable) >> > 0: [0, 0] >> >Meeting >> > [20, 20] >> >and >> > [0, 0] >> >to >> > [0, 20] >> > Argument #2 (4 -> 5 executable) >> > 32: [32, 32] >> >Meeting >> > [0, 20] >> >and >> > [32, 32] >> >to >> > [0, 32] >> >Intersecting >> > [0, 32] >> >and >> > [0, +INF] >> >to >> > [0, 32] >> >Found new range for size_1: [0, 32] >> >interesting_ssa_edges: adding SSA use in return size_1; >> >marking stmt to be not simulated again >> > >> >Visiting statement: >> >return size_1; >> > >> >Value ranges after VRP: >> > >> >size_1: [0, 32] >> >wvalue_2(D): VARYING >> >_3: [0, 255] >> >type_4: [0, +INF] >> >type_6: ~[1, 1] EQUIVALENCES: { type_4 } (1 elements) >> > >> > >> > >> >Substituting values and folding statements >> > >> >Folding statement: _3 = wvalue_2(D) >> 8; >> >Not folded >> >Folding statement: type_4 = (const uint8_t) _3; >> >Not folded >> >Folding statement: if (type_4 == 1) >> >Folded into: if (_3 == 1) >> > >> >Folding statement: if (type_6 == 2) >> >Not folded >> >Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)> >> >No folding possible >> >Folding statement: return size_1; >> >Not folded >> >get (const uint16_t wvalue, const uint8_t windex, const void * * >const >> >address) >> >{ >> > uint16_t size; >> > const uint8_t type; >> > unsigned int _3; >> > >> > <bb 2>: >> > _3 = wvalue_2(D) >> 8; >> > type_4 = (const uint8_t) _3; >> > if (_3 == 1) >> > goto <bb 5>; >> > else >> > goto <bb 3>; >> > >> > <bb 3>: >> > if (type_4 == 2) >> > goto <bb 4>; >> > else >> > goto <bb 5>; >> > >> > <bb 4>: >> > >> > <bb 5>: >> > # size_1 = PHI <20(2), 0(3), 32(4)> >> > return size_1; >> > >> >} >> >>