On 4 August 2016 at 13:31, Richard Biener <rguent...@suse.de> wrote:
> On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:
>
>> Hi,
>> This is a prototype patch for propagating known/unknown bits 
>> inter-procedurally.
>> for integral types which propagates info obtained from get_nonzero_bits ().
>>
>> Patch required making following changes:
>> a) To make info from get_nonzero_bits() available to ipa, I had to remove
>> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
>> in get_ptr_info() for default_none.f95 (and several other fortran tests)
>> with options: -fopenacc -O2
>> ICE: http://pastebin.com/KjD7HMQi
>> I confirmed with Richard that this was a latent issue.
>
> Can you plase bootstrap/test the fix for this separately?  (doesn't
> seem to be included in this patch btw)
Well I don't have the fix available -;)
>
>> b) I chose widest_int for representing value, mask in ipcp_bits_lattice
>> and correspondingly changed declarations for
>> bit_value_unop_1/bit_value_binop_1 to take
>> precision and sign instead of type (those are the only two fields that
>> were used). Both these functions are exported by tree-ssa-ccp.h
>> I hope that's ok ?
>
> That's ok, but please change the functions to overloads of
> bit_value_binop / bit_value_unop to not export ugly _1 names.
>
> -  signop sgn = TYPE_SIGN (type);
> -  int width = TYPE_PRECISION (type);
> +  signop sgn = type_sgn;
> +  int width = (int) type_precision;
>
> please adjust parameter names to get rid of those now unnecessary
> locals (and make the precision parameter an 'int').
>
>> c) Changed streamer_read_wi/streamer_write_wi to non-static.
>> Ah I see Kugan has submitted a patch for this, so I will drop this hunk.
>
> But he streams wide_int, not widest_int.  I followed up on his
> patch.
Oops, I got confused, sorry about that.
>
>> d) We have following in tree-ssa-ccp.c:get_default_value ():
>>           if (flag_tree_bit_ccp)
>>             {
>>               wide_int nonzero_bits = get_nonzero_bits (var);
>>               if (nonzero_bits != -1)
>>                 {
>>                   val.lattice_val = CONSTANT;
>>                   val.value = build_zero_cst (TREE_TYPE (var));
>>                   val.mask = extend_mask (nonzero_bits);
>>                 }
>>
>> extend_mask() sets all upper bits to 1 in nonzero_bits, ie, varying
>> in terms of bit-ccp.
>> I suppose in tree-ccp we need to extend mask if var is parameter since we 
>> don't
>> know in advance what values it will receive from different callers and mark 
>> all
>> upper bits as 1 to be safe.
>
> Not sure, it seems to me that we can zero-extend for unsigned types
> and sign-extend for signed types (if the "sign"-bit of nonzero_bits
> is one it properly makes higher bits undefined).  Can you change
> the code accordingly?  (simply give extend_mask a sign-op and use
> that appropriately?)  Please split out this change so it can be
> tested separately.
>
>> However I suppose with ipa, we can determine exactly which bits of
>> parameter are constant and
>> setting all upper bits to 1 will become unnecessary ?
>>
>> For example, consider following artificial test-case:
>> int f(int x)
>> {
>>   if (x > 300)
>>     return 1;
>>   else
>>     return 2;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>   return f(argc & 0xc) + f (argc & 0x3);
>> }
>>
>> For x, the mask would be meet of:
>> <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
>> and ipcp_update_bits() sets nonzero_bits for x to 0xf.
>> However get_default_value then calls extend_mask (0xf), resulting in
>> all upper bits
>> being set to 1 and consequently the condition if (x > 300) doesn't get 
>> folded.
>
> But then why would the code trying to optimize the comparison look at
> bits that are outside of the precision?  (where do we try to use this
> info?  I see that VRP misses to use nonzero bits if no range info
> is present - I suppose set_nonzero_bits misses to eventually adjust
> the range.
>
> That said, where is the folding code and why does it care for those
> "uninteresting" bits at all?
Well there is following in bit_value_binop_1 for case LT_EXPR / LE_EXPR:
        /* If the most significant bits are not known we know nothing.  */
        if (wi::neg_p (o1mask) || wi::neg_p (o2mask))
          break;

IIUC extend_mask extends all upper bits to 1, and we hit break and
thus not perform folding.
ccp2 dump shows:
Folding statement: if (x_2(D) > 300)
which is likely CONSTANT
Not folded

Instead if we extend based on signop, then the condition gets folded correctly:
Folding statement: if (x_2(D) > 300)
which is likely CONSTANT
Folding predicate x_2(D) > 300 to 0
gimple_simplified to if (0 != 0)
Folded into: if (0 != 0)

I thought it was unsafe for ccp to extend based on sign-op,
so I guarded that on DECL_SET_BY_IPA.
I will try to change extend_mask to extend the mask based on signop
and get rid of the flag.

I will address your other comments in follow-up patch.

Thanks,
Prathamesh
>
>> To resolve this, I added a new flag "set_by_ipa" to decl_common,
>> which is set to true if the mask of parameter is determined by ipa-cp,
>> and the condition changes to:
>>
>> if (SSA_NAME_VAR (var)
>>     && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
>>     && DECL_SET_BY_IPA (SSA_NAME_VAR (var))
>>   val.mask = widest_int::from (nonzero_bits,
>>                           TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var)));
>> else
>>   val.mask = extend_mask (nonzero_bits);
>>
>> I am not sure if adding a new flag to decl_common is a good idea. How
>> do other ipa passes deal with this/similar issue ?
>>
>> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?
>> I haven't yet gated it on the flag, will do in next version of patch.
>> I have added some very simple test-cases, I will try to add more
>> meaningful ones.
>
> See above - we should avoid needing this.
>
>> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
>> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
>> of some fortran tests failing due to above ICE.
>>
>> As next steps, I am planning to extend it to handle alignment propagation,
>> and do further testing (lto-bootstrap, chromium).
>> I would be grateful for feedback on the current patch.
>
> I see you do
>
> @@ -895,7 +899,7 @@ do_dbg_cnt (void)
>     Return TRUE when something was optimized.  */
>
>  static bool
> -ccp_finalize (bool nonzero_p)
> +ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
>  {
>    bool something_changed;
>    unsigned i;
> @@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)
>
>        if (!name
>           || (!POINTER_TYPE_P (TREE_TYPE (name))
> -             && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
> -                 /* Don't record nonzero bits before IPA to avoid
> -                    using too much memory.  */
> -                 || !nonzero_p)))
> +             && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
>         continue;
>
> can you instead adjust the caller to do sth like
>
>   if (ccp_finalize (nonzero_p || flag_ipa_cp))
>     {
>
> ?  What we miss to optimize memory usage in the early CCP case
> (it's run very early, before dead code elimination) is to
> avoid setting alignment / nonzero bits for the case of
> fully propagatable (and thus dead after substitute_and_fold)
> SSA names.
>
> So in ccp_finalize do sth like
>
>       val = get_value (name);
>       if (val->lattice_val != CONSTANT
>           || TREE_CODE (val->value) != INTEGER_CST
>           || val->mask == 0)
>         continue;
>
> That should cut down early CCP memory use in case of nonzero
> setting significantly.
>
> I didn't look at the propagation part but eventually the IPA-CP
> lattice gets quite big.  Also the alignment lattice is very
> similar to the bits lattice so why not merge those two?  But
> in the end it's Martins/Honzas call here.  Note there is
> trailing_wide_ints <> which could be used to improve memory usage
> based on the underlying type.
>
> Thanks,
> Richard.

Reply via email to