Hi, Sorry for the slow review.
> The new version v7 is attached which has addressed your review comments > on v6. Could you have a further look? Many thanks in advance! > > Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu P9. > Even with explicit vect-partial-vector-usage settings 1/2 on Power target, > I didn't find any remarkable failures (only some trivial test case issues). Thanks, this looks great. OK for trunk with the minor nits below fixed. "Kewen.Lin" <li...@linux.ibm.com> writes: > @@ -968,4 +968,8 @@ Bound on number of runtime checks inserted by the > vectorizer's loop versioning f > Common Joined UInteger Var(param_vect_max_version_for_alignment_checks) > Init(6) Param Optimization > Bound on number of runtime checks inserted by the vectorizer's loop > versioning for alignment check. > > +-param=vect-partial-vector-usage= > +Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) > IntegerRange(0, 2) Param Optimization > +Controls how loop vectorizer uses partial vectors. 0 means never, 1 means > only for loops whose iterating need can be removed, 2 means for all loops. > The default value is 2. IMO reads better as s/iterating need/need to iterate/ > + FOR_EACH_MODE_IN_CLASS (tmode_iter, MODE_INT) > + { > + scalar_mode tmode = tmode_iter.require (); > + unsigned int tbits = GET_MODE_BITSIZE (tmode); > + > + /* ??? Do we really want to construct one IV whose precision exceeds > + BITS_PER_WORD? */ > + if (tbits > BITS_PER_WORD) > + break; > + > + /* Find the first available standard integral type. */ > + if (tbits >= min_ni_prec && targetm.scalar_mode_supported_p (tmode)) > + { > + iv_type = build_nonstandard_integer_type (tbits, true); > + break; > + } > + } The outer {…} block should be indented by two spaces relative to the FOR_EACH_MODE_IN_CLASS. > + > + if (!iv_type) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "can't vectorize with length-based partial vectors" > + " due to no suitable iv type.\n"); IMO reads better as s/due to/because there is/ > + /* Shouldn't go with length-based approach if fully masked. */ > + gcc_assert (!loop_lens || (loop_lens && !loop_masks)); The “loop_lens &&” is redundant. Same for vectorizable_load. > @@ -7994,6 +8030,42 @@ vectorizable_store (vec_info *vinfo, > vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); > new_stmt = call; > } > + else if (loop_lens) > + { > + tree final_len > + = vect_get_loop_len (loop_vinfo, loop_lens, > + vec_num * ncopies, vec_num * j + i); > + align = least_bit_hwi (misalign | align); > + tree ptr = build_int_cst (ref_type, align); > + machine_mode vmode = TYPE_MODE (vectype); > + opt_machine_mode new_ovmode > + = get_len_load_store_mode (vmode, false); > + gcc_assert (new_ovmode.exists ()); > + machine_mode new_vmode = new_ovmode.require (); The assert is redundant with the “require ()”. Same for vectorizable_load. Thanks, Richard