On Mon, Aug 29, 2011 at 9:44 PM, Richard Guenther <rguent...@suse.de> wrote:
>> > This patch makes a conversion optab from the direct optabs vcond >> > and vcondu. This allows to specify different modes for the >> > actual comparison and the value that is selected. >> > >> > All targets but i386 are trivially converted by >> > s/vcond<mode>/vcond<mode><mode>/. The i386 port is enhanced >> > to support a OP b ? c : d as ({ mask = a OP b; (c & mask) | (d & ~mask); >> > }), constraining it to what the middle-end constrained itself to >> > (matching number of vector elements in the comparison operands with >> > the result vector types) would explode patterns too much. >> > Thus, only a subset of mode combinations will be excercised >> > (but none at the moment - a followup will fix the vectorizer, >> > and generic vectors from the C extensions have a patch pending). >> > >> > Bootstrapped on x86_64-unknown-linux-gnu, tests are currently >> > running for {,-m32}. >> > >> > Ok if that succeeds? >> > >> > Thanks, >> > Richard. >> > >> > 2011-08-29 Richard Guenther <rguent...@suse.de> >> > >> > * genopinit.c (optabs): Turn vcond{,u}_optab into a conversion >> > optab with two modes. >> > * optabs.h (enum convert_optab_index): Add COI_vcond, COI_vcondu. >> > (enum direct_optab_index): Remove DOI_vcond, DOI_vcondu. >> > (vcond_optab): Adjust. >> > (vcondu_optab): Likewise. >> > (expand_vec_cond_expr_p): Adjust prototype. >> > * optabs.c (get_vcond_icode): Adjust. >> > (expand_vec_cond_expr_p): Likewise. >> > (expand_vec_cond_expr): Likewise. >> > * tree-vect-stmt.c (vectorizable_condition): Adjust. >> > >> > * config/i386/sse.md (vcond<mode>): Split to >> > vcond<V_256:mode><VF_256:mode>, vcond<V_128:mode><VF_128:mode>, >> > vcond<V_128:mode><VI124_128:mode> and >> > vcondu<V_128:mode><VI124_128:mode>. >> > (vcondv2di): Change to vcond<VI8F_128:mode>v2di. >> > (vconduv2di): Likewise. >> > * config/arm/neon.md (vcond<mode>): Change to vcond*<mode><mode>. >> > (vcondu<mode>): Likewise. >> > * config/ia64/vect.md (vcond<mode>): Likewise. >> > (vcondu<mode>): Likewise. >> > (vcondv2sf): Likewise. >> > * config/mips/mips-ps-3d.md (vcondv2sf): Likewise. >> > * config/rs6000/paired.md (vcondv2sf): Likewise. >> > * config/rs6000/vector.md (vcond<mode>): Likewise. >> > (vcondu<mode>): Likewise. >> > * config/spu/spu.md (vcond<mode>): Likewise. >> > (vcondu<mode>): Likewise. >> >> Do we really want to introduce stuff like: >> >> ! (define_expand "vcond<V_128:mode><VF_128:mode>" >> >> You are in fact introducing 6x2 = 12 patterns, many of them (i.e. >> v16qiv2df combination) invalid. > > Well, in principle they are not "invalid" they would be a short-hand - > the example of (subreg:V16QI (vcond:V2DI (... )). > >> I'd prefer a pattern with mode-less operands 4 and 5, rejected in insn >> constraints for invalid combinations: > > Hm, ok - that was the first variant I tried (well, but with modeless > operands 1 and 2, to keep 4 and 5 selcting int vs. fp compare). But > modeless operands get you that annoying warning from the gen* programs. Only for define_insn, if your c_test does not include string "operands". > How'd you ask if a pattern is available for vcondv4si with v4sf > operands 4 and 5? The vectorizer would need to be able to ask this > question. Maybe with something along the lines of: (define_expand "vcond<mode>" [(set (match_operand:VI124_128 0 "register_operand" "") (if_then_else:VI124_128 (match_operator 3 "" [(match_operand 4 "nonimmediate_operand" "") (match_operand 5 "nonimmediate_operand" "")]) (match_operand:VI124_128 1 "general_operand" "") (match_operand:VI124_128 2 "general_operand" "")))] "TARGET_SSE2" { if (GET_MODE (operands[4]) != GET_MODE (operands[5]) || (GET_MODE_NUNITS (GET_MODE (operands[4])) != GET_MODE_NUNITS (GET_MODE (operands[0])))) FAIL; bool ok = ix86_expand_int_vcond (operands); gcc_assert (ok); DONE; }) This means that vcond pattern is allowed to FAIL, so when vectorizer tentatively tries to expand the pattern, FAIL signalizes that operand modes are not supported. > In the end putting both mask generation and apply into one > instruction pattern causes all this issues - but it helps > not exposing the mask representation of the HW. No, but we should be sure that the widths are the same... Uros.