Re: Question about Temporary Outputs

2011-03-31 Thread Pranav Bhandarkar
On Thu, Mar 31, 2011 at 4:04 PM, Iyer, Balaji V balaji.v.i...@intel.com wrote:
 Hello Everyone,
               I see in GCC that when we use the flag -fdump-tree-optimized it 
 will dump the contents of the input file after doing all the tree-based 
 optimization. Is it possible for me to modify this file and then submit it 
 back into gcc for processing to create an executable/assembly dump?

It has been a while since I have worked on GCC - back then (a couple
of years ago) this was not possible. I do not have reason to believe
this would have changed.

Pranav


Re: implementing load 8 byte instruction

2010-03-19 Thread Pranav Bhandarkar
On Thu, Mar 18, 2010 at 10:29 AM, roy rosen roy.1ro...@gmail.com wrote:
 Hi,

 I am trying to implement a simple load 8 bytes instruction.
 I tried to use movdi so that it would allocate two sequential
 registers for the load.
 It starts well but in pass subreg1 the insns are decomposed and all DI
 operands are replaced with SI.

 I understand that this is a desirable optimization but then the load
 is done using two load 4 bytes instructions.

 Does anybody have any idea what I should do?


Could it be a problem with the constraints in your movdi define_insn ?

Pranav


Dead Store Elimination

2009-10-22 Thread Pranav Bhandarkar
Hi,

A possibly silly question about the dead store elimination pass. From
the documentation it is clear that the store S1 below is removed by
this pass (in dse.c):

*(addr) = value1;  // S1
.
.
*(addr) = value2;  // S2 .. No read of addr between S1 and S2.
..
 = *(addr)   // Load
...
end_of_the_function

However, consider a different example.

*(addr) = value1;  // S1
..
.
end_of_the_function.

i.e. there is no store Sn that follows S1 along any path from S1 to
the end of the function, and there is no read of addr following S1
either. Is the dse pass expected to remove such stores? (I am
inclined to think that it should, but I am seeing a case where dse
doesn't remove such stores.) Further, is the behaviour expected to be
different if addr is based on fp?
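
For concreteness, the two situations can be written as plain C. This is
only an illustrative sketch: the function and type names are made up for
this mail, and whether a given GCC version actually deletes each store
depends on the pass and the target.

```c
/* Case 1: S1 is overwritten by S2 with no intervening read of *addr,
   so S1 is dead regardless of what follows S2.  */
int store_overwritten(int *addr, int value1, int value2)
{
    *addr = value1;   /* S1: dead, S2 overwrites it before any load */
    *addr = value2;   /* S2 */
    return *addr;     /* Load: can only observe value2 */
}

/* Hypothetical local aggregate with one field that is never read.  */
struct pair {
    int used;
    int unused;
};

/* Case 2: S1 writes a stack slot that is neither read nor overwritten
   before the function ends, and whose address does not escape.  */
int store_before_exit(int value1)
{
    struct pair local;

    local.used = value1 + 1;
    local.unused = value1;    /* S1: nothing reads this slot afterwards */
    return local.used;
}
```

In both functions the result is the same with or without S1, which is
why I would expect the store to be removable.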

TIA,
Pranav


Re: Dead Store Elimination

2009-10-22 Thread Pranav Bhandarkar
 Are you talking about the tree dead-store elimination pass or
 the RTL one?  Basically *addr = value1; cannot be removed
 if addr does not point to local memory or if the pointed-to
 memory escapes through a call-site that is dominated by this store.

I am talking about the RTL dead-store elimination. In my case addr is
based on the stack pointer and the store is to a local variable on the
stack.

Thanks,
Pranav


Re: COMPONENT_REF problem ?

2009-10-06 Thread Pranav Bhandarkar
 Look at

 2009-07-14  Richard Guenther  rguent...@suse.de
            Andrey Belevantsev a...@ispras.ru

        * tree-ssa-alias.h (refs_may_alias_p_1): Declare.
        (pt_solution_set): Likewise.
        * tree-ssa-alias.c (refs_may_alias_p_1): Export.
        * tree-ssa-structalias.c (pt_solution_set): New function.
        * final.c (rest_of_clean_state): Free SSA data structures.
 ...
        * emit-rtl.c (component_ref_for_mem_expr): Remove.
        (mem_expr_equal_p): Use operand_equal_p.
        (set_mem_attributes_minus_bitpos): Do not use
        component_ref_for_mem_expr.
 ...

 this change.

 Richard.

Great. Thanks.

Pranav


Re: COMPONENT_REF problem ?

2009-10-05 Thread Pranav Bhandarkar
Richard,

 If you are not working on trunk this can happen because the way
 MEM_EXPRs are canonicalized.

Thanks. Yes, I am not on trunk and may not be able to move right away.
I would appreciate some pointers about where I should look if I want
to fix this.

Thanks,
Pranav


COMPONENT_REF problem ?

2009-10-03 Thread Pranav Bhandarkar
Hi,

Is it possible for a component_ref node to have its arg 0 be NULL?
I would think not, because from tree.def I gather that arg 0 tells me
which structure's field this component_ref refers to. For convenience, I
have pasted here what tree.def tells me about a component_ref:


/* Value is structure or union component.
   Operand 0 is the structure or union (an expression).
   Operand 1 is the field (a node of type FIELD_DECL).
   Operand 2, if present, is the value of DECL_FIELD_OFFSET, measured
   in units of DECL_OFFSET_ALIGN / BITS_PER_UNIT.  */
DEFTREECODE (COMPONENT_REF, component_ref, tcc_reference, 3)


I am working on a target hook wherein I use the MEM_EXPR of a mem rtx.
It returns the following component_ref node

component_ref 0x2b3599d0ca00
type integer_type 0x2b35998f1e40 int32 sizes-gimplified public SI
size integer_cst 0x2b359975a840 constant 32
unit size integer_cst 0x2b359975a4b0 constant 4
align 32 symtab 0 alias set -1 canonical type 0x2b359976d6c0
precision 32 min integer_cst 0x2b359975a8a0 -2147483648
max integer_cst 0x2b359975a8d0 2147483647
pointer_to_this pointer_type 0x2b3599b1c300

arg 1 field_decl 0x2b3599b18140 dcfMemL1
type integer_type 0x2b35998f1e40 int32
used nonlocal decl_3 SI file ../src/synth_core/QDSP6/svrreg.h line 134 col 11
size integer_cst 0x2b359975a840 32 unit size integer_cst 0x2b359975a4b0 4
align 32 offset_align 64
offset integer_cst 0x2b3599b04b10 constant 152
bit offset integer_cst 0x2b359975a840 32 context record_type 0x2b3599b12840 dlsSvrReg
chain field_decl 0x2b3599b181e0 dcfMemL2
type integer_type 0x2b35998f1e40 int32
used nonlocal decl_3 SI file ../src/synth_core/QDSP6/svrreg.h line 135 col 11
size integer_cst 0x2b359975a840 32 unit size integer_cst 0x2b359975a4b0 4
align 32 offset_align 64
offset integer_cst 0x2b3599b04b70 constant 160
bit offset integer_cst 0x2b359977e0c0 constant 0 context record_type 0x2b3599b12840 dlsSvrReg
chain field_decl 0x2b3599b18280 dcfMemR1


Following this, if I do
exp = TREE_OPERAND (comp_ref_node, 0);

I get exp as NULL.

Is this possible? Is there something I am doing wrong, or is there
something fishy with the tree node that MEM_EXPR is giving me?

Thanks,
Pranav


Re: Pipeline hazards and delay slots

2008-05-05 Thread Pranav Bhandarkar
Hi Mohammed,

  But how can i handle instances like this? Should i be doing insertion
  of nops in reorg pass?

FWIW, I worked on a port for a VLIW processor about three years back,
and IIRC we used the reorg pass to insert the nops. I think that if
you look at the scheduler dumps you will notice that the scheduler
has, in all likelihood, already accounted for the one-cycle delay
between the lw and the add instructions. You just have to put the
nop between these two instructions yourself.
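
A rough sketch of the kind of check such a reorg walk makes is below.
The struct and function names are hypothetical (they are not GCC data
structures), and the model only looks at adjacent instructions, which
is enough for a one-cycle delay like the lw/add case.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction record (not a GCC structure).  */
struct simple_insn {
    int dest;   /* register written, or -1 if none */
    int src1;   /* first register read, or -1 if unused */
    int src2;   /* second register read, or -1 if unused */
    int delay;  /* cycles before dest may be read by the next insn */
};

/* Number of nops needed between two adjacent instructions.  */
int nops_between(const struct simple_insn *producer,
                 const struct simple_insn *consumer)
{
    if (producer->dest < 0)
        return 0;
    if (consumer->src1 == producer->dest
        || consumer->src2 == producer->dest)
        return producer->delay;
    return 0;
}

/* Total nops a reorg-style walk would insert over a straight-line
   sequence, considering adjacent pairs only.  */
int count_nops(const struct simple_insn *insns, size_t n)
{
    int total = 0;

    for (size_t i = 0; i + 1 < n; i++)
        total += nops_between(&insns[i], &insns[i + 1]);
    return total;
}
```

In a real port the same walk would emit the nop insns instead of just
counting them.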

cheers!
Pranav


Common Subexpression Elimination Opportunity not being exploited

2008-05-02 Thread Pranav Bhandarkar
Hi,

I have a case where the code looks roughly like
foo = i1 op i2;
if (test1) bar1 = i1 op i2;
if (test2) bar2 = i1 op i2;

This can get converted into
reg = i1 op i2
foo = reg
if (test1) bar1 = reg
if (test2) bar2 = reg

GCC 4.3 does fine here except when the operator is the logical AND, &&
(see attached: test.c uses && and test1.c uses +).

It seems that short-circuit evaluation of the logical AND throws off
common subexpression elimination, i.e. the code generated is like
foo = (i1 != 0) && (i2 != 0)
if (test1)
{
  if (i1 != 0)
    goto lab2
  else
    goto lab3
lab2:
  if (i2 != 0)
    goto lab4
lab3:
  temp = 0
  goto lab5
lab4:
  temp = 1
lab5:
  bar1 = temp
}

if (test2)
{
  same as the body of the above if condition.
}

The entire body of the two if stmts can be pulled out, but it isn't.

This kind of cse works when the operator is an arithmetic operator like +.

Am I missing something here? Is there something I can do to improve
the case with the logical AND?
Maybe this is difficult on hardware that uses the CC register? But I
am working on a private port that doesn't use the CC register.
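
To make the transformation I am after explicit, here it is done by hand
in C. This is just a sketch with made-up function names; the hoisting
is valid here because the operands are plain values and && has no side
effects on them.

```c
/* Naive form: the && expression is written, and evaluated, up to
   three times.  */
void fill_naive(short out[3], short i1, short i2, int test1, int test2)
{
    out[0] = i1 && i2;
    if (test1)
        out[1] = i1 && i2;
    if (test2)
        out[2] = i1 && i2;
}

/* Hand-hoisted form: the expression is evaluated once and the result
   is reused, which is what CSE already achieves for '+'.  */
void fill_hoisted(short out[3], short i1, short i2, int test1, int test2)
{
    short reg = i1 && i2;

    out[0] = reg;
    if (test1)
        out[1] = reg;
    if (test2)
        out[2] = reg;
}
```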

Thanks,
Pranav
short ione;
short itwo;
short ithree;
short ifour;
short ifive;
int one_array[50];
short a[50];


void foo ()
{
  ione = gen_short(1);
  itwo = gen_short(1);
  ithree = gen_short(1);
  ifour = gen_short(1);
  ifive = gen_short(1);


  a[0] = ione && itwo && ithree && ifour && ifive;

  if (one_array[1])
    a[1] = ione && itwo && ithree && ifour && ifive;
  if (one_array[2])
    a[2] = ione && itwo && ithree && ifour && ifive;

  return;
}
short ione;
short itwo;
short ithree;
short ifour;
short ifive;
int one_array[50];
short a[50];


void foo ()
{
  ione = gen_short(1);
  itwo = gen_short(1);
  ithree = gen_short(1);
  ifour = gen_short(1);
  ifive = gen_short(1);


  a[0] = ione + itwo + ithree + ifour + ifive;

  if (one_array[1])
    a[1] = ione + itwo + ithree + ifour + ifive;
  if (one_array[2])
    a[2] = ione + itwo + ithree + ifour + ifive;

  return;
}


Re: Problem with Fix for PR 35163 ?

2008-04-09 Thread Pranav Bhandarkar
  Btw, I have a fix.

Oh, great. I just saw your post on gcc-patches. Do you still want me
to add this to PR35163 for the record?

Cheers!
Pranav


Problem with Fix for PR 35163 ?

2008-04-08 Thread Pranav Bhandarkar
Hi,

Consider the attached testcase.

I am working on a private port (in fact, I see this problem with
arm-none-eabi-gcc too), and I see the following in test.c.003t.original:

fail = (short int) usi <= ssi;

And then in test.c.025t.ssa
 usi.2_5 = (short int) usi_4;
 fail.3_6 = usi.2_5 <= ssi_2;

Now ccp1 does constant propagation and we are left with
usi.2_5 = -256;

This causes the test to fail.

Clearly the problem seems to be that, since usi is an unsigned short
int, a short int can't represent all the possible values of usi.
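
The difference between the two forms of the comparison can be seen in
isolation with the sketch below. The function names are made up, and it
assumes a 16-bit short and GCC's modulo behaviour when converting an
out-of-range value to a signed type (so 65280 becomes -256).

```c
/* Comparison done after widening both operands to int: this is what
   the C promotion rules require for !(ssi < usi).  */
int fail_widened(short ssi, unsigned short usi)
{
    return !((int) ssi < (int) usi);
}

/* Comparison done after truncating usi to short, as in the folded
   tree shown above.  Values of usi above 32767 change sign here.  */
int fail_narrowed(short ssi, unsigned short usi)
{
    return (short) usi <= ssi;
}
```

With ssi = 126 and usi = 65280 the widened form gives 0 while the
narrowed form gives 1, which matches the failure I am seeing.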

I reverted the following patch and the test passed.
 PR middle-end/35163
* fold-const.c (fold_widened_comparison): Use get_unwidened in
value-preserving mode.  Disallow final truncation.

Now with the patch reverted, test.c.003t.original has

 fail = (int) ssi >= (int) usi;

And this problem vanished.


Am I missing something here ?

Thanks,
Pranav
#include <stdio.h>

int fail;

short fs2(void)
{
    return 126;
}

unsigned short ufs1(void)
{
    return 65280;
}

int main ()
{
    short  ssi;
    unsigned short  usi;

    ssi = fs2();
    usi = ufs1();
    fail = !(ssi < usi);

    if (fail)
        printf ("Failed\n");
    else
        printf ("Successful\n");

    return 0;
}


The effects of closed loop SSA and Scalar Evolution Const Prop.

2008-03-11 Thread Pranav Bhandarkar
Hi,

I am writing about a problem I noticed with the code generated for
memcpy by arm-none-eabi-gcc.

Now, memcpy has three distinct loops: one that copies (4 * sizeof
(long)) bytes per iteration, one that copies sizeof (long) bytes per
iteration, and the last one that copies one byte per iteration. The
registers used for the src and destination pointers should, IMHO, be
the same across all three loops. However, what I noticed is that
after the first loop the src and dest registers aren't reused; instead
the address of the next byte to be copied is recalculated as (original
src address + (number of iterations of 1st loop * 16)), and similarly
for the destination address. See the assembly snippet below.

.L3:
        @ Code here for the loop that copies 16 bytes per iteration. The
        @ src pointer is ip and the dest pointer is r1. The len is in r0.
bhi .L3 @,
sub r2, r4, #16 @ D.1312, len0,
mov r3, r2, lsr #4  @ D.1314, D.1312,
sub r1, r2, r3, asl #4  @ len, D.1312, D.1314,
add r3, r3, #1  @ tmp225, D.1314,
mov r3, r3, asl #4  @ D.1320, tmp225,
cmp r1, #3  @ len,
add r4, r5, r3  @ aligned_src.60, src0, D.1320
        @ <--- Recalculation of the src pointer for the second loop.
add r0, r6, r3  @ aligned_dst, dst0, D.1320
        @ <--- Recalculation of the dest pointer for the second loop.
bls .L4 @,
mov ip, #0  @ ivtmp.31,
.L5:
ldr r3, [r4, ip]@ tmp226,* ivtmp.31
str r3, [r0, ip]@ tmp226,* ivtmp.31
add ip, ip, #4  @ ivtmp.31, ivtmp.31,
rsb r3, ip, r1  @ tmp227, ivtmp.31, len
cmp r3, #3  @ tmp227,
bhi .L5 @,

What seems to happen is that loop-closed SSA demands that no variable
be used outside the loop it is defined in. Before the loop
optimization initialization pass, the second loop has the following
PHI nodes in its first basic block.

  # len_74 = PHI <len_38(17), len_15(15)>
  # aligned_src_73 = PHI <aligned_src_35(17), aligned_src_22(15)>
  # aligned_dst_61 = PHI <aligned_dst_34(17), aligned_dst_21(15)>

All of len_38, aligned_src_35 and aligned_dst_34 are defined in the
first loop and used again here in the second loop. Therefore the loop
initialization pass puts the following PHI nodes in the exit block of
the 1st loop.
  # len_59 = PHI <len_38(5)>
  # aligned_src_62 = PHI <aligned_src_35(5)>
  # aligned_dst_16 = PHI <aligned_dst_34(5)>

and the PHI nodes in loop 2 are changed into

  # len_74 = PHI <len_59(17), len_15(15)>
  # aligned_src_73 = PHI <aligned_src_62(17), aligned_src_22(15)>
  # aligned_dst_61 = PHI <aligned_dst_16(17), aligned_dst_21(15)>


Now tree scalar evolution goes over the PHI nodes and realises that
aligned_src_35 has the scalar evolution {aligned_src_22 + 16, +, 16}_1,
where aligned_src_22 is
(const long int *) src0_12(D), i.e. the original src pointer. Therefore,
to calculate aligned_src_62 before the second loop, computations
based on aligned_src_22 are introduced.

My question is: shouldn't scalar evolution ignore PHI nodes with one
argument (implying a copy) or, if not, at least pay heed to the cost of
the additional computations introduced?
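
In source terms, the two alternatives look roughly like the sketch
below. The function names are mine, the copy is simplified to byte
accesses with two loops instead of three, and this is not the actual
memcpy source; it only shows "reuse the running pointers" against
"rebuild them from the original pointers and the iteration count".

```c
#include <stddef.h>

/* Variant A: the second loop continues from the pointers that the
   first loop left behind.  */
void copy_reuse(unsigned char *dst, const unsigned char *src, size_t len)
{
    while (len >= 16) {
        for (int i = 0; i < 16; i++)
            dst[i] = src[i];
        dst += 16;
        src += 16;
        len -= 16;
    }
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
}

/* Variant B: after the first loop the pointers are recomputed as
   original pointer + 16 * iterations, which is what the generated
   code effectively does.  */
void copy_recompute(unsigned char *dst0, const unsigned char *src0,
                    size_t len0)
{
    unsigned char *dst = dst0;
    const unsigned char *src = src0;
    size_t len = len0;
    size_t iters = 0;

    while (len >= 16) {
        for (int i = 0; i < 16; i++)
            dst[i] = src[i];
        dst += 16;
        src += 16;
        len -= 16;
        iters++;
    }

    /* Extra arithmetic: the running pointers are ignored.  */
    dst = dst0 + iters * 16;
    src = src0 + iters * 16;

    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
}
```

Both variants copy the same bytes; variant B just spends additional
instructions (and registers for the original pointers) between the
loops.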

cheers!
Pranav


Re: VLIW scheduling and delayed branch

2007-12-10 Thread Pranav Bhandarkar
On Dec 9, 2007 2:19 AM, Thomas Sailer [EMAIL PROTECTED] wrote:
  Has anyone faced a similar problem before? Are there targets for which
  both VLIW and DBR are enabled? Perhaps ia64?


OK, this was a long time back, but yes, I have faced a similar problem.
We disabled delayed branch scheduling and used the machdep reorg pass.
We examined the dependencies of the branch instruction, moving
backwards from the branch and marking all the instructions (and the
containing insn bundles) that the branch depended upon. Then, again
moving backwards from the branch insn, we picked the first insn bundle
with all insns unmarked (and with the cycle size of the bundle equal
to the number of delay slots of a branch insn) and put that bundle
into the delay slot.
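
From memory, the selection step looked something like the sketch below.
The struct and function are hypothetical reconstructions rather than
the code we had, and the marking step is assumed to have been done
already.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a scheduled VLIW bundle.  */
struct bundle {
    bool feeds_branch;  /* marked: some insn here is needed by the branch */
    int cycles;         /* number of cycles the bundle occupies */
};

/* Walk backwards from the branch, which sits just after
   bundles[branch_index - 1], and return the index of the first bundle
   that is unmarked and whose cycle size equals the number of delay
   slots.  Returns -1 if there is no such bundle.  */
int pick_delay_bundle(const struct bundle *bundles, int branch_index,
                      int delay_slots)
{
    for (int i = branch_index - 1; i >= 0; i--) {
        if (!bundles[i].feeds_branch && bundles[i].cycles == delay_slots)
            return i;
    }
    return -1;
}
```

When this returns -1 the delay slots simply have to be filled with
nops.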

This approach worked fine for the small testcases that we had, but we
didn't really test it on any monstrous piece of software. We
implemented this for the TMS320C6x VLIW DSP.

HTH,
Pranav


Re: How to describe function units allocation

2007-11-14 Thread Pranav Bhandarkar
 Hi,
 For the backend TI DSP TMS320C6x, There are four types of functional
 units which are .L unit, .M unit, .S unit and .D unit, and each type
 consists of two units named .X1 and .X2 respectively. Namely, there are
 total 8 units. Except the .M units serve only for multiply, other units
 share many functions. For example, they both enable 32 bits arithmetical
 operation. And in the assembly, which functional unit is used to perform
 operation must be explicitly indicated. For example, ADD .S1 A0, A1, A2;
 ADD .L1 A0, A1, A2; ADD .D1 A0, A1, A2 achieve the same goal by using
 different units. Surely, when producing assembly, a functional unit
 allocation somewhat like register allocation is needed. I wonder how can
 I describe the relationship in the machine description file, and whether
 I need write a functional unit allocation algorithm or it is done by a
 general purpose allocation algorithm embedded in GCC, like register
 allocation, I only need give some architecture descriptions? Thanks in
 advance for your kind assistance.

IMHO, the functional unit specifiers that accompany the assembly
instructions are optional. For c6x-gcc, the reason cc1 doesn't allocate
functional units is that the assembler (part of the c6x binutils)
does the functional unit allocation on its own. There are some notes
about how the assembler does this in "Extending the GNU Assembler for
Texas Instruments TMS320C6x-DSP.pdf".

HTH,
Pranav


 Regards,
 Li Wang



Re: Reload using a live register to reload into

2007-11-12 Thread Pranav Bhandarkar
Hi,

 DF is supposed to be out of the game at this point, it has handed over the
 control since global.c:build_insn_chain as far as liveness info is concerned.

Oh, I used DF and it worked for me. But I think that is because this is
the first new instruction to be inserted, so nothing can really have
changed w.r.t. the call_insn to make the DF info inconsistent.
However, in the light of the above, I shouldn't be using DF.

 The REG_DEP_TRUE are somewhat misleading, it's an artifact in the dump.
 What you're seeing are the contents of CALL_INSN_FUNCTION_USAGE, which are
 always correct, so a solution to your problem would be to scan it for uses of
 registers in caller-save.c:insert_one_insn.  Of course this wouldn't plug the
 hole entirely but would very likely be sufficient in practice.
Good idea, I'll try using CALL_INSN_FUNCTION_USAGE.

cheers!
Pranav


Re: Reload using a live register to reload into

2007-11-08 Thread Pranav Bhandarkar
Hi,

  (call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
 (set (reg:SI 1 $c1)
 (call (mem:SI (symbol_ref:SI
  ("DotProductWithoutShift") [flags 0x41] <function_decl 0x401f7d00
  DotProductWithoutShift>) [0 S4 A32])
 (const_int 0 [0x0])))
 (use (const_int 0 [0x0]))
 (clobber (reg:SI 31 $link))
 ]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
 (expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
 (expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
 (nil
 (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
 (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
 (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
 (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
 (nil))

 I don't think so, it should be in dead_or_set, the value contained in $c1 dies
 in the insn.

Yes, after going through the code more closely, I concur.
The problem is that $c1 isn't in live_throughout, but it is live at
the point just before the call insn. Therefore, if an instruction is
inserted before the call insn, as is done when a caller-save
instruction is inserted (by caller-save.c), and that instruction
doesn't kill $c1, then its live_throughout should have the bit for
$c1 set. This doesn't happen because, while inserting the caller-save
insn, its live_throughout is simply set to the live_throughout of the
call insn plus the registers marked with REG_DEAD notes in the call
insn. However, since $c1 is an argument to the call, it is used by
the call_insn and is marked REG_DEP_TRUE (read after write).
Shouldn't the regs in REG_DEP_TRUE be added to live_throughout? My
suspicion is that the LOG_LINKS are not always up to date, so would
it be better to use DF_INSN_UID_USES?
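
Expressed with plain bit masks, the difference I am describing is
roughly the following. This is only a model: the type and function
names are made up, and the real code works on HARD_REG_SETs in the
insn chain.

```c
#include <stdint.h>

/* Bit N set means hard register N is in the set (model only).  */
typedef uint32_t regset;

/* What happens today, as I read it: live_throughout of the inserted
   insn = live_throughout of the call + its REG_DEAD registers.  */
regset live_for_inserted_insn(regset call_live_throughout,
                              regset call_reg_dead)
{
    return call_live_throughout | call_reg_dead;
}

/* What I think is needed: also include the registers the call uses.  */
regset live_for_inserted_insn_with_uses(regset call_live_throughout,
                                        regset call_reg_dead,
                                        regset call_uses)
{
    return call_live_throughout | call_reg_dead | call_uses;
}
```

For insn 91 the REG_DEAD set is {$c2, $c3, $c4} and the use set is
{$c1, $c2, $c3, $c4}, so only the second form has the bit for $c1 set.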

Thanks in advance,
Pranav


Re: About VLIW backend

2007-11-07 Thread Pranav Bhandarkar
  I am interesting in it. How about the current status, is it ongoing
 developing? Is it included in GCC official release?

Unfortunately our group is not actively working on that right now.
For various reasons (mainly the paucity of time) we couldn't release
it to the GCC community then (about 3 years back).

- Pranav


Re: Reload using a live register to reload into

2007-11-07 Thread Pranav Bhandarkar
Hi Eric,
Thanks for the response

 Of course, it goes to great length to do so but there can be bugs.  You didn't
 specify which version of the compiler you're using though; they may have been
 already fixed on the mainline.

Oh, I am using quite a new version of the compiler - rev 129547,
DATESTAMP 20071022.

cheers!
Pranav


Re: Target specific attributes to variables

2007-11-07 Thread Pranav Bhandarkar
 Even though the other 2 addressing modes are implemented, the
 attributes could not be checked in the other 2 modes. These 2 modes
 are disp with register and register indirect addressing modes. The
 tree structure in these addressing modes could not be checked for
 attributes using the RTX of the operand. We were unable to get any
 information from other target specific attributes.

Look up MEM_EXPR in the internals. You might want to use that for the
register indirect case.

cheers!
Pranav


Re: About VLIW backend

2007-11-07 Thread Pranav Bhandarkar
  Did you test for large programs? Such as applications from SPEC 2006? or
 the equal size of programs. Thanks.

Oh no, we didn't. We stopped when we achieved fair stability, judged
purely on the basis of the number of testsuite failures (fewer than
100).

cheers!
Pranav


Re: Reload using a live register to reload into

2007-11-07 Thread Pranav Bhandarkar
 OK.  AFAICS there is nothing glaring in the RTL you posted so you'll have to
 put a watchpoint and find out who has set reg_rtx for this particular reload.

reg_rtx gets set due to a call to choose_reload_regs which in turn
calls allocate_reload_reg to set reg_rtx.

Also, just to confirm that I am on the right track, shouldn't the bit
for reg #1 (i.e. $c1) be set in live_throughout in the insn chain for
insn #91 (reproduced below for convenience)?

(call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI
("DotProductWithoutShift") [flags 0x41] <function_decl 0x401f7d00
DotProductWithoutShift>) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
   (expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
   (expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
   (nil
   (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
   (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
   (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
   (nil))


TIA,
Pranav


Reload using a live register to reload into

2007-11-06 Thread Pranav Bhandarkar
Hi,
Working on a private port I am seeing a problem with reload clobbering
a live register and thus causing havoc.

Consider the following snippet of the code dump.
(note:HI 85 84 86 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(note:HI 86 85 89 5 NOTE_INSN_DELETED)

(insn:HI 89 86 87 5 cor_h.c:129 (set (reg:SI 3 $c3 [ ivtmp.103 ])
(sign_extend:SI (subreg:HI (reg:SI 206 [ ivtmp.103 ]) 0))) 86
{extendhisi2} (nil))

(insn:HI 87 89 88 5 cor_h.c:129 (set (reg:SI 1 $c1 [ ivtmp.101 ])
(reg:SI 208 [ ivtmp.101 ])) 45 {*movsi} (nil))

(insn:HI 88 87 270 5 cor_h.c:129 (set (reg:SI 2 $c2 [ h ])
(reg/v/f:SI 236 [ h ])) 45 {*movsi} (nil))

(insn:HI 270 88 91 5 cor_h.c:129 (set (reg:SI 4 $c4)
(const_int 0 [0x0])) 45 {*movsi} (nil))

(call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
(set (reg:SI 1 $c1)
(call (mem:SI (symbol_ref:SI
("DotProductWithoutShift") [flags 0x41] <function_decl 0x401f7d00
DotProductWithoutShift>) [0 S4 A32])
(const_int 0 [0x0])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
(expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
(expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
(nil
(expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
(expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
(nil))

(insn:HI 92 91 285 5 cor_h.c:129 (set (reg/v:SI 230 [ s ])
(reg:SI 1 $c1)) 45 {*movsi} (expr_list:REG_DEAD (reg:SI 1 $c1)
(nil)))

(jump_insn:HI 285 92 286 5 (set (pc)
(label_ref 118)) 8 {jump} (nil))
;; End of basic block 5 -> ( 10)

The register $c1 is used to pass the first argument to the function
DotProductWithoutShift.
On encountering the call_insn (insn no 91), global.c inserts a store
to save the register $c16 (which contains a variable 'tot'; $c16
is a caller-save register).
Hence the following insn is inserted just before the call to
DotProductWithoutShift.

(insn 309 270 91 5 cor_h.c:129 (set (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0])) [11 S4 A32])
(reg:SI 16 $c16)) 45 {*movsi} (nil))

However the index 176 is too large and
 (plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0])) needs to be reloaded.

$c1 gets chosen for this reload and now the dump snippet looks like

(note:HI 85 84 86 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(note:HI 86 85 89 5 NOTE_INSN_DELETED)

(insn:HI 89 86 87 5 cor_h.c:129 (set (reg:SI 3 $c3 [ ivtmp.103 ])
(sign_extend:SI (reg:HI 14 $c14 [orig:206 ivtmp.103 ] [206])))
86 {extendhisi2} (nil))

(insn:HI 87 89 88 5 cor_h.c:129 (set (reg:SI 1 $c1 [ ivtmp.101 ])
(reg:SI 8 $c8 [orig:208 ivtmp.101 ] [208])) 45 {*movsi} (nil))

(insn:HI 88 87 270 5 cor_h.c:129 (set (reg:SI 2 $c2 [ h ])
(reg/v/f:SI 22 $c22 [orig:236 h ] [236])) 45 {*movsi} (nil))

(insn:HI 270 88 329 5 cor_h.c:129 (set (reg:SI 4 $c4)
(const_int 0 [0x0])) 45 {*movsi} (nil))

(insn 329 270 330 5 cor_h.c:129 (set (reg:SI 1 $c1)
(const_int 176 [0xb0])) 45 {*movsi} (nil))

(insn 330 329 309 5 cor_h.c:129 (set (reg:SI 1 $c1)
(plus:SI (reg:SI 1 $c1)
(reg/f:SI 29 $sp))) 65 {*addsi3} (expr_list:REG_EQUIV
(plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0]))
(nil)))

(insn 309 330 91 5 cor_h.c:129 (set (mem/c:SI (reg:SI 1 $c1) [11 S4 A32])
(reg:SI 16 $c16)) 45 {*movsi} (nil))

(call_insn:HI 91 309 332 5 cor_h.c:129 (parallel [
(set (reg:SI 1 $c1)
(call (mem:SI (symbol_ref:SI
("DotProductWithoutShift") [flags 0x41] <function_decl 0x401f7d00
DotProductWithoutShift>) [0 S4 A32])
(const_int 0 [0x0])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 42 {*call_value_direct} (nil)
(expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
(expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
(nil))

(insn 332 91 333 5 (set (reg:SI 4 $c4)
(const_int 176 [0xb0])) 45 {*movsi} (nil))

(insn 333 332 310 5 (set (reg:SI 4 $c4)
(plus:SI (reg:SI 4 $c4)
(reg/f:SI 29 $sp))) 65 {*addsi3} (expr_list:REG_EQUIV
(plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0]))
(nil)))

(insn 310 333 285 5 (set (reg:SI 16 $c16)
(mem/c:SI (reg:SI 4 $c4) [11 S4 A32])) 45 {*movsi} (nil))

(jump_insn:HI 285 310 286 5 (set (pc)
(label_ref 118)) 8 {jump} (nil))
;; End of basic block 5 -> ( 10)

Clearly the register $c1, which after insn 87 holds the first argument
of the function DotProductWithoutShift, is overwritten.

The file.c.176r.greg for insn 309 says

Spilling for insn 

Re: About VLIW backend

2007-11-06 Thread Pranav Bhandarkar
On 06 Nov 2007 21:50:09 -0800, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 Li Wang [EMAIL PROTECTED] writes:

  I wonder if any efforts have been made to retarget GCC to VLIW
  backend.Is there any project trying to do that? Is it included in the
  GCC mainstream? Thanks.

Dr. Baumgartl,  Jan Parthey and folks at the Chemnitz University of Technology
had got substantial success with a port for the TMS320C6x series of
VLIW processors.

http://archiv.tu-chemnitz.de/pub/2004/0107/data/index.html

We ( A few friends and I - college students then ) had added some
improvements to this port as part of
undergraduate university coursework ( project).


cheers!
Pranav


Problem with too many virtual operands ( tree-ssa-operands.c:484)

2007-10-16 Thread Pranav Bhandarkar
Hi,
In the attached testcase due to an ivopts modification, while
rewriting the uses the compiler crashes in tree-ssa-operands.c because
the number of virtual operands of the modified stmt is much greater
than the thresholds controlled by OP_SIZE_{1,2,3} in
tree-ssa-operands.c.

I went through
http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01269.html

and it seemed to me that these values (OP_SIZE etc.) have been set
quite experimentally. I am wondering if these values should be
increased.

I did increase these values and the attached testcase compiled fine. I
examined the resulting ivopts dump and the modifications seem valid
to me. I have attached two dumps (before ivopts, i.e. 105t.cunroll,
and after ivopts, i.e. 107t.ivopts). Note that the post-ivopts dump was
generated only after changing OP_SIZE_3 to 700 (an arbitrarily high
value).

For a quick check of the dumps, note that
 D.1281_6 = Hoopster_ptr_17->Magic;

gets changed to

 D.1281_6 = MEM[index: ivtmp.840_5];

I am wondering if increasing OP_SIZE_{1,2,3} is the way to go. I am
not fully convinced, because it means that the problem could hit again
with nastier code.

TIA,
Pranav


testcase-min.i
Description: Binary data


testcase-min.i.105t.cunroll
Description: Binary data


testcase-min.i.107t.ivopts
Description: Binary data


Identifying a block copy

2007-10-09 Thread Pranav Bhandarkar
Hi,
consider the following code,

struct x { int a; int b; int c; int d; int e[120];};
struct x *a, *b;
void foo ( )
{
*a = *b;
}

Now for the stmt in the function foo a memcpy will be generated.
However, this call can be tail call optimized. My aim is to identify
such opportunities in find_tail_calls in tree-tailcall.c. However, for
the stmt

*a.0_1 ={v} *b.1_2;

var_can_have_subvars returns zero for *a.0_1. But a.0_1 is a pointer
to a structure and *a.0_1 is a structure, and therefore it can have
subvars.
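
To spell out what I mean by the block copy, the sketch below shows the
structure assignment next to its hand-written equivalent. The function
names are made up; the point is only that the memcpy call is the last
thing the function does and its return value is unused, so it sits in
tail position.

```c
#include <string.h>

struct x { int a; int b; int c; int d; int e[120]; };

/* Structure assignment: for an aggregate this large the compiler
   typically expands it into a block copy such as a memcpy call.  */
void copy_by_assignment(struct x *dst, const struct x *src)
{
    *dst = *src;
}

/* Hand-written equivalent: nothing happens after the memcpy call,
   so it is a candidate for tail call optimization.  */
void copy_by_memcpy(struct x *dst, const struct x *src)
{
    memcpy(dst, src, sizeof *dst);
}
```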

The comment for var_can_have_subvars says
/* Return true if V is a tree that we can have subvars for.
   Normally, this is any aggregate type.  Also complex
   types which are not gimple registers can have subvars.  */

IMHO, var_can_have_subvars should return true for the above case, but
it doesn't because it fails the following test in var_can_have_subvars.
  /* Non decls or memory tags can never have subvars.  */
  if (!DECL_P (v) || MTAG_P (v))
    return false;

Am I missing something here ?

TIA,
Pranav


Re: Identifying a block copy

2007-10-09 Thread Pranav Bhandarkar
On 10/9/07, Daniel Berlin [EMAIL PROTECTED] wrote:
 Yes
 we do not create subvars for non-named memory locations.  IE random
 pointer dereferences.

 This is mainly because it would require a lot of time and memory in
 the compiler.

 It was done because most optimizers rely solely on vdef/vuses, instead
 of further disambiguating the memory dependence chains.

ok, makes sense.

 In any case, it's not clear why you care about subvars at all here.
 If you want to identify block copies, simply look for assignments
 between AGGREGATE_TYPE_P trees.

Aha, this would be neater.

Thanks for pointing me in the right direction.

cheers!
Pranav


Re: How to add target specific dependency?

2007-08-23 Thread Pranav Bhandarkar
On 8/23/07, petruk_gile [EMAIL PROTECTED] wrote:

 Hi all ..

 I'm currently porting GCC into a new processor, and I have a problem in
 instruction scheduling ...

 The case is like this:
 In the machine description (*.md) file, sometimes I emit a single RTL
 instruction into multiple ASM instruction. The problem is, in some case I
 need to emit an operand that actually doesn't exist in its RTL
 representation. For example :

 movpqi_insn x,y instruction will be translated as ==> sar x, *ar15  and
 lar y, *ar15

   Where sar means Store register and lar means load register ...

 Since GCC performs instruction scheduling in RTL form, it doesn't know that
 instruction movpqi_insn actually reads AR15  Hence, sometimes GCC
 moves an instruction that actually SHOULD NOT be moved, due to data
 dependence in AR15, and it causes incorrect scheduling  (This is my
 analysis, please tell me if you guys think I'm wrong)...

 So, I need to:
 (1) whether disable the scheduling for that particular dependency, or
 (2) Inform GCC that movpqi_insn has an additional dependency in AR15 

 The problem is, I still don't know how can i do those 2 things ... So if any
 of you have any advice, I'd be really grateful  :D
Ideally you could either split movpqi_insn (into a store and a load)
during expansion or split the insn appropriately just before
instruction scheduling (using a define_insn_and_split) .

cheers!
Pranav


Re: ICE on valid code, cse related

2007-08-08 Thread Pranav Bhandarkar
Hi,

   Pranav, although there is indeed a bug in the mid-end here, from your point
 of view the simple and effective workaround should be to implement a movdi
 pattern (and movsf and movdf if you don't have them yet: it's an absolute
 requirement to implement movMM for any modes you expect your machine to
 handle) in the backend.  This won't fix the underlying bug, but it'll stop it
 from affecting you, and you'll get better codegen all round into the bargain
 if you expand movdi early.

It worked!!! I implemented the movsf pattern (and also movdf, so that
the absence of a movdf doesn't affect me in the future). Due
to the movsf pattern, the return value is now restored with

(insn 17 16 18 testcase-min.i:8 (set (reg:SF 139)
(mem/c/i:SF (reg/f:SI 129 virtual-stack-vars) [2 S4 A32])) -1
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138))
(nil)

i.e. there is no subreg in the destination.
Later in cse, when the above REG_EQUAL (float:SF (reg:SI 138)) note is
converted into REG_EQUAL (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]), it
doesn't replace
(subreg:SI (reg:SF 139) 0) in the insn
 (set (reg:SI 141)
   (xor:SI (subreg:SI (reg:SF 139) 0)
   (reg:SI 140))) 65 {xorsi3} (expr_list:REG_EQUAL
(const_double:SF 0 [0x0] -0.0 [-0x0.0p+0])
   (nil)))

and the compiler doesn't crash :)

Thanks Dave and Ian for your help!!

cheers!
Pranav


Re: ICE on valid code, cse related

2007-08-03 Thread Pranav Bhandarkar
 (reg:SF 139) can hold the value (const_double:SF 0) but (subreg:SI
 (reg:SF 139)) should be the value (const_int 0).  Perhaps the problem
 is how we handle a REG_EQUAL note when the destination of the set is a
 SUBREG.

(subreg:SI (reg:SF 139) 0) shouldn't be able to hold the value
(float:SF (reg:SI 138)). Am I right on this? I ask because the expander
generates the following while storing the return value:
(insn 17 18 19 testcase-min.i:8 (set (subreg:SI (reg:SF 139) 0)
(mem/c/i:SI (reg/f:SI 129 virtual-stack-vars) [2 S4 A32])) -1
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138))
(nil)

The answer to this question will help me decide where to fix the
problem: in the expander itself or while processing REG_EQUAL in the
cse pass.
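
To make the distinction between an SFmode value and its SImode bit pattern concrete, here is a minimal C sketch. It assumes a 32-bit IEEE-754 float; the function names are illustrative and are not GCC code. Reading the bits of a float as an integer plays the role of (subreg:SI (reg:SF 139) 0), and flipping the top bit is what the xorsi3 insn in this thread does to negate the value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer, loosely
   analogous to (subreg:SI (reg:SF ...) 0). Assumes a 32-bit float. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The inverse reinterpretation: integer bits back to a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Negate a float by XORing its sign bit in integer mode. */
float negate_via_xor(float f)
{
    return bits_to_float(float_bits(f) ^ UINT32_C(0x80000000));
}
```

The float 0.0 and the integer 0 share a bit pattern here, but -0.0 does not, which is why a note describing the SFmode value cannot simply be reused for the SImode subreg.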

Thanks,
Pranav


Re: ICE on valid code, cse related

2007-08-02 Thread Pranav Bhandarkar
 How can we have a PLUS on a CONST_DOUBLE and a CONST_INT?  That does
 not make sense, as there is no MODE argument that could make this work
 correctly.  From your description, MODE must be some integer mode, in
 which case it is wrong to be using a CONST_DOUBLE in SFmode.

 (I don't know where the bug is; I'm just trying to help pin it down.)
Here it is!!
The problem is that for the following insn
(insn 20 19 21 2
 (set (reg:SI 141)
(xor:SI (subreg:SI (reg:SF 139) 0)
(reg:SI 140))) 65 {xorsi3} (expr_list:REG_EQUAL
(const_double:SF 0 [0x0] -0.0 [-0x0.0p+0])
(nil)))

reg:SI 140 is known to have the constant value
(const_int -2147483648 [0x8000]))
 and (subreg:SI (reg:SF 139) 0) is known to have the value
(const_double:SF 0 [0x0] 0.0 [0x0.0p+0])   [ as described by my
previous post in the same thread ]

Now, simplify_binary_operation_1 in simplify-rtx.c tries to

 /* Canonicalize XOR of the most significant bit to PLUS.  */
(simplify-rtx.c:2203)

and this results in a PLUS on a CONST_INT and a CONST_DOUBLE.

Maybe there should be a better check before canonicalizing here?
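
The canonicalization relies on an identity that only holds for integer arithmetic modulo 2^32. A minimal sketch of that identity in C, assuming 32-bit unsigned values (the function names are made up for illustration and do not come from simplify-rtx.c):

```c
#include <assert.h>
#include <stdint.h>

/* XOR with the most significant bit of a 32-bit value. */
uint32_t xor_msb(uint32_t x)
{
    return x ^ UINT32_C(0x80000000);
}

/* PLUS of the same constant; the result wraps modulo 2^32. */
uint32_t plus_msb(uint32_t x)
{
    return (uint32_t)(x + UINT32_C(0x80000000));
}

/* Check on a few boundary values that the two forms agree. */
void check_xor_plus_identity(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x7fffffffu, 0x80000000u, 0x80000001u, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(xor_msb(samples[i]) == plus_msb(samples[i]));
}
```

Both forms toggle bit 31 and nothing else, since the carry out of the top bit is discarded. The rewrite has no such justification when one operand is an SFmode CONST_DOUBLE, which is the situation described above.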

Thanks,
Pranav


Re: ICE on valid code, cse related

2007-08-02 Thread Pranav Bhandarkar
 reg:SI 140 is known to have the constant value
 (const_int -2147483648 [0x8000]))
Maybe I wasn't clear in my previous post. reg:SI 140 is known to have
this const_int value from a previous copy into it, here:

(insn 19 17 20 2 /fc3/scratchpad/testcase-min.i:8 (set (reg:SI 140)
(const_int -2147483648 [0x8000])) 44 {*movsi} (nil))


Thanks,
Pranav


ICE on valid code, cse related

2007-08-01 Thread Pranav Bhandarkar
Hi,
I am working on a private port and getting an ICE on valid code. This
is mainly because of the following (which is a part of the entire
dump of RTL of the source file):

(insn 13 8 14 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 138)
(const_int 0 [0x0])) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 0 [0x0])
(nil)))

(insn 14 13 15 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 1 $c1)
(reg/f:SI 112 *fp*)) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 1 [0x1])
(insn_list:REG_LIBCALL 17 (nil

(insn 15 14 16 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 2 $c2)
(reg:SI 138)) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(nil)))

(call_insn 16 15 18 2 /fc3/testcases/reduce/testcase-min.i:8 (parallel [
(call (mem:SI (symbol_ref:SI (__floatsisf) [flags 0x41])
[0 S4 A32])
(const_int 0 [0x0]))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 41 {*call_direct} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(expr_list:REG_EH_REGION (const_int -1 [0x])
(nil)))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
(nil

(insn 18 16 17 2 /fc3/testcases/reduce/testcase-min.i:8 (clobber
(reg:SF 139)) -1 (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(nil)))

(insn 17 18 19 2 /fc3/testcases/reduce/testcase-min.i:8 (set
(subreg:SI (reg:SF 139) 0)
(mem/c/i:SI (reg/f:SI 112 *fp*) [2 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138))
(nil


Note the REG_EQUAL note of insn 17. cse tries to replace reg:SI 138
with a constant and because of insn 13, the note becomes (float:SF
(const_int 0)) which in turn cse converts into

REG_EQUAL (const_double:SF 0 [0x0] 0.0 [0x0.0p+0])

and when CONST_DOUBLE_LOW is done on the above, the compiler crashes -

 internal compiler error: RTL check: expected code 'const_double' and
mode 'VOID', have code 'const_double' and mode 'SF' in plus_constant,
at explow.c:103

i.e. the compiler is crashing after converting a const_int to an SFmode value.

Could this possibly be a generic issue, or is it a problem with my backend
(as in, will I need to define movsf in my backend, which isn't defined at
present)?

Regret the rather verbose post.

Thanks in advance,
Pranav


Re: ICE on valid code, cse related

2007-08-01 Thread Pranav Bhandarkar
 Who is calling CONST_DOUBLE_LOW on this value?
plus_constant calls CONST_DOUBLE_LOW on this value.

simplify_binary_operation_1 calls plus_constant ( while trying to
simplify PLUS on (const_double:SF 0 [0x0] 0.0 [0x0.0p+0])  (const_int
-2147483648 [0x8000]) ), which in turn calls CONST_DOUBLE_LOW.

Thanks,
Pranav


Re: CSE removing a load that is necessary

2007-07-30 Thread Pranav Bhandarkar
Hi,

   Or perhaps this could be another manifestation of the cse gets confused by
 reg_equal notes on subparts of dimode pseudos if no movdi pattern is defined
 in the backend bug[*]?  Pranav, is there a movdi pattern in your backend?
 There needs to be one, gcc does get it wrong if you rely on it to break
 everything down to si-sized movs.

Yes, it looks like a similar problem, but there seems to be no
consensus on a correct solution to this problem. I couldn't find the
bug number, but this thread describes the exact same problem (but
with REG_EQUIV notes).

http://gcc.gnu.org/ml/gcc/2001-02/msg01372.html


Thanks,
Pranav


CSE removing a load that is necessary

2007-07-27 Thread Pranav Bhandarkar
Hi All,
I am working on a private port and am seeing the following problem.
For a function returning a double the value is stored by the function
in memory. cse removes one of the two loads (to retrieve this returned
value) after the function is called.

To elaborate, the following is the dump just before cse.

(insn 44 43 45 2 test.c:388 (set (reg:SI 1 $c1)
(reg/f:SI 112 *fp*)) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 2 [0x2])
(nil)))

(insn 45 44 46 2 test.c:388 (set (reg:SI 2 $c2)
(reg:SI 136 [ D.1517 ])) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 2 [0x2])
(nil)))

(call_insn 46 45 49 2 test.c:388 (parallel [
(call (mem:SI (symbol_ref:SI (__floatunsidf) [flags
0x41]) [0 S4 A32])
(const_int 0 [0x0]))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 41 {*call_direct} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(expr_list:REG_EH_REGION (const_int -1 [0x])
(nil)))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
(nil

(insn 49 46 116 2 test.c:388 (clobber (reg:SI 179)) -1
(expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(nil)))

(insn 116 49 47 2 test.c:388 (clobber (reg:SI 180 [+4 ])) -1 (nil))

(insn 47 116 48 2 test.c:388 (set (reg:SI 179)
(mem/c/i:SI (reg/f:SI 112 *fp*) [7 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(nil)))

(insn 48 47 50 2 test.c:388 (set (reg:SI 180 [+4 ])
(mem/c/i:SI (plus:SI (reg/f:SI 112 *fp*)
(const_int 4 [0x4])) [7 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))



cse modifies insn 48 as

(insn 48 47 50 2 test.c:388 (set (reg:SI 180 [+4 ])
(reg:SI 178 [+4 ])) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 2 [0x2])
(expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
(nil
(nil

and also replaces every subsequent use of (reg:SI 180 [+4 ]) with
(reg:SI 178 [+4 ]) thus making the above load dead, which gets
subsequently removed. This way the result of the function call is
lost.

My take is that insn 48 should have a REG_RETVAL note (in fact it
does have this, but the note is removed by lower_subreg) and cse should
be careful when REG_RETVAL and REG_EQUAL appear in the same insn. Is
this the right way of going about it?

Sorry for a rather verbose post.

Thanks in advance,
Pranav


Re: CSE removing a load that is necessary

2007-07-27 Thread Pranav Bhandarkar
 Where does reg 178 come from?  It does not appear in the other insns
 you listed.

I am sorry, my mistake. I meant to say that the dump was only a part
of the entire dump of the function. reg 178 is the result of a
previous call to __floatsidf and is defined by the following insn.

(insn 19 18 21 2 test.c:356 (set (reg:SI 178 [+4 ])
(mem/c/i:SI (plus:SI (reg/f:SI 112 *fp*)
(const_int 4 [0x4])) [7 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
(nil


 Am I reading your code correctly when it appears that the
 __floatunsidf function returns a value in memory rather than via a
 register?

For our architecture, we have had to follow this convention of
returning a double value by storing it in memory and loading it after
the function call. The ABI reserves only one SI mode register for
return values and hence the use of memory for returning a double.
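
A rough C model of this convention, assuming an 8-byte double and with made-up function names (this is not the port's actual ABI code): the callee leaves the double in a memory slot, and the caller reads it back with two 4-byte loads, like insns 47 and 48 above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callee: converts x to double and leaves the result in an
   8-byte memory slot instead of a register. */
void convert_to_slot(unsigned char slot[8], uint32_t x)
{
    double d = (double)x;
    assert(sizeof d == 8);
    memcpy(slot, &d, sizeof d);
}

/* Caller side: two separate 4-byte loads, at offsets 0 and 4. */
void load_words(const unsigned char slot[8], uint32_t words[2])
{
    memcpy(&words[0], slot, 4);
    memcpy(&words[1], slot + 4, 4);
}

/* Reassemble the double from the two loaded words. */
double words_to_double(const uint32_t words[2])
{
    double d;
    memcpy(&d, words, sizeof d);
    return d;
}
```

If the second load is replaced by a word left over from an earlier call, the reassembled value is wrong, which is the effect of cse rewriting insn 48 to copy reg 178.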


 If lower-subreg split up the load from memory, then it was correct to
 remove the REG_RETVAL note.  There may be a bug here in that it should
 also remove the REG_EQUAL note in that case.  It may be that
 remove_retval_note needs to look for and remove a REG_EQUAL note.

Ok, so the approach should be to fix remove_retval_note to have it
remove the REG_EQUAL note too, rather than not call remove_retval_note at
all. I will submit a patch for comments.

Thanks,
Pranav


Re: Execute test fails in gcc testsuite

2007-07-18 Thread Pranav Bhandarkar

On 7/18/07, Venkatesan Jeevanandam [EMAIL PROTECTED] wrote:



I am working on the testsuite for a new crosscompiler hosted on x86
Platform,

While performing execute test using gcc testsuite,

I am getting the error message in execute test
/tmp/2112-1.x0: /tmp/2112-1.x0: cannot execute binary file

I know, have to use cross compiler simulator, for executing
$Arch-sim /tmp/2112-1.x0

But I don't know where to mention this configuration.


Since you haven't mentioned anything about your .exp file in
dejagnu/baseboards, I am assuming you haven't already written one.
Therefore you will need to write a .exp file (e.g. arm-sim.exp)
and put it in the dejagnu/baseboards directory. Then, while doing make
check-gcc, pass it using the RUNTESTFLAGS command line argument.
Typically,
$ make check-gcc RUNTESTFLAGS=--target_board=arch-sim

HTH,
Pranav


Re: Extending RTL expansion and CG with a new operation

2007-06-26 Thread Pranav Bhandarkar

kerneltest.c:22: error: unrecognizable insn:
(jump_insn 26 25 29 3 (set (pc)
(create_body_after (cre (reg:DI 75)
(const_int 0 [0x0]))
(label_ref 13)
(pc))) -1 (nil)
(nil))
kerneltest.c:22: internal compiler error: in extract_insn, at recog.c:2096
Please submit a full bug report,
with preprocessed source if appropriate.
See URL:http://gcc.gnu.org/bugs.html for instructions.


Find a pattern that is close to this one in your md file ( i.e. the
define_insn that you think should match this pattern, assuming such a
define_insn exists in your md file) and then try to check the reason
for recog crashing. IMO, the predicate may not be passing.

regards,
Pranav


A combiner type optimization is causing a failure

2007-02-22 Thread Pranav Bhandarkar

Hello all,
I added a small optimization which does the following. It converts
a = a + 1
if ( a > 0 )

to
if ( a > -1 )

a is a signed int.
However, this is causing 920612-1.c to fail, which is reproduced below
for convenience.

f(j)int j;{return++j>0;}
main(){ if(f((~0U)>>1)) abort(); exit(0); }

The problem is that this testcase passes the number 2147483647 (int is
4 bytes on my architecture), and adding 1 to it overflows. Since my
optimization removes the increment operation in 'f', 2147483647 never
gets incremented, 'f' returns 1, and the testcase fails.

My question: IMO the test is checking overflow behaviour. Is
it right to have such a test?
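
A small C sketch of why the two forms disagree only at INT_MAX. It assumes the testcase expects two's-complement wraparound, and it models that wraparound explicitly so the sketch itself has no signed overflow; the function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* ++j with wraparound, written so that no signed overflow occurs. */
int wrapping_increment(int j)
{
    return j == INT_MAX ? INT_MIN : j + 1;
}

/* The testcase's f: increment first, then compare with 0. */
int f_original(int j)
{
    return wrapping_increment(j) > 0;
}

/* The optimized form: the increment is folded into the constant. */
int f_optimized(int j)
{
    return j > -1;
}

/* The two forms agree on these inputs, all below INT_MAX. */
void check_agreement_below_int_max(void)
{
    static const int samples[] = { INT_MIN, -2, -1, 0, 1, INT_MAX - 1 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(f_original(samples[i]) == f_optimized(samples[i]));
}
```

At INT_MAX the original form yields 0 under wraparound while the optimized form yields 1, so the transformation is only valid when signed overflow is treated as undefined.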
Regards,
Pranav


Re: A combiner type optimization is causing a failure

2007-02-22 Thread Pranav Bhandarkar

On 2/22/07, Paolo Bonzini [EMAIL PROTECTED] wrote:


 My question is that, IMO the test is checking overflow behaviour. Is
 it right to have such a test ?

Would you care to prepare a patch that moved it under gcc.dg, adding a {
dg-options -O2 -fno-strict-overflow } marker (or maybe -O2
-fno-wrapv)?  But your optimization should also be conditional on
whether strict overflow behavior is requested.

Paolo


Thanks, I will prepare a patch. A 'strict overflow behaviour' check
also makes sense.
Thank you,
Pranav


Use of INSN_CODE

2007-02-01 Thread Pranav Bhandarkar

Hi All,
I am using recog_memoized in the machine dependent reorg pass.
However, it is causing an ICE because a CODE_LABEL is unwittingly
getting passed to it.

I understand that CODE_LABEL is in the RTX_EXTRA class, and intuitively
it is wrong to use INSN_CODE (which is used in recog_memoized) on a
CODE_LABEL simply because it is not in the RTX_INSN class.

However, the internals only warn against using INSN_CODE on use,
clobber, asm_input, addr_vec, addr_diff_vec. There is no mention of
the other members of RTX_EXTRA. Or shouldn't recog_memoized have an
INSN_P check in it?
Am I missing something here?
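
The kind of guard in question can be modelled standalone as below. This is a toy model only: the toy_ types and functions are hypothetical stand-ins, not GCC's rtx, INSN_P or recog_memoized, and in a real backend the check would be INSN_P (insn) before calling recog_memoized (insn).

```c
#include <assert.h>

/* Toy stand-ins for a few rtx codes; not GCC's definitions. */
enum toy_code {
    TOY_INSN, TOY_JUMP_INSN, TOY_CALL_INSN,
    TOY_CODE_LABEL, TOY_NOTE, TOY_BARRIER
};

struct toy_rtx {
    enum toy_code code;
    int insn_code;      /* only meaningful for real insns */
};

/* Plays the role of INSN_P: true only for real instructions. */
int toy_insn_p(const struct toy_rtx *x)
{
    return x->code == TOY_INSN
        || x->code == TOY_JUMP_INSN
        || x->code == TOY_CALL_INSN;
}

/* Guarded lookup: returns -1 for labels, notes and barriers instead of
   reading a field that does not exist for them. */
int toy_safe_insn_code(const struct toy_rtx *x)
{
    assert(x != 0);
    if (!toy_insn_p(x))
        return -1;
    return x->insn_code;
}
```

Whether the check belongs in the caller or inside recog_memoized itself is exactly the open question above; the sketch only shows the caller-side variant.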
TIA,
Pranav


Re: CSE not combining equivalent expressions.

2007-01-18 Thread Pranav Bhandarkar

On 1/18/07, Richard Kenner [EMAIL PROTECTED] wrote:

 I'm not immediately aware of too many cases where lowering the IL is
 going to expose new opportunities to track and optimize nonzero/zero
 bits stuff.

Bitfield are the big one.  If you have both bitfield and logical operations,
you can often merge the logical operations with those used to retrieve
and/or store the field.

Things such as

x.y |= 1;

where Y is a bitfield and X non-BLKmode would be a large sequence of logical
operations that can all be replaced by a single OR insn at the RTL level
but presents no optimization opportunities at the tree level.


I have found another case where a zero / sign extend is inhibiting optimization:

extern char b;
extern char c;
int main()
{

 b <<= 1;
 b <<= 1;
 b <<= 1;
 c = b;

 return 0;
}

Here again a zero extend gets generated after every 'ashift', whereas
we are interested only in the lower order 8 bits. However, when I try
the same thing with int instead of char, i.e. when there is no need for
extension, the operations get converted into b <<= 3 instead of 3
instructions.
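
The intermediate extensions are redundant because only the low 8 bits survive the final store. A minimal C check of that claim, using uint8_t in place of char so the truncation is well defined (the function names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Three shifts by one, truncating to 8 bits after each step, which is
   what the extend after every 'ashift' amounts to. */
uint8_t shift_three_times(uint8_t b)
{
    b = (uint8_t)(b << 1);
    b = (uint8_t)(b << 1);
    b = (uint8_t)(b << 1);
    return b;
}

/* One shift by three, truncating once at the end. */
uint8_t shift_once_by_three(uint8_t b)
{
    return (uint8_t)(b << 3);
}

/* Exhaustively compare both forms over all 256 byte values. */
void check_all_bytes(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(shift_three_times((uint8_t)v) == shift_once_by_three((uint8_t)v));
}
```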
regards,
Pranav


Re: CSE not combining equivalent expressions.

2007-01-17 Thread pranav bhandarkar

Also this is removed for the case of integers by the CSE pass
IIRC . The problem arises only for the type being a char or a short.


Yes, that is true. With gcc 4.1 one of the 'or's gets eliminated for
'int'. I am putting below two sets of logs: the first from just before
cse_main, and the second from just after cse_main has returned but
before the trivially dead insns have been deleted.

Set 1: Before cse_main

(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (set (reg:SI 1 $c1)
   (const_int 0 [0x0])) 43 {*movsi} (nil)
   (nil))

(call_insn 12 11 13 0 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI (gen_T) [flags 0x41]
function_decl 0xb7d81e00 gen_T) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 39 {*call_value_direct} (nil)
   (nil)
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
   (nil)))

(insn 13 12 15 0 (set (reg:SI 134 [ D.1214 ])
   (reg:SI 1 $c1)) 43 {*movsi} (nil)
   (nil))

(insn 15 13 16 0 (set (reg:SI 133 [ D.1216 ])
   (ior:SI (reg:SI 134 [ D.1214 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 16 15 17 0 (set (reg/f:SI 136)
   (symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d8a05c a)) 43
{*movsi} (nil)
   (nil))

(insn 17 16 19 0 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
   (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
   (nil))

Expansion of the if condition

(note 23 21 25 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 25 23 26 1 (set (reg/f:SI 139)
   (symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d8a05c a)) 43
{*movsi} (nil)
   (nil))

(insn 26 25 27 1 (set (reg:SI 140)
   (ior:SI (reg:SI 133 [ D.1216 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 27 26 29 1 (set (mem/c/i:SI (reg/f:SI 139) [2 a+0 S4 A32])
   (reg:SI 140)) 43 {*movsi} (nil)
   (nil))

(code_label 29 27 30 2 2 (end) [1 uses])
.. to function end


Set 2: After cse_main
(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (set (reg:SI 1 $c1)
   (const_int 0 [0x0])) 43 {*movsi} (nil)
   (nil))

(call_insn 12 11 13 0 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI (gen_T) [flags 0x41]
function_decl 0xb7d81e00 gen_T) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 39 {*call_value_direct} (nil)
   (nil)
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
   (nil)))

(insn 13 12 15 0 (set (reg:SI 134 [ D.1214 ])
   (reg:SI 1 $c1)) 43 {*movsi} (nil)
   (nil))

(insn 15 13 16 0 (set (reg:SI 133 [ D.1216 ])
   (ior:SI (reg:SI 134 [ D.1214 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 16 15 17 0 (set (reg/f:SI 136)
   (symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d8a05c a)) 43
{*movsi} (nil)
   (nil))

(insn 17 16 19 0 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
   (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
   (nil))

 Expansion of the If condition 

(note 23 21 25 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 25 23 26 1 (set (reg/f:SI 139)
   (reg/f:SI 136)) 43 {*movsi} (nil)
   (expr_list:REG_EQUAL (symbol_ref:SI (a) [flags 0x2] var_decl
0xb7d8a05c a)
   (nil)))

(insn 26 25 27 1 (set (reg:SI 140)
   (ior:SI (reg:SI 134 [ D.1214 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 27 26 29 1 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
   (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
   (nil))

(code_label 29 27 30 2 2 (end) [1 uses])
 to function end

Therefore, as I see it, cse_main has followed these steps:
1) Found that the source of the set in insn 26 is equivalent to the
source of the set in insn 15 and replaced the source in insn 26 with
that from insn 15.
2) Found the lhs of insn 15 to be equal to that of insn 26 and stored
that instead in insn 27, thus making the result of insn 26 (reg 140)
unused from then on (insn 26 subsequently gets deleted).

However, for the case of char or short, zero / sign extends are
generated after the 'ior' operations, and as a result the source of the
second 'ior' is then not equal to the source of the first 'ior'.
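
Even though the two 'ior' sources differ syntactically in the char case, the values are still equal: ORing in bit 0 a second time changes nothing, with or without the 8-bit extension in between. A minimal C sketch of that, with illustrative names that do not correspond to anything in cse.c:

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 8 bits, like zero_extendqisi2. */
uint32_t zext8(uint32_t x)
{
    return x & 0xffu;
}

/* First 'a |= 1': ior followed by the zero extend. */
uint32_t first_or(uint32_t value)
{
    return zext8(value | 1u);
}

/* Second 'a |= 1', applied to the already extended result. */
uint32_t second_or(uint32_t extended)
{
    return zext8(extended | 1u);
}

/* The second operation never changes the result of the first. */
void check_second_or_is_redundant(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 0xfeu, 0xffu, 0x100u, 0x12345678u, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(second_or(first_or(samples[i])) == first_or(samples[i]));
}
```

Recognizing this needs reasoning about known bits rather than plain expression equality, which is why ordinary CSE value numbering does not catch it.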


Re: CSE not combining equivalent expressions.

2007-01-17 Thread pranav bhandarkar

On 1/17/07, Mircea Namolaru [EMAIL PROTECTED] wrote:

 Thanks. Another question I have is that, in this case, will the
following

 http://gcc.gnu.org/wiki/Sign_Extension_Removal

 help in removal of the sign / zero extension ?

First, it seems to me that in your case:

(1) a = a | 1 /* a |= 1 */
(2) a = a | 1 /* a |= 1 */

the expressions a | 1 in (1) and (2) are different as the a
is not the same. So there is nothing to do for CSE.

If the architecture has an instruction that does both the
store and the zero extension, the zero extension instructions
become redundant.

The sign extension algorithm is supposed to catch such cases, but
I suspect that in this simple case the regular combine is enough.

Mircea


Thanks for the info. I went through the documentation provided by you
in see.c, which I must add is very comprehensive indeed, and realised
that we need an instruction that does a zero extend before a store so
that the extension instructions become redundant and can be
removed.
Thank you,
Pranav


Re: relevant files for target backends

2007-01-16 Thread pranav bhandarkar

I am wondering where to define the prototypes for functions in
machine.c Shall the prototypes be defined in machine-protos.h or in
machine.h or in machine.c. As far as I understand the prototypes
should be defined in machine-protos.h, right? But if I do so several
errors/warnings arise because of undeclared prototypes.

Another question is where target macros should be defined. As far as I
can see machine.c has something like such a structure:

---snip---
#define SOME_MACRO
#define SOME_MACRO
#define SOME_MACRO
#define SOME_MACRO

struct gcc_target targetm = TARGET_INITIALIZER;

machine.h is used to define macros that give such information as the
register classes, whether little endian or not, sizes of integral
types etc.
The file machine.c, like you rightly said, defines the targetm
structure that holds pointers to target related functions and data.
Such functions are defined in the .c file, and the corresponding target
hooks are #defined in the .c file.
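
The override pattern can be sketched in miniature as follows. Everything here is hypothetical (the toy_ structure, the TOY_ macros and the hook names are invented for illustration and are not GCC's target.h or target-def.h); it only shows the mechanics of redefining a hook macro before the initializer is expanded.

```c
#include <assert.h>

/* Hypothetical miniature of a target-hook vector. */
struct toy_target {
    int (*address_cost)(int);
    int (*issue_rate)(void);
};

/* Default hooks, which a shared header would normally provide. */
int default_address_cost(int addr) { (void)addr; return 1; }
int default_issue_rate(void) { return 1; }

#define TOY_ADDRESS_COST default_address_cost
#define TOY_ISSUE_RATE   default_issue_rate
#define TOY_INITIALIZER  { TOY_ADDRESS_COST, TOY_ISSUE_RATE }

/* The backend's .c file: define the function, then override the macro. */
int machine_issue_rate(void) { return 2; }
#undef  TOY_ISSUE_RATE
#define TOY_ISSUE_RATE machine_issue_rate

/* The initializer is expanded here, so it picks up the override. */
struct toy_target toy_targetm = TOY_INITIALIZER;

/* Target-independent code calls through the structure. */
int toy_query_issue_rate(void)
{
    assert(toy_targetm.issue_rate != 0);
    return toy_targetm.issue_rate();
}
```

Hooks that the backend leaves alone keep their defaults, so only the functions that differ need a definition and a #define in the .c file.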
HTH,
Pranav


CSE not combining equivalent expressions.

2007-01-15 Thread pranav bhandarkar

Hello Everyone,
I have the following source code

static int i;
static char a;

char foo_gen(int);
void foo_assert(char);
void foo ()
{
  int *x = &i;
  a = foo_gen(0);
  a |= 1; /* 1 */
  if (*x) goto end;
  a |= 1; /* 2 */
  foo_assert(a);
end:
  return;
}

Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2.
However, the RTL snippet before the first CSE pass is as follows

(insn 11 9 12 0 (set (reg:SI 1 $c1)
  (const_int 0 [0x0])) 43 {*movsi} (nil)
  (nil))

(call_insn 12 11 13 0 (parallel [
  (set (reg:SI 1 $c1)
  (call (mem:SI (symbol_ref:SI (gen_T) [flags 0x41]
function_decl 0xb7d54e00 gen_T) [0 S4 A32])
  (const_int 0 [0x0])))
  (use (const_int 0 [0x0]))
  (clobber (reg:SI 31 $link))
  ]) 39 {*call_value_direct} (nil)
  (nil)
  (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
  (nil)))

(insn 13 12 14 0 (set (reg:SI 137)
  (reg:SI 1 $c1)) 43 {*movsi} (nil)
  (nil))

(insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ])
  (reg:SI 137)) 43 {*movsi} (nil)
  (nil))

(insn 16 14 17 0 (set (reg:SI 138)
  (ior:SI (reg:SI 135 [ D.1217 ])
  (const_int 1 [0x1]))) 63 {iorsi3} (nil)
  (nil))

(insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ])
  (zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} (nil)
  (nil))

(insn 18 17 19 0 (set (reg/f:SI 139)
  (symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d5d05c a)) 43
{*movsi} (nil)
  (nil))

(insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8])
  (subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil)
  (nil))

 expansion of the if condition 

;; End of basic block 0, registers live:
(nil)

;; Start of basic block 1, registers live: (nil)
(note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 27 25 28 1 (set (reg:SI 142)
  (ior:SI (reg:SI 134 [ D.1219 ])
  (const_int 1 [0x1]))) 63 {iorsi3} (nil)
  (nil))

(insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ])
  (zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} (nil)
  (nil))

(insn 29 28 30 1 (set (reg/f:SI 143)
  (symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d5d05c a)) 43
{*movsi} (nil)
  (nil))

(insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8])
  (subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil)
  (nil))



Now the problem is that the CSE pass doesn't identify that the source
of the set in insn 27 is equivalent to the source of the set in insn
16. This, it seems, happens because of the zero_extend in insn 17. I am
using a 4.1 toolchain. However, with a 3.4.6 toolchain no zero_extend
gets generated and the result of the ior operation is immediately
copied into memory. I am compiling this case with -O3. Can anybody
please tell me how this problem can be overcome?

TIA,
Pranav


Re: CSE not combining equivalent expressions.

2007-01-15 Thread pranav bhandarkar

On 1/15/07, Richard Guenther [EMAIL PROTECTED] wrote:

On 1/15/07, pranav bhandarkar [EMAIL PROTECTED] wrote:
 Hello Everyone,
 I have the following source code

 static int i;
 static char a;

 char foo_gen(int);
 void foo_assert(char);
 void foo ()
 {
   int *x = &i;
   a = foo_gen(0);
   a |= 1; /* 1 */
   if (*x) goto end;
   a |= 1; /* 2 */
foo_assert(a);
 end:
return;
 }

 Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2.
 However, the RTL snippet before the first CSE pass is as follows

 (insn 11 9 12 0 (set (reg:SI 1 $c1)
(const_int 0 [0x0])) 43 {*movsi} (nil)
(nil))

 (call_insn 12 11 13 0 (parallel [
(set (reg:SI 1 $c1)
(call (mem:SI (symbol_ref:SI (gen_T) [flags 0x41]
 function_decl 0xb7d54e00 gen_T) [0 S4 A32])
(const_int 0 [0x0])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 39 {*call_value_direct} (nil)
(nil)
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
(nil)))

 (insn 13 12 14 0 (set (reg:SI 137)
(reg:SI 1 $c1)) 43 {*movsi} (nil)
(nil))

 (insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ])
(reg:SI 137)) 43 {*movsi} (nil)
(nil))

 (insn 16 14 17 0 (set (reg:SI 138)
(ior:SI (reg:SI 135 [ D.1217 ])
(const_int 1 [0x1]))) 63 {iorsi3} (nil)
(nil))

 (insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ])
(zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} 
(nil)
(nil))

 (insn 18 17 19 0 (set (reg/f:SI 139)
(symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d5d05c a)) 43
 {*movsi} (nil)
(nil))

 (insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8])
(subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil)
(nil))

  expansion of the if condition 

 ;; End of basic block 0, registers live:
 (nil)

 ;; Start of basic block 1, registers live: (nil)
 (note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

 (insn 27 25 28 1 (set (reg:SI 142)
(ior:SI (reg:SI 134 [ D.1219 ])
(const_int 1 [0x1]))) 63 {iorsi3} (nil)
(nil))

 (insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ])
(zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} 
(nil)
(nil))

 (insn 29 28 30 1 (set (reg/f:SI 143)
(symbol_ref:SI (a) [flags 0x2] var_decl 0xb7d5d05c a)) 43
 {*movsi} (nil)
(nil))

 (insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8])
(subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil)
(nil))



 Now the problem is that the CSE  pass doesnt identify that the source
 of the set in insn 27 is equivalent to the source of the set in insn
 16. This It seems happens because of the zero_extend in insn 17. I am
 using a 4.1 toolchain. However with a 3.4.6 toolchain no zero_extend
 gets generated and the result of the ior operation is immediately
 copied into memory. I am compiling this case with -O3. Can  anybody
 please tell me how this problem can be overcome.

CSE/FRE or VRP do not track bit operations and CSE of bits.  To overcome this
you need to implement such.



Thanks. Another question I have is that, in this case, will the following

http://gcc.gnu.org/wiki/Sign_Extension_Removal

help in removal of the sign / zero extension ?

Thanks in Advance,
Pranav


Unable to access archives on gcc-patches

2006-01-24 Thread pranav bhandarkar
Hi,
I was having trouble with building gcc and found that the problem I
was having had been reported earlier and a patch to fix it had been
submitted in Feb 2004.
However, when I try to access the mailing list archives, I am only able to
reach the index page for Feb 2004, the link to which is:

 http://gcc.gnu.org/ml/gcc-patches/2004-02/

However, the patch that I am looking for is not accessible. The link to
the patch is

http://gcc.gnu.org/ml/gcc-patches/2004-02/msg00826.html

Just to check, I tried accessing other patches from the index link but
could not access most of them.
Can anybody please help me out?
Thanks in advance,
Pranav
--
So far as I am able to judge, nothing has been left undone,
 either by man or nature, to make India the most extraordinary
country that the sun visits on his rounds. Nothing seems to
have been forgotten, nothing overlooked.
Mark Twain