Re: Upstream GCC feature freeze

2010-11-08 Thread Richard Sandiford
Mark Mitchell m...@codesourcery.com writes: On 11/8/2010 7:22 AM, Yao Qi wrote: In this situation, this LP GCC 4.6 branch can be regarded as our upstreams at that moment. * Try to get upstream approval for all new patches in the usual way * on the understanding that they won't be

Re: GCC SVN vs. BZR/LP

2010-11-09 Thread Richard Sandiford
Ira Rosen ira.ro...@linaro.org writes: On 9 November 2010 14:38, Andrew Stubbs ...@codesourcery.com wrote: Re my recent email Upstream GCC feature freeze, I think we're agreed that we need to create a branch that tracks GCC 4.6 development, but has our own performance improvements

[ACTIVITY] Weekly status

2010-11-19 Thread Richard Sandiford
== This week == Started looking at STT_GNU_IFUNC support in BFD. There were a couple of janitorial changes I needed to make in order to prepare elf32-arm.c for the main patch. I tested those separately and submitted them upstream: http://sourceware.org/ml/binutils/2010-11/msg00330.html

__sync barriers

2010-11-22 Thread Richard Sandiford
For the record, the thing I half-remembered on the call was: http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html and: http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html The problem is that all __sync operations besides __sync_lock_test_and_set and __sync_lock_release are defined
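For reference, GCC's documentation describes __sync_lock_test_and_set as an acquire barrier only and __sync_lock_release as a release barrier only. The other __sync builtins are documented as full barriers. The minimal spinlock sketch below relies on exactly that acquire/release pairing. The names spin_lock, spin_unlock, locked_increment and counter are hypothetical and are used only for illustration.

```c
/* Minimal spinlock built from the two __sync "lock" builtins.
   __sync_lock_test_and_set is only an acquire barrier and
   __sync_lock_release is only a release barrier, which is the
   ordering a lock needs. The other __sync builtins (for example
   __sync_fetch_and_add) act as full barriers. */
static volatile int lock_word = 0;
static int counter = 0;   /* protected by lock_word */

static void spin_lock(volatile int *lock)
{
    /* Atomically store 1 and return the previous value. Keep spinning
       while the previous value shows that someone else holds the lock. */
    while (__sync_lock_test_and_set(lock, 1))
        ;
}

static void spin_unlock(volatile int *lock)
{
    /* Store 0 with release semantics. */
    __sync_lock_release(lock);
}

static int locked_increment(void)
{
    spin_lock(&lock_word);
    int value = ++counter;
    spin_unlock(&lock_word);
    return value;
}
```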

[ACTIVITY] Weekly status

2011-01-07 Thread Richard Sandiford
== This week == * Away Monday, and a fair bit of time on non-Linaro duties. * Looked at Dave's gromacs bug (693502). Turned out to be a reload inheritance problem. Tested a patch. Spent some time coming up with a brute-force testcase that I can submit with the patch. * Found a bug in the

[ACTIVITY] Weekly status

2011-02-07 Thread Richard Sandiford
== Last week == * Backported the fixes for lp693502, lp710623 and lp710652 to linaro 4.6 and linaro 4.5. Tested and sent merge requests. * Wrote several more ifunc tests, and fixed the bugs they showed up. Found that ARM generates unnecessary dynamic relocs against GOT entries, so fixed

Re: Request for new relocation: R_ARM_IRELATIVE

2011-02-09 Thread Richard Sandiford
Hi Lee, Thanks for the reply. Lee Smith lee.sm...@arm.com writes: To the extent that I do understand your request I think you are asking for one new _dynamic relocation_ code to be allocated. (A dynamic relocation is one that is usually performed by a dynamic linker (because, for example,
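As background for this request, here is a small example of how a symbol becomes STT_GNU_IFUNC at the source level. On ELF targets with glibc, GCC provides an ifunc attribute for this purpose. For such symbols, the linker typically emits an IRELATIVE-style dynamic relocation (R_ARM_IRELATIVE is the ARM code being requested here). Its value is the address returned by the resolver function, and that resolver is called at load or startup time. The sketch below is a hypothetical illustration. add_generic, resolve_add and add are made-up names.

```c
/* Minimal STT_GNU_IFUNC sketch using GCC's ifunc attribute (ELF/glibc).
   All names here are hypothetical. */

static int add_generic(int a, int b)
{
    return a + b;
}

/* Resolver: called once at load or startup time. It returns the
   implementation that calls to add() should bind to. A real resolver
   would usually inspect hardware capabilities before choosing. */
static int (*resolve_add(void))(int, int)
{
    return add_generic;
}

/* add has no body of its own. The symbol is emitted as STT_GNU_IFUNC
   and is resolved through resolve_add. */
int add(int a, int b) __attribute__((ifunc("resolve_add")));
```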

[ACTIVITY] Weekly status

2011-02-11 Thread Richard Sandiford
== This week == * Got the STT_GNU_IFUNC work ready to submit. Split out some preparatory patches, including fixes for some general ARM inefficiencies that I noticed this week. Ran the EGLIBC testsuite (including ifunc tests) and they passed. * Discussed ideas for representing permuted

Re: Improving the code generated for vld and vst intrinsics

2011-02-22 Thread Richard Sandiford
Julian Brown jul...@codesourcery.com writes: Richard Sandiford richard.sandif...@linaro.org wrote: One of the vectorisation discussions from last year was about the poor code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces the result of the loads onto the stack, then loads

Re: Improving the code generated for vld and vst intrinsics

2011-02-22 Thread Richard Sandiford
Julian Brown jul...@codesourcery.com writes: 2. Builtins (__builtin_neon_*) which previously used big integer modes to pass/return values, are initialised such that they directly pass/return the struct types above instead. The intrinsic wrappers in arm_neon.h no longer need to use unions

Representing interleaving and lane load/stores at the tree level

2011-02-25 Thread Richard Sandiford
I've been spending this week playing around with various representations of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the best representation would be to use built-in functions. One concern in the original discussion was that the optimisers might move the original MEM_REFs

Re: Representing interleaving and lane load/stores at the tree level

2011-02-25 Thread Richard Sandiford
Richard Sandiford richard.sandif...@linaro.org writes: __builtin_store_lanes (VECTORS : array N of vector M of X) returns array N*M of X maps to vstN in practice, the argument would be populated by assignments of the form: vectorX = ARRAY_REF <result, X> er
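A scalar reference for the lane semantics may make the shape of these built-ins easier to follow. A store-lanes operation takes N vectors of M elements each and writes N*M elements to memory in interleaved order, as vstN does. A load-lanes operation is the inverse, as vldN is. This is only a sketch of the data movement. store_lanes_ref and load_lanes_ref are hypothetical helpers, not GCC internals. Each vector is modelled as one row of a flat array, so that vec[j*m + i] is lane i of vector j.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleaving store (vstN-like): memory gets lane 0 of every vector,
   then lane 1 of every vector, and so on. */
static void store_lanes_ref(int32_t *mem, const int32_t *vec,
                            size_t n, size_t m)
{
    for (size_t i = 0; i < m; i++)        /* lane index */
        for (size_t j = 0; j < n; j++)    /* vector index */
            mem[i * n + j] = vec[j * m + i];
}

/* De-interleaving load (vldN-like): the exact inverse of the above. */
static void load_lanes_ref(int32_t *vec, const int32_t *mem,
                           size_t n, size_t m)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            vec[j * m + i] = mem[i * n + j];
}
```

With N = 2 vectors {0, 1, 2} and {10, 11, 12}, the store writes 0, 10, 1, 11, 2, 12, and the load recovers the two vectors.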

Re: Substituting -msoft-float/-mfloat-abi=* in the proper order in spec file

2011-03-21 Thread Richard Sandiford
Loïc Minier loic.min...@linaro.org writes: I'm trying to extend the *link: specs to pass a different -dynamic-linker depending on the float ABI. But I didn't manage to build a construct which would preserve the order of the flags; if I do something like: %{msoft-float:-dynamic-linker

[ACTIVITY] Weekly status

2011-03-21 Thread Richard Sandiford
== Last week == * Committed STT_GNU_IFUNC changes to binutils. * Submitted the STT_GNU_IFUNC changes to GLIBC ports. Got feedback on Friday, which I'll deal with this week. * Worked on the expand and rtl-level parts of the load/store lane representation, with new optabs for each operation.

[ACTIVITY] Weekly status

2011-03-25 Thread Richard Sandiford
== This week == * Moved the discussion about the RTL and gimple representation of strided loads/stores to the gcc@ list. Got some good feedback: http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html * Started a subdiscussion about the handling of modes:

Re: NEON intrinsics and stack access

2011-03-31 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: For reference. We know that the NEON intrinsics in GCC have issues. I came across this page: http://hilbert-space.de/?p=22 which has a colour to greyscale conversion done using intrinsics. gcc-linaro-4.5-2011.03-0 does poorly through saving

[ACTIVITY] Weekly status

2011-04-04 Thread Richard Sandiford
== Last week == * Finished the patch that I was working on last week to use memory operands rather than register operands in neon.md. Submitted upstream: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html Among other things, this allows the intrinsics to use post-modified

[ACTIVITY] Weekly status

2011-04-11 Thread Richard Sandiford
== Last week == * Sent a fix for PR target/46329 upstream. * Discussed with Richard Guenther how to represent the interleaved load/store functions that we're adding to gimple. Sent a patch upstream for comments. Richard confirmed on IRC that he was happy with it, and no-one else has

Re: First toolchain estimate for the next cycle

2011-04-13 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: Hi there. Mounir and I have been looking at the work for next cycle. A summary spreadsheet with notes is available here: https://spreadsheets0.google.com/ccc?key=ty1c-H56f0GxnL1Hk9LCmRg I'm very interested in feedback, especially on the time

Some initial notes on the effects of vldN and vstN vectorisation

2011-04-13 Thread Richard Sandiford
I've now submitted the initial vldN and vstN work, so I thought I'd see how often it triggers for natty's libav package. I've put some initial results here: https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv There are more files to go through, so this isn't complete. I've also left

[ACTIVITY] Weekly status

2011-04-15 Thread Richard Sandiford
== This week == * Worked on a fix for https://bugs.launchpad.net/gcc-linaro/+bug/758082. Submitted the patch upstream. * Finished first cut of vldN and vstN vectorisation. Sent the patches upstream. Most of the patches have been approved, but I'll wait for the others before committing. *

[1/5] Improve output of vld3q and vld4q

2011-04-20 Thread Richard Sandiford
which has been applied to 4.7. No changes were needed for 4.5. Richard gcc/ Backport from mainline: 2011-03-30 Richard Sandiford richard.sandif...@linaro.org Ramana Radhakrishnan ramana.radhakrish...@linaro.org PR target/43590 * config/arm

[3/5] Allow arrays of vectors to be stored in registers

2011-04-20 Thread Richard Sandiford
This patch allows the target to override MAX_FIXED_MODE_SIZE for specific kinds of array. We can then give a non-BLK mode to things like uint32x2x4_t, which in turn allows them to be stored in registers. The patch is a backport of: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02192.html

[4/5] Convert LEGITIMATE_CONSTANT_P into a hook and add a mode argument

2011-04-20 Thread Richard Sandiford
This patch converts LEGITIMATE_CONSTANT_P into a target hook and passes along the mode of the constant. This can then be used by 5/5. The patch is a version of: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00195.html which is still pending review after two pings. It seems pretty simple

[5/5] Fix PR target/46329

2011-04-20 Thread Richard Sandiford
This patch handles moves involving structure constants. It's a backport of: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00200.html which Richard Earnshaw has approved, but which cannot be applied yet because it depends on 4/5. The patch is needed because 3/5 would otherwise expose new

NEON intrinsics vs. assembly code

2011-04-21 Thread Richard Sandiford
Michael mentioned that some users reported seeing better performance from RVCT using arm_neon.h than they did when coding directly in assembler. He suggested we try the same thing for GCC. Here's an experiment using the example that Jim Huang posted to the dev list recently:

[ACTIVITY] Weekly status

2011-04-21 Thread Richard Sandiford
== This week == * Iterated with upstream on some of the vectorisation patches. I think only half a patch (the ARM implementation of array_mode_supported_p) is still pending review; everything else has been approved. * Backported the vldN and vstN intrinsics to Linaro 4.5. * Finished off

Re: Pushing to diverged branches

2011-05-05 Thread Richard Sandiford
Andrew Stubbs a...@codesourcery.com writes: On 05/05/11 08:43, Richard Sandiford wrote: Anyway, the bzr help page seemed to suggest that merging in the new 4.6 revision was the Right Thing to do. I'm afraid that, once again, it felt so natural to resolve push conflicts this way that I didn't

Idea for auto-increment performance improvement

2011-05-16 Thread Richard Sandiford
Last week, Ramana pointed me at an upstream bug report about the inefficient code that GCC generates for vzip, vuzp and vtrn: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48941 It was filed not long after the Neon seminar at the summit; I'm not sure whether that was a coincidence or not. I
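For readers who do not use these permutes often, a scalar model of vzip.8 and vuzp.8 on a pair of 8-lane d-register vectors may help. vzip interleaves its two inputs, and vuzp de-interleaves them, so vuzp undoes vzip. The sketch describes lane semantics only. It says nothing about the code GCC generates, and it leaves out vtrn. The helper names are hypothetical.

```c
#include <stdint.h>

/* vzip.8 model: lo = a0 b0 a1 b1 a2 b2 a3 b3,
                 hi = a4 b4 a5 b5 a6 b6 a7 b7 */
static void vzip8_ref(uint8_t lo[8], uint8_t hi[8],
                      const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 4; i++) {
        lo[2 * i] = a[i];
        lo[2 * i + 1] = b[i];
        hi[2 * i] = a[i + 4];
        hi[2 * i + 1] = b[i + 4];
    }
}

/* vuzp.8 model: lo = a0 a2 a4 a6 b0 b2 b4 b6,
                 hi = a1 a3 a5 a7 b1 b3 b5 b7 */
static void vuzp8_ref(uint8_t lo[8], uint8_t hi[8],
                      const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 4; i++) {
        lo[i] = a[2 * i];
        lo[i + 4] = b[2 * i];
        hi[i] = a[2 * i + 1];
        hi[i + 4] = b[2 * i + 1];
    }
}
```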

Re: Engineering blueprints for 11.11

2011-05-19 Thread Richard Sandiford
I've added some ideas to the NEON blueprint. There are now really 6 separate tasks, broken down into subitems, so it looks like we really could have 6 separate blueprints, as you mentioned on the wiki page. I wasn't sure how to create those blueprints correctly though. Please let me know if they

[ACTIVITY] Weekly status

2011-05-20 Thread Richard Sandiford
== This week == * Spent almost all the week on GCC's auto inc/dec pass. I first continued with the incremental clean ups and recoding that I'd started during free time at Budapest, with the idea of bolting the new optimisations on top of that. However, in the end, I decided it would be

[ACTIVITY] Weekly status

2011-06-24 Thread Richard Sandiford
== This week == * Catching up on email. * More experimentation with the auto inc/dec stuff. TBH, this has taken longer than expected, but I think it's close now. * Wrote a dejagnu testcase for PR 49196. Tested it on trunk and submitted it upstream. == Next week == * Backport fix for PR

[ACTIVITY] Weekly status

2011-07-15 Thread Richard Sandiford
== This week == * Fixed the unnecessary union initialisers that were causing ICEs with -g. This turned out to be a lot more work than Richard's one-liner suggested. :-) * Backported Chung-Lin's arm_legitimize_reload_address patch to 4.5. * Backported the smallest_mode_for_size patch to 4.5

[ACTIVITY] Weekly status

2011-07-22 Thread Richard Sandiford
== This week == * Wrote a fix for 809768. Accepted upstream. * Looked at upstream PR 49742 (the failures seen with predictive commoning). Accepted upstream. * More shrink-wrap review. * Sent auto-inc-dec changes out for comments. Got some good private feedback (in the sense of being

libav wiki page updated for current FSF trunk

2011-08-04 Thread Richard Sandiford
I've updated: https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv so that it gives the output for current trunk, including Ira's commit yesterday to reduce the amount of overpromotion. I also reran the microbenchmarks. The good news is that the vectorised code is now better in all

[ACTIVITY] Weekly status

2011-08-08 Thread Richard Sandiford
== Last week (Linaro Connect) == * Reran libav comparisons after Ira's fix for excessive promotion. The vectorized versions are now at least as good as the non-vectorised ones. Updated wiki page with new asm output and microbenchmark results. * More work on SMS. I have some patches that

Re: RFC: Saving and restoring the assembler state during assembly

2011-08-11 Thread Richard Sandiford
Dave Martin dave.mar...@linaro.org writes: However, there's not really anything fundamentally architecture-specific about this problem, and ideally the solution and the directives should not be architecture-specific either. One option which appeals to me is to have some directives which can

[ACTIVITY] Weekly status

2011-08-12 Thread Richard Sandiford
== This week == * Looked at a bug report that the fix for LP #736007 had caused regressions on powerpc-darwin. It turned out to be a target-specific bug; the backend has the same const_vector code as i386 and spu, but the fix for PR34856 was never applied there. I'll submit the patch (and

Re: Basic libav profiling

2011-08-16 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: I put a build harness around libav and gathered some profiling data. See: bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite It includes a Makefile that builds a C only, h.264 only decoder and two Creative Commons licensed videos to use as

Re: Basic libav profiling

2011-08-18 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: On Tue, Aug 16, 2011 at 11:32 PM, Richard Sandiford richard.sandif...@linaro.org wrote: Michael Hope michael.h...@linaro.org writes: I put a build harness around libav and gathered some profiling data.  See:  bzr branch lp:~linaro-toolchain-dev

Effect of SMS register move scheduling

2011-08-24 Thread Richard Sandiford
Following on from yesterday's call about what it would take to enable SMS by default: one of the problems I was seeing with the SMS+IV patch was that we ended up with excessive moves. E.g. a loop such as: void foo (int *__restrict a, int n) { int i; for (i = 0; i < n; i

Re: Effect of SMS register move scheduling

2011-08-25 Thread Richard Sandiford
Revital Eres revital.e...@linaro.org writes: btw, do you also have numbers of how much SMS (hopefully) improves performance on top of the vectorized code? OK, here's a comparison of: -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -fno-auto-inc-dec vs:

Re: Effect of SMS register move scheduling

2011-08-25 Thread Richard Sandiford
Richard Sandiford richard.sandif...@linaro.org writes: Revital Eres revital.e...@linaro.org writes: btw, do you also have numbers of how much SMS (hopefully) improves performance on top of the vectorized code? OK, here's a comparison of: -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Re: Effect of SMS register move scheduling

2011-08-26 Thread Richard Sandiford
Richard Sandiford richard.sandif...@linaro.org writes: Richard Sandiford richard.sandif...@linaro.org writes: Revital Eres revital.e...@linaro.org writes: btw, do you also have numbers of how much SMS (hopefully) improves performance on top of the vectorized code? OK, here's a comparison

[ACTIVITY] Weekly status

2011-08-26 Thread Richard Sandiford
== This week == * Wrote some patches to make SMS schedule register moves. They made a significant difference to some libav loops. I'm running a regression test on powerpc-ibm-aix5.3.0 and will submit upstream next week if all goes OK. * Looked at why mjpegenc was so much worse with SMS.

libav-based microbenchmarks in bzr

2011-09-02 Thread Richard Sandiford
I've tried to clean up the libav microbenchmarks that I did for the strided load/store stuff. They're on Launchpad at: lp:~rsandifo/+junk/loop-microbenchmarks The main changes are that the benchmarks now preload the caches (for CPUs that don't allocate on write) and that they now check the

[ACTIVITY] Weekly status

2011-09-02 Thread Richard Sandiford
== This week == * Looked at the get_arm_condition_code ICE. Seems to be a popular bug: was reported as #589887, #823708 and #809761 in Launchpad and as PR49030 in bugzilla. Sent a patch upstream. * Submitted SMS register-dependency patch upstream. * Reviewed Bernd's new shrink-wrap patch.

Re: Vectorised copy

2011-09-06 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: While out benchmarking today, I ran across code similar to this: int *a; int *b; int *c; const int ad[320]; const int bd[320]; const int cd[320]; void fill() { for (int i = 0; i < 320; i++) { a[i] = ad[i]; b[i] = bd[i];
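The excerpt is cut off, so the following assumes the loop body continues with c[i] = cd[i]. Under that assumption, each statement in the loop is an independent 320-element copy, but only if the destination buffers do not overlap each other or the sources. With plain int * globals, the compiler cannot assume that, so it generally has to prove or check for non-overlap before it can treat the loop as three block copies. The hypothetical sketch below passes the buffers as parameters, instead of using globals, and compares the loop form with the block-copy form it is equivalent to when nothing overlaps.

```c
#include <string.h>

#define N 320

/* Loop form (body assumed to continue with c[i] = cd[i]). */
static void fill_loop(int *a, int *b, int *c,
                      const int *ad, const int *bd, const int *cd)
{
    for (int i = 0; i < N; i++) {
        a[i] = ad[i];
        b[i] = bd[i];
        c[i] = cd[i];
    }
}

/* Equivalent form, valid only when none of the six buffers overlap:
   three independent block copies. */
static void fill_memcpy(int *a, int *b, int *c,
                        const int *ad, const int *bd, const int *cd)
{
    memcpy(a, ad, N * sizeof *a);
    memcpy(b, bd, N * sizeof *b);
    memcpy(c, cd, N * sizeof *c);
}
```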

Added h264 loops to libav microbenchmarks

2011-09-09 Thread Richard Sandiford
Just as an FYI, I've added these loops to the libav microbenchmarks avg-h264-chroma-mc8-8.txt avg-pixels8-8.txt ff-h264-idct-add-8-8.txt ff-put-pixels8x16-8.txt h264-loop-filter-luma-8.txt idct-internal-8.txt put-h264-chroma-mc8-8.txt put-h264-qpel8-h-lowpass-8.txt

Re: arm-eabi-g++: error: unrecognized option '-avoid-version'

2011-09-13 Thread Richard Sandiford
Very nonauthoritative answer, but... Asa Sandahl asa.sand...@linaro.org writes: When building Android with the Linaro toolchain, I encountered this link time error when going from gcc 4.4.3 to gcc 4.6. arm-eabi-g++: error: unrecognized option '-avoid-version' I find several posts about

[ACTIVITY] Weekly status

2011-09-23 Thread Richard Sandiford
== This week == * Submitted a fix for the performance regression caused by my arm_comparison_operator patch. Applied upstream after approval from Ramana (thanks). Will backport to Linaro towards the end of next week if there are no reported problems. * Went back to looking at

[ACTIVITY] Weekly status

2011-10-10 Thread Richard Sandiford
== Last week == * Patch review. * Backported second attempt to fix get_arm_condition_code ICE. * Worked on -fsched-pressure. Experimented with various combinations of ideas. This is giving some good results (e.g. a 2x improvement in libav's put_h264_qpel8_hv_lowpass_8) but needs a bit

Re: limits-fndefn.c and timeouts

2011-10-11 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: limits-fndefn.c takes an impressively long time to run. On an idle machine, -O3 -g -c takes 17:31 and -O2 -g -c takes The test already has a dg-timeout-factor of 4 giving a total timeout of 20 minutes. Removing the -g brings this down to 30 s.

[ACTIVITY] Weekly status

2011-10-17 Thread Richard Sandiford
== Last week and today == * Backported fix for returning std::pair<bool, bool>. Unfortunately this showed up a regression on 4.5. I couldn't reproduce it cross, and the testcase itself looks innocuous, so I'm wondering whether the patch might trigger a miscompilation of cc1plus. *

Re: Agenda for tomorrow's call .

2011-11-14 Thread Richard Sandiford
Revital Eres revital.e...@linaro.org writes: Another issue is related to the regression I saw with SMS in libav's dsputil-ssd_int8_vs_int16_c. Consulting with Ayal regarding this it seemed that the regression was due to dependence between accumulations that can be avoided, more specifically
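The excerpt stops before the details, so the following is only a generic sketch of the kind of transformation the sentence points at. It assumes the kernel accumulates squared differences into a single int. With one accumulator, every addition depends on the previous one. Splitting the sum across two accumulators gives two independent dependency chains, which a modulo scheduler can overlap. ssd_one_acc and ssd_two_acc are hypothetical names and are not libav's code. For integer sums this is an exact reassociation, so the two versions agree whenever neither overflows.

```c
#include <stdint.h>

/* One accumulator: each iteration's addition depends on the last. */
static int ssd_one_acc(const int8_t *p1, const int16_t *p2, int size)
{
    int score = 0;
    for (int i = 0; i < size; i++) {
        int d = p1[i] - p2[i];
        score += d * d;
    }
    return score;
}

/* Two independent accumulators (even and odd elements), combined at
   the end. */
static int ssd_two_acc(const int8_t *p1, const int16_t *p2, int size)
{
    int s0 = 0, s1 = 0;
    int i;
    for (i = 0; i + 1 < size; i += 2) {
        int d0 = p1[i] - p2[i];
        int d1 = p1[i + 1] - p2[i + 1];
        s0 += d0 * d0;
        s1 += d1 * d1;
    }
    if (i < size) {             /* odd trailing element */
        int d = p1[i] - p2[i];
        s0 += d * d;
    }
    return s0 + s1;
}
```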

[ACTIVITY] Weekly status

2011-12-02 Thread Richard Sandiford
== This week == * More on -fsched-pressure. Testing on POWER7 showed a degenerate case that I'd failed to handle well. Fixed that. Saw that part of the problem on POWER7 was that IRA was using a combination of GENERAL_REGS and CR_REGS as a single pressure class, so there appeared to be

Re: Static Library startup

2011-12-05 Thread Richard Sandiford
Dave Martin dave.mar...@linaro.org writes: Another way of doing a similar thing is to mark __mylib_constructor as undefined in all the objects that make up the library. Unfortunately, there seems to be no obvious way of doing that: the assembler generates undefined symbol references

Re: Static Library startup

2011-12-05 Thread Richard Sandiford
Richard Sandiford richard.sandif...@linaro.org writes: Dave Martin dave.mar...@linaro.org writes: Another way of doing a similar thing is to mark __mylib_constructor as undefined in all the objects that make up the library. Unfortunately, there seems to be no obvious way of doing
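In C, one hypothetical way to get a reference to __mylib_constructor into every object of a static library is to add a pointer to it that is otherwise dead and is marked with GCC's used attribute. The reference in each object means that pulling any library object into the link also pulls in the object that defines the constructor. This is a sketch under those assumptions, not what the thread settled on. The flag mylib_ready and the pointer mylib_force_link are made-up names. For self-containment, both "files" are shown in one translation unit.

```c
/* ---- mylib_init.c (hypothetical): defines the constructor ---- */
int mylib_ready = 0;

__attribute__((constructor))
void __mylib_constructor(void)
{
    /* Runs before main() once this object is part of the link. */
    mylib_ready = 1;
}

/* ---- any other object in the library (hypothetical) ---- */
extern void __mylib_constructor(void);

/* The "used" attribute keeps this pointer from being discarded, so the
   object keeps its reference to __mylib_constructor. In a separate
   file, this reference would be undefined until the linker pulls in
   mylib_init.o. */
__attribute__((used))
static void (*const mylib_force_link)(void) = __mylib_constructor;
```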

[ACTIVITY] Weekly status

2011-12-09 Thread Richard Sandiford
== This week == * Got the -fsched-pressure code into a state where it's almost presentable. Found a few more things to tweak on the way. Fixed some FIXMEs, notably to honour MAX_SCHED_READY_INSNS. * More testing on ARM. Tried to get some SPEC2000 results as well as the usual EEMBC

Re: cdce3.C execution fault

2011-12-20 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: Hi there. I've looked further into the intermittent gcc/testsuite/g++.dg/cdce3.C test failures. Taking Ira's vectoriser-only fix-pr51301-4.6 branch and comparing it with its predecessor r106845: * cdce3.o itself is identical across compilers

Re: cdce3.C execution fault

2011-12-21 Thread Richard Sandiford
Michael Hope michael.h...@linaro.org writes: On Tue, Dec 20, 2011 at 10:00 PM, Richard Sandiford richard.sandif...@linaro.org wrote: Michael Hope michael.h...@linaro.org writes: Hi there.  I've looked further into the intermittent gcc/testsuite/g++.dg/cdce3.C test failures.  Taking Ira's

Patch drop: Rework MEM rtx_costs

2011-12-29 Thread Richard Sandiford
I originally wrote this patch as part of the auto-inc-dec work. I didn't submit it because I wasn't sure what value of extra_writeback_latency was appropriate for A9. (I was hoping to crib it from Ramana's pipeline description.) The patch introduces three new fields to the costs structure: one

Patch drop: fwprop.c patch for neon-strided-load-extract

2011-12-29 Thread Richard Sandiford
The remaining change for neon-strided-load-extract is to allow fwprop.c to propagate: (set (reg X) (subreg (reg Y) N)) even if no further simplifications are possible. I posted the original patch for comments here: http://article.gmane.org/gmane.comp.gcc.patches/246180/ and fixed the

Update to lp:~rsandifo/+junk/loop-microbenchmarks

2011-12-30 Thread Richard Sandiford
About three months ago, 4.7 stopped being able to optimise things like: int *__restrict x = ...; The (libav) loop microbenchmarks that I'd written used this construct a lot, as an easy way of automatically generating a whole function from a loop kernel. I spent a while testing 4.7 with the

Re: Patch drop: fwprop.c patch for neon-strided-load-extract

2012-01-04 Thread Richard Sandiford
Ramana Radhakrishnan ramana.radhakrish...@linaro.org writes: On 29 December 2011 10:21, Richard Sandiford richard.sandif...@linaro.org wrote: The remaining change for neon-strided-load-extract is to allow fwprop.c to propagate:    (set (reg X) (subreg (reg Y) N)) even if no further

Re: [hpc-sig-devel] GCC extensions for `hcqc'

2017-08-29 Thread Richard Sandiford
Sorry for the delayed response. Masaki Arai writes: > Hi, > > Thank you very much for your quick check and reply. > > Kugan Vivekanandarajah writes: >> > I looked into the structure, adding this field is not going to make the > s= >>

Re: SVE routines for cortex-strings

2018-06-01 Thread Richard Sandiford
Richard Henderson writes: > I spoke with Ramana about these at HKG18, and I'm finally getting back to > these. I have routines for > > -rw-rw-r--. 1 rth rth 2538 May 30 19:12 memchr.S > -rw-rw-r--. 1 rth rth 2405 May 30 20:49 memcmp.S > -rw-rw-r--. 1 rth rth 2385 May 30 19:12 rawmemchr.S >

Re: [Linaro-TCWG-CI] gcc-14-9157-gff442719cdb: slowed down by 23% - 549.fotonik3d_r on aarch64 O3

2024-03-26 Thread Richard Sandiford
the problem and/or when you have >> a fix. >> >> In CI config tcwg_bmk-code_speed-cpu2017rate/gnu-aarch64-master-O3 after: >> >> | commit gcc-14-9157-gff442719cdb >> | Author: Richard Sandiford >> | Date: Fri Feb 23 14:12:55 2024 + >> | >>