Mark Mitchell m...@codesourcery.com writes:
On 11/8/2010 7:22 AM, Yao Qi wrote:
In this situation, this LP GCC 4.6 branch can be regarded as our
upstream at that moment.
* Try to get upstream approval for all new patches in the usual way
* on the understanding that they won't be
Ira Rosen ira.ro...@linaro.org writes:
On 9 November 2010 14:38, Andrew Stubbs a...@codesourcery.com wrote:
Re my recent email "Upstream GCC feature freeze", I think we're agreed
that we need to create a branch that tracks GCC 4.6 development, but has
our own performance improvements
== This week ==
Started looking at STT_GNU_IFUNC support in BFD. There were a couple
of janitorial changes I needed to make in order to prepare elf32-arm.c
for the main patch. I tested those separately and submitted them upstream:
http://sourceware.org/ml/binutils/2010-11/msg00330.html
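For readers unfamiliar with STT_GNU_IFUNC: at the C level, GCC and Clang expose it through the ifunc function attribute. The compiler emits the symbol as an indirect function, and a resolver chooses the real implementation at load time. The following is a minimal sketch, assuming an ELF target whose C library supports ifunc (e.g. glibc). All function names are hypothetical, and the resolver simply picks one variant instead of testing real hardware capabilities.

```c
/* Minimal STT_GNU_IFUNC sketch.  Assumes GCC or Clang targeting ELF with a
   C library that supports ifunc (e.g. glibc).  All names are hypothetical. */

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount_loop(unsigned int x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Variant that lets the compiler choose the instruction sequence. */
static int popcount_builtin(unsigned int x)
{
    return __builtin_popcount(x);
}

/* Stand-in for a real hardware-capability check (hypothetical). */
static const int use_builtin = 1;

/* Resolver: runs once, when the indirect function is resolved at load time,
   and returns the address that calls to my_popcount are bound to.
   Resolvers run while relocations are still being processed, so keep them
   simple and self-contained. */
static int (*resolve_my_popcount(void))(unsigned int)
{
    return use_builtin ? popcount_builtin : popcount_loop;
}

/* The compiler marks my_popcount as an indirect function (STT_GNU_IFUNC). */
int my_popcount(unsigned int x) __attribute__((ifunc("resolve_my_popcount")));
```

Setting use_builtin to 0 binds the same symbol to popcount_loop without changing any caller. Only the resolver's choice differs, which is the point of the mechanism.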
For the record, the thing I half-remembered on the call was:
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html
and:
http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html
The problem is that all __sync operations besides __sync_lock_test_and_set
and __sync_lock_release are defined
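For context, GCC's manual documents that most __sync builtins act as full barriers. By contrast, __sync_lock_test_and_set is only an acquire barrier and __sync_lock_release is only a release barrier. A sketch of the classic test-and-set spinlock that these two builtins were designed for might look like this. The names are hypothetical.

```c
/* Test-and-set spinlock using the two __sync builtins that GCC documents
   as acquire/release barriers rather than full barriers.
   Names are hypothetical. */

struct demo_spinlock {
    volatile int locked;   /* 0 = free, 1 = held */
};

static void demo_lock(struct demo_spinlock *l)
{
    /* Atomically store 1 and return the previous value (acquire barrier). */
    while (__sync_lock_test_and_set(&l->locked, 1))
        while (l->locked)
            ;   /* wait with plain reads until the lock looks free */
}

static int demo_trylock(struct demo_spinlock *l)
{
    /* Succeeds only if the lock was previously free. */
    return __sync_lock_test_and_set(&l->locked, 1) == 0;
}

static void demo_unlock(struct demo_spinlock *l)
{
    /* Store 0 with release semantics. */
    __sync_lock_release(&l->locked);
}

static long demo_counter;

static void demo_add(struct demo_spinlock *l, long n)
{
    demo_lock(l);
    demo_counter += n;   /* protected by the lock */
    demo_unlock(l);
}
```

By contrast, a call such as __sync_fetch_and_add(&demo_counter, 1) is documented as a full barrier and needs no separate lock when only a single counter is involved.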
== This week ==
* Away Monday, and a fair bit of time on non-Linaro duties.
* Looked at Dave's gromacs bug (693502). Turned out to be a reload
inheritance problem. Tested a patch. Spent some time coming up with
a brute-force testcase that I can submit with the patch.
* Found a bug in the
== Last week ==
* Backported the fixes for lp693502, lp710623 and lp710652 to linaro 4.6
and linaro 4.5. Tested and sent merge requests.
* Wrote several more ifunc tests, and fixed the bugs they showed up.
Found that ARM generates unnecessary dynamic relocs against GOT entries,
so fixed
Hi Lee,
Thanks for the reply.
Lee Smith lee.sm...@arm.com writes:
To the extent that I do understand your request I think you are asking
for one new _dynamic relocation_ code to be allocated. (A dynamic
relocation is one that is usually performed by a dynamic linker
(because, for example,
== This week ==
* Got the STT_GNU_IFUNC work ready to submit. Split out some preparatory
patches, including fixes for some general ARM inefficiencies that I
noticed this week. Ran the EGLIBC testsuite (including ifunc tests)
and they passed.
* Discussed ideas for representing permuted
Julian Brown jul...@codesourcery.com writes:
Richard Sandiford richard.sandif...@linaro.org wrote:
One of the vectorisation discussions from last year was about the poor
code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces
the result of the loads onto the stack, then loads
Julian Brown jul...@codesourcery.com writes:
2. Builtins (__builtin_neon_*) which previously used big integer
modes to pass/return values, are initialised such that they
directly pass/return the struct types above instead. The intrinsic
wrappers in arm_neon.h no longer need to use unions
I've been spending this week playing around with various representations
of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the
best representation would be to use built-in functions.
One concern in the original discussion was that the optimisers might
move the original MEM_REFs
Richard Sandiford richard.sandif...@linaro.org writes:
__builtin_store_lanes (VECTORS : array N of vector M of X)
returns array N*M of X
maps to vstN
in practice, the argument would be populated by assignments of the form:
vectorX = ARRAY_REF <result, X>
er
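The lane semantics described above can be modelled in plain C. This is a scalar sketch only. It assumes that the N vectors of M elements are stored back to back in an ordinary int array, and the function names are hypothetical, not part of any proposed GCC interface. Element j of vector i ends up at index j*N + i of the flat array. This is the interleaving that vstN performs and that vldN undoes.

```c
#include <stddef.h>

/* Scalar model of "store lanes": vectors holds n vectors of m elements
   (vector i occupies vectors[i*m .. i*m + m - 1]); the result interleaves
   them so that element j of vector i lands at out[j*n + i], as vstN does. */
static void store_lanes_model(int *out, const int *vectors, size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            out[j * n + i] = vectors[i * m + j];
}

/* Scalar model of "load lanes": the inverse de-interleave, as vldN does. */
static void load_lanes_model(int *vectors, const int *in, size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            vectors[i * m + j] = in[j * n + i];
}
```

For example, with n = 2 and m = 3, the vectors {1, 2, 3} and {10, 20, 30} are stored as 1, 10, 2, 20, 3, 30. Loading that array back recovers the two original vectors.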
Loïc Minier loic.min...@linaro.org writes:
I'm trying to extend the *link: specs to pass a different
-dynamic-linker depending on the float ABI. But I didn't manage to
build a construct which would preserve the order of the flags; if I do
something like:
%{msoft-float:-dynamic-linker
== Last week ==
* Committed STT_GNU_IFUNC changes to binutils.
* Submitted the STT_GNU_IFUNC changes to GLIBC ports. Got feedback
on Friday, which I'll deal with this week.
* Worked on the expand and rtl-level parts of the load/store lane
representation, with new optabs for each operation.
== This week ==
* Moved the discussion about the RTL and gimple representation of
strided loads/stores to the gcc@ list. Got some good feedback:
http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
* Started a subdiscussion about the handling of modes:
Michael Hope michael.h...@linaro.org writes:
For reference. We know that the NEON intrinsics in GCC have issues.
I came across this page:
http://hilbert-space.de/?p=22
which has a colour to greyscale conversion done using intrinsics.
gcc-linaro-4.5-2011.03-0 does poorly through saving
== Last week ==
* Finished the patch that I was working on last week to use memory operands
rather than register operands in neon.md. Submitted upstream:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html
Among other things, this allows the intrinsics to use post-modified
== Last week ==
* Sent a fix for PR target/46329 upstream.
* Discussed with Richard Guenther how to represent the interleaved
load/store functions that we're adding to gimple. Sent a patch
upstream for comments. Richard confirmed on IRC that he was happy
with it, and no-one else has
Michael Hope michael.h...@linaro.org writes:
Hi there. Mounir and I have been looking at the work for next cycle.
A summary spreadsheet with notes is available here:
https://spreadsheets0.google.com/ccc?key=ty1c-H56f0GxnL1Hk9LCmRg
I'm very interested in feedback, especially on the time
I've now submitted the initial vldN and vstN work, so I thought I'd see
how often it triggers for natty's libav package. I've put some initial
results here:
https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv
There are more files to go through, so this isn't complete.
I've also left
== This week ==
* Worked on a fix for https://bugs.launchpad.net/gcc-linaro/+bug/758082
Submitted the patch upstream.
* Finished first cut of vldN and vstN vectorisation. Sent the patches
upstream. Most of the patches have been approved, but I'll wait for
the others before committing.
*
which has been applied to 4.7. No changes were needed for 4.5.
Richard
gcc/
Backport from mainline:
2011-03-30 Richard Sandiford richard.sandif...@linaro.org
Ramana Radhakrishnan ramana.radhakrish...@linaro.org
PR target/43590
* config/arm
This patch allows the target to override MAX_FIXED_MODE_SIZE for
specific kinds of array. We can then give a non-BLK mode to things
like uint32x2x4_t, which in turn allows them to be stored in registers.
The patch is a backport of:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02192.html
This patch converts LEGITIMATE_CONSTANT_P into a target hook and
passes along the mode of the constant. This can then be used by 5/5.
The patch is a version of:
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00195.html
which is still pending review after two pings. It seems pretty simple
This patch handles moves involving structure constants. It's a backport of:
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00200.html
which Richard Earnshaw has approved, but which cannot be applied yet
because it depends on 4/5. The patch is needed because 3/5 would
otherwise expose new
Michael mentioned that some users reported seeing better performance from
RVCT using arm_neon.h than they did when coding directly in assembler.
He suggested we try the same thing for GCC. Here's an experiment using
the example that Jim Huang posted to the dev list recently:
== This week ==
* Iterated with upstream on some of the vectorisation patches. I think
only half a patch (the ARM implementation of array_mode_supported_p)
is still pending review; everything else has been approved.
* Backported the vldN and vstN intrinsics to Linaro 4.5.
* Finished off
Andrew Stubbs a...@codesourcery.com writes:
On 05/05/11 08:43, Richard Sandiford wrote:
Anyway, the bzr help page seemed to suggest that merging in the new
4.6 revision was the Right Thing to do. I'm afraid that, once again,
it felt so natural to resolve push conflicts this way that I didn't
Last week, Ramana pointed me at an upstream bug report about the
inefficient code that GCC generates for vzip, vuzp and vtrn:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48941
It was filed not long after the Neon seminar at the summit;
I'm not sure whether that was a coincidence or not.
I
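As a reminder of what the three permutes compute, here is a scalar model for four-lane vectors. It follows the val[0]/val[1] result layout of the arm_neon.h vzip, vuzp and vtrn intrinsics. The struct and function names are hypothetical. The model shows only the lane shuffles, not the code GCC generates for them.

```c
#define NL 4   /* lanes per vector, e.g. int16x4_t */

/* Two result vectors, mirroring the val[2] member of the NEON xN types. */
struct pair4 {
    int val[2][NL];
};

/* Split a 2*NL-element sequence into two NL-lane vectors. */
static struct pair4 split8(const int c[2 * NL])
{
    struct pair4 r;
    for (int i = 0; i < NL; i++) {
        r.val[0][i] = c[i];
        r.val[1][i] = c[NL + i];
    }
    return r;
}

/* vzip: interleave -> {a0,b0,a1,b1}, {a2,b2,a3,b3}. */
static struct pair4 zip4(const int a[NL], const int b[NL])
{
    int c[2 * NL];
    for (int i = 0; i < NL; i++) {
        c[2 * i] = a[i];
        c[2 * i + 1] = b[i];
    }
    return split8(c);
}

/* vuzp: de-interleave -> {a0,a2,b0,b2}, {a1,a3,b1,b3}. */
static struct pair4 uzp4(const int a[NL], const int b[NL])
{
    struct pair4 r;
    for (int i = 0; i < NL; i++) {
        /* Lanes 2i and 2i+1 of the concatenation a:b. */
        int even = 2 * i, odd = 2 * i + 1;
        r.val[0][i] = even < NL ? a[even] : b[even - NL];
        r.val[1][i] = odd < NL ? a[odd] : b[odd - NL];
    }
    return r;
}

/* vtrn: transpose 2x2 blocks -> {a0,b0,a2,b2}, {a1,b1,a3,b3}. */
static struct pair4 trn4(const int a[NL], const int b[NL])
{
    struct pair4 r;
    for (int i = 0; i < NL; i += 2) {
        r.val[0][i] = a[i];
        r.val[0][i + 1] = b[i];
        r.val[1][i] = a[i + 1];
        r.val[1][i + 1] = b[i + 1];
    }
    return r;
}
```

Applying uzp4 to the two halves of zip4(a, b) gives back a and b. This is a handy sanity check when reading the generated code.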
I've added some ideas to the NEON blueprint. There are now really 6
separate tasks, broken down into subitems, so it looks like we really
could have 6 separate blueprints, as you mentioned on the wiki page.
I wasn't sure how to create those blueprints correctly though.
Please let me know if they
== This week ==
* Spent almost all the week on GCC's auto inc/dec pass. I first
continued with the incremental clean ups and recoding that I'd
started during free time at Budapest, with the idea of bolting the new
optimisations on top of that. However, in the end, I decided it would
be
== This week ==
* Catching up on email.
* More experimentation with the auto inc/dec stuff. TBH, this has taken
longer than expected, but I think it's close now.
* Wrote a dejagnu testcase for PR 49196. Tested it on trunk and submitted
it upstream.
== Next week ==
* Backport fix for PR
== This week ==
* Fixed the unnecessary union initialisers that were causing ICEs
with -g. This turned out to be a lot more work than Richard's
one-liner suggested. :-)
* Backported Chung-Lin's arm_legitimize_reload_address patch to 4.5.
* Backported the smallest_mode_for_size patch to 4.5
== This week ==
* Wrote a fix for 809768. Accepted upstream.
* Looked at upstream PR 49742 (the failures seen with predictive commoning).
Accepted upstream.
* More shrink-wrap review.
* Sent auto-inc-dec changes out for comments. Got some good private
feedback (in the sense of being
I've updated:
https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv
so that it gives the output for current trunk, including Ira's commit
yesterday to reduce the amount of overpromotion. I also reran the
microbenchmarks. The good news is that the vectorised code is now
better in all
== Last week (Linaro Connect) ==
* Reran libav comparisons after Ira's fix for excessive promotion.
The vectorised versions are now at least as good as the non-vectorised
ones. Updated wiki page with new asm output and microbenchmark results.
* More work on SMS. I have some patches that
Dave Martin dave.mar...@linaro.org writes:
However, there's not really anything fundamentally
architecture-specific about this problem, and ideally the solution and
the directives should not be architecture-specific either.
One option which appeals to me is to have some directives which can
== This week ==
* Looked at a bug report that the fix for LP #736007 had caused regressions
on powerpc-darwin. It turned out to be a target-specific bug; the
backend has the same const_vector code as i386 and spu, but the fix for
PR34856 was never applied there. I'll submit the patch (and
Michael Hope michael.h...@linaro.org writes:
I put a build harness around libav and gathered some profiling data. See:
bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite
It includes a Makefile that builds a C only, h.264 only decoder and
two Creative Commons licensed videos to use as
Michael Hope michael.h...@linaro.org writes:
On Tue, Aug 16, 2011 at 11:32 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
Michael Hope michael.h...@linaro.org writes:
I put a build harness around libav and gathered some profiling data. See:
bzr branch lp:~linaro-toolchain-dev
Following on from yesterday's call about what it would take to enable
SMS by default: one of the problems I was seeing with the SMS+IV patch
was that we ended up with excessive moves. E.g. a loop such as:
void
foo (int *__restrict a, int n)
{
int i;
for (i = 0; i < n; i
Revital Eres revital.e...@linaro.org writes:
btw, do you also have numbers of how much SMS (hopefully) improves
performance on top of the vectorized code?
OK, here's a comparison of:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
-fno-auto-inc-dec
vs:
Richard Sandiford richard.sandif...@linaro.org writes:
Revital Eres revital.e...@linaro.org writes:
btw, do you also have numbers of how much SMS (hopefully) improves
performance on top of the vectorized code?
OK, here's a comparison of:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Richard Sandiford richard.sandif...@linaro.org writes:
Richard Sandiford richard.sandif...@linaro.org writes:
Revital Eres revital.e...@linaro.org writes:
btw, do you also have numbers of how much SMS (hopefully) improves
performance on top of the vectorized code?
OK, here's a comparison
== This week ==
* Wrote some patches to make SMS schedule register moves. They made a
significant difference to some libav loops. I'm running a regression
test on powerpc-ibm-aix5.3.0 and will submit upstream next week if
all goes OK.
* Looked at why mjpegenc was so much worse with SMS.
I've tried to clean up the libav microbenchmarks that I did for the strided
load/store stuff. They're on Launchpad at:
lp:~rsandifo/+junk/loop-microbenchmarks
The main changes are that the benchmarks now preload the caches (for CPUs
that don't allocate on write) and that they now check the
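The cache-preloading step described above might look roughly like the following. This is a hypothetical sketch, not code from the loop-microbenchmarks branch. It reads both the source and the destination before timing: on a core that does not allocate on a write miss, storing to the destination would not bring those lines into the cache. clock() is used only to keep the example in standard C.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel under test: a simple copy. */
static void copy_kernel(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Touch every element of both buffers with loads, so that the lines are
   cached even on cores that do not allocate on write. */
static void preload(const int *dst, const int *src, size_t n)
{
    volatile unsigned sink = 0;   /* volatile keeps the loads from being removed */
    for (size_t i = 0; i < n; i++)
        sink += (unsigned)src[i] + (unsigned)dst[i];
}

/* Returns the average CPU time per repetition, in seconds. */
static double time_kernel(int *dst, const int *src, size_t n, int reps)
{
    preload(dst, src, n);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy_kernel(dst, src, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```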
== This week ==
* Looked at the get_arm_condition_code ICE. Seems to be a popular bug:
was reported as #589887, #823708 and #809761 in Launchpad and as PR49030
in bugzilla. Sent a patch upstream.
* Submitted SMS register-dependency patch upstream.
* Reviewed Bernd's new shrink-wrap patch.
Michael Hope michael.h...@linaro.org writes:
While out benchmarking today, I ran across code similar to this:
int *a;
int *b;
int *c;
const int ad[320];
const int bd[320];
const int cd[320];
void fill()
{
for (int i = 0; i < 320; i++)
{
a[i] = ad[i];
b[i] = bd[i];
Just as an FYI, I've added these loops to the libav microbenchmarks
avg-h264-chroma-mc8-8.txt
avg-pixels8-8.txt
ff-h264-idct-add-8-8.txt
ff-put-pixels8x16-8.txt
h264-loop-filter-luma-8.txt
idct-internal-8.txt
put-h264-chroma-mc8-8.txt
put-h264-qpel8-h-lowpass-8.txt
Very non-authoritative answer, but...
Asa Sandahl asa.sand...@linaro.org writes:
When building Android with the Linaro toolchain, I encountered this link
time error when going from gcc 4.4.3 to gcc 4.6.
arm-eabi-g++: error: unrecognized option '-avoid-version'
I find several posts about
== This week ==
* Submitted a fix for the performance regression caused by my
arm_comparison_operator patch. Applied upstream after approval
from Ramana (thanks). Will backport to Linaro towards the end
of next week if there are no reported problems.
* Went back to looking at
== Last week ==
* Patch review.
* Backported second attempt to fix get_arm_condition_code ICE.
* Worked on -fsched-pressure. Experimented with various combinations
of ideas. This is giving some good results (e.g. a 2x improvement
in libav's put_h264_qpel8_hv_lowpass_8) but needs a bit
Michael Hope michael.h...@linaro.org writes:
limits-fndefn.c takes an impressively long time to run. On an idle
machine, -O3 -g -c takes 17:31 and -O2 -g -c takes The test already
has a dg-timeout-factor of 4 giving a total timeout of 20 minutes.
Removing the -g brings this down to 30 s.
== Last week and today ==
* Backported fix for returning std::pair<bool, bool>. Unfortunately
this showed up a regression on 4.5. I couldn't reproduce it cross,
and the testcase itself looks innocuous, so I'm wondering whether
the patch might trigger a miscompilation of cc1plus.
*
Revital Eres revital.e...@linaro.org writes:
Another issue is related to the regression I saw with SMS in libav's
dsputil-ssd_int8_vs_int16_c.
Consulting with Ayal regarding this it seemed that the
regression was due to dependence between accumulations that can be
avoided, more specifically
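To illustrate the kind of dependence involved, here is a hypothetical C kernel in the style of a sum-of-squared-differences loop. It is not the libav source. Alongside it is one common way of splitting a single loop-carried accumulator into two independent chains. This is a general transformation shown for illustration and is not necessarily the change Ayal suggested. For integer sums, barring overflow, the reordering does not change the result.

```c
#include <stdint.h>

/* One accumulator: each iteration's add depends on the previous one. */
static int ssd_one_acc(const int8_t *p1, const int16_t *p2, int size)
{
    int score = 0;
    for (int i = 0; i < size; i++) {
        int d = p1[i] - p2[i];
        score += d * d;
    }
    return score;
}

/* Two accumulators: the even and odd chains are independent, so a
   scheduler (modulo-scheduled or not) can overlap them. */
static int ssd_two_acc(const int8_t *p1, const int16_t *p2, int size)
{
    int s0 = 0, s1 = 0, i = 0;
    for (; i + 1 < size; i += 2) {
        int d0 = p1[i] - p2[i];
        int d1 = p1[i + 1] - p2[i + 1];
        s0 += d0 * d0;
        s1 += d1 * d1;
    }
    if (i < size) {   /* odd trip count: one element left over */
        int d = p1[i] - p2[i];
        s0 += d * d;
    }
    return s0 + s1;
}
```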
== This week ==
* More on -fsched-pressure. Testing on POWER7 showed a degenerate case
that I'd failed to handle well. Fixed that. Saw that part of the
problem on POWER7 was that IRA was using a combination of GENERAL_REGS
and CR_REGS as a single pressure class, so there appeared to be
Dave Martin dave.mar...@linaro.org writes:
Another way of doing a similar thing is to mark __mylib_constructor
as undefined in all the objects that make up the library.
Unfortunately, there seems to be no obvious way of doing that: the
assembler generates undefined symbol references
Richard Sandiford richard.sandif...@linaro.org writes:
Dave Martin dave.mar...@linaro.org writes:
Another way of doing a similar thing is to mark __mylib_constructor
as undefined in all the objects that make up the library.
Unfortunately, there seems to be no obvious way of doing
== This week ==
* Got the -fsched-pressure code into a state where it's almost
presentable. Found a few more things to tweak on the way.
Fixed some FIXMEs, notably to honour MAX_SCHED_READY_INSNS.
* More testing on ARM. Tried to get some SPEC2000 results
as well as the usual EEMBC
Michael Hope michael.h...@linaro.org writes:
Hi there. I've looked further into the intermittent
gcc/testsuite/g++.dg/cdce3.C test failures. Taking Ira's
vectoriser-only fix-pr51301-4.6 branch and comparing it with its
predecessor r106845:
* cdce3.o itself is identical across compilers
Michael Hope michael.h...@linaro.org writes:
On Tue, Dec 20, 2011 at 10:00 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
Michael Hope michael.h...@linaro.org writes:
Hi there. I've looked further into the intermittent
gcc/testsuite/g++.dg/cdce3.C test failures. Taking Ira's
I originally wrote this patch as part of the auto-inc-dec work. I didn't
submit it because I wasn't sure what value of extra_writeback_latency
was appropriate for A9. (I was hoping to crib it from Ramana's pipeline
description.)
The patch introduces three new fields to the costs structure: one
The remaining change for neon-strided-load-extract is to allow fwprop.c
to propagate:
(set (reg X) (subreg (reg Y) N))
even if no further simplifications are possible. I posted the original
patch for comments here:
http://article.gmane.org/gmane.comp.gcc.patches/246180/
and fixed the
About three months ago, 4.7 stopped being able to optimise things like:
int *__restrict x = ...;
The (libav) loop microbenchmarks that I'd written used this construct
a lot, as an easy way of automatically generating a whole function
from a loop kernel.
I spent a while testing 4.7 with the
Ramana Radhakrishnan ramana.radhakrish...@linaro.org writes:
On 29 December 2011 10:21, Richard Sandiford
richard.sandif...@linaro.org wrote:
The remaining change for neon-strided-load-extract is to allow fwprop.c
to propagate:
(set (reg X) (subreg (reg Y) N))
even if no further
Sorry for the delayed response.
Masaki Arai writes:
> Hi,
>
> Thank you very much for your quick check and reply.
>
> Kugan Vivekanandarajah writes:
>> > I looked into the structure, adding this field is not going to make the
> s=
>>
Richard Henderson writes:
> I spoke with Ramana about these at HKG18, and I'm finally getting back to
> these. I have routines for
>
> -rw-rw-r--. 1 rth rth 2538 May 30 19:12 memchr.S
> -rw-rw-r--. 1 rth rth 2405 May 30 20:49 memcmp.S
> -rw-rw-r--. 1 rth rth 2385 May 30 19:12 rawmemchr.S
>
the problem and/or when you have
>> a fix.
>>
>> In CI config tcwg_bmk-code_speed-cpu2017rate/gnu-aarch64-master-O3 after:
>>
>> | commit gcc-14-9157-gff442719cdb
>> | Author: Richard Sandiford
>> | Date: Fri Feb 23 14:12:55 2024 +
>> |
>>