The file which is faulty or misassembled is /mpn/x86/k7/diveby3.asm.

As this is a recent assembly file which we've added, it could well be
faulty. We could probably verify this using the try test program on a
32 bit machine on SkyNet, though we don't have any 32 bit AMD machines
on there.
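For context, diveby3 performs exact division of a multi-limb number by 3. One way to check a suspect assembly version is to compare it against a plain C reference. The sketch below assumes 32-bit limbs stored least significant first. It uses the standard multiply-by-modular-inverse technique, not MPIR's own code, and `ref_divexact_by3` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Inverse of 3 modulo 2^32: 3 * 0xAAAAAAAB == 1 (mod 2^32). */
#define INV3_32 UINT32_C(0xAAAAAAAB)

/* Hypothetical reference: divide {sp, n} exactly by 3, writing the quotient
   to {rp, n}. Returns 0 iff the input was divisible by 3. */
static uint32_t ref_divexact_by3(uint32_t *rp, const uint32_t *sp, size_t n)
{
    uint32_t c = 0;                     /* borrow carried into the next limb */
    size_t i;
    for (i = 0; i < n; i++) {
        uint32_t s = sp[i];
        uint32_t l = s - c;             /* subtract incoming borrow (wraps) */
        uint32_t b = s < c;             /* 1 if that subtraction wrapped */
        uint32_t q = l * INV3_32;       /* q satisfies 3*q == l (mod 2^32) */
        /* 3*q = l + h*2^32 with h in {0, 1, 2}; h is the extra borrow. */
        uint32_t h = (q >= UINT32_C(0x55555556)) + (q >= UINT32_C(0xAAAAAAAB));
        rp[i] = q;
        c = b + h;
    }
    return c;
}
```

Running the assembly routine and a reference like this over the same random inputs and comparing the outputs limb by limb should show whether diveby3.asm itself is at fault.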

For now, I've disabled this file in my MPIR-tcc repo. Everything seems
to work now, though of course I have not got any test code running
yet.

2000000! takes a little under 5 times longer to compute on my 32 bit
Windows machine than on my 64 bit Linux server. It's not too surprising
that this machine is quite slow in the FFT range. It will be
interesting to get FLINT and the FLINT FFT working on this machine and
see if it is any better.

A fairer comparison would be using 64 bit native linux on this same
machine with gcc as the compiler. I can possibly try that later. I
haven't got 32 bit linux installed on this machine unfortunately. It
might be worth installing just to get a fair comparison between linux
+ gcc 32 bits and Windows + TCC 32 bits.

Bill.

2009/11/30 Bill Hart <[email protected]>:
> The bug doesn't seem to be in the FFT as such, but in the assembly code.
>
> I decided to check that the same data was being passed to the FFT each
> time, as it was randomly failing. The data was not consistent beyond
> about the first 10 limbs.
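The consistency check described here amounts to snapshotting the limbs handed to the FFT on two runs and finding where they first diverge. A minimal sketch, assuming 32-bit limbs; `first_mismatch` is a hypothetical helper, not part of MPIR.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first limb at which two snapshots of the same
   n-limb input differ, or n if the snapshots are identical. */
static size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            break;
    return i;
}
```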
>
> I removed in turn the assembly files in /mpn/x86/k7/mmx/k8,
> /mpn/x86/k7/mmx and then /mpn/x86/k7. With just the generic assembly
> code in the /mpn/x86 directory, it works fine.
>
> Thus one of the files in /mpn/x86/k7 must be faulty or misassembled.
> One other possibility is that turning off one of the HAVE_NATIVE flags
> causes a different code pathway to be taken.
>
> I'll put the assembly files back in one at a time and see what
> happens. They are:
>
> add_n, sub_n, addmul_1, submul_1, dive_1, diveby3, gcd_1, mod_34lsub1,
> mode1o, mul_1, mul_basecase, sqr_basecase.
>
> Given that these should have been tested very thoroughly in the past,
> the most likely explanation is not that the assembly code itself is at
> fault, but that I introduced a fault when I split the multifunction
> files or disabled the appropriate HAVE_NATIVE flags. We'll soon know.
>
> Bill.
>
> 2009/11/30 Bill Hart <[email protected]>:
>> Sorry, that's a 1.9GHz mobile K8 with Windows 32 bits. So the
>> comparison is even better.
>>
>> Bill.
>>
>> 2009/11/30 Bill Hart <[email protected]>:
>>> I can't time anything in the FFT region until I fix the FFT. But here
>>> is a single point of reference for timing.
>>>
>>> Computing the factorial of 100000 is approximately 3 times slower on
>>> my 32 bit 2.4GHz K8 Windows machine than on my 64 bit 2.4GHz Opteron
>>> Linux server.
>>>
>>> That seems like fairly good performance given it is 32 bits vs 64 bits.
>>>
>>> Bill.
>>>
>>> 2009/11/30 Bill Hart <[email protected]>:
>>>> I now have TCC producing an mpir.dll and associated definition file.
>>>>
>>>> The only problems I had in the end were that it gave warnings about all
>>>> the duplicate loop labels in the assembly files and it didn't like the
>>>> assembly code for add_ssaaaa and sub_ddmmss in longlong.h (which I
>>>> just commented out so it would use the C fallbacks).
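The C fallbacks referred to here compute a two-limb sum or difference with an explicit carry or borrow. The sketch below follows the general shape of the generic definitions in longlong.h, but the names `ref_add_ssaaaa` and `ref_sub_ddmmss` are hypothetical and 32-bit limbs are assumed.

```c
#include <stdint.h>

/* (sh:sl) = (ah:al) + (bh:bl), each pair a two-limb number, high limb first.
   The low-limb carry is detected by wraparound: the low sum is smaller than
   an addend exactly when the addition overflowed. */
#define ref_add_ssaaaa(sh, sl, ah, al, bh, bl)              \
    do {                                                    \
        uint32_t ref_x = (uint32_t)((al) + (bl));           \
        (sh) = (uint32_t)((ah) + (bh) + (ref_x < (al)));    \
        (sl) = ref_x;                                       \
    } while (0)

/* (sh:sl) = (ah:al) - (bh:bl); a borrow out of the low limb occurs
   exactly when al < bl. */
#define ref_sub_ddmmss(sh, sl, ah, al, bh, bl)              \
    do {                                                    \
        uint32_t ref_x = (uint32_t)((al) - (bl));           \
        (sh) = (uint32_t)((ah) - (bh) - ((al) < (bl)));     \
        (sl) = ref_x;                                       \
    } while (0)
```

As with the originals, the macro arguments may be evaluated more than once, so they should be plain variables or constants.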
>>>>
>>>> It seems to crash when dealing with anything above about 4 limbs, but
>>>> that might be to do with the duplicate labels.
>>>>
>>>> Bill.
>>>>
>>>> 2009/11/29 Bill Hart <[email protected]>:
>>>>> That's amazing. All the files necessary to build the MPIR library now
>>>>> build. It takes 23s in total on a single core!
>>>>>
>>>>> Bill.
>>>>>
>>>>> 2009/11/29 Bill Hart <[email protected]>:
>>>>>> I've got a very basic configure and makefile working for MPIR using
>>>>>> tcc on 32 bit Windows which assembles all the k8 assembly files and
>>>>>> all the generic C mpn files.
>>>>>>
>>>>>> If you want to clone the project:
>>>>>>
>>>>>> git clone http://selmer.warwick.ac.uk/MPIR-tcc.git MPIR-tcc
>>>>>>
>>>>>> Instructions on how to build the project are in README.
>>>>>>
>>>>>> So far, unless it detects your CPU as a k8, it will fail. If you don't
>>>>>> have a k8, duplicate the following section in configure for your CPU
>>>>>> type:
>>>>>>
>>>>>> k8)
>>>>>>   mpn_dirs="mpn/x86 mpn/x86/k7 mpn/x86/k7/mmx"
>>>>>> ;;
>>>>>>
>>>>>> adjusting the paths correctly.
>>>>>>
>>>>>> No dll is produced yet, only object files. But it takes 10s to run
>>>>>> configure and another 10s to assemble and compile all the relevant
>>>>>> .asm/.c files on 32 bit Windows!
>>>>>>
>>>>>> If you want to clean up, just type:
>>>>>>
>>>>>> make clean
>>>>>>
>>>>>> None of the other build targets work yet.
>>>>>>
>>>>>> I've not tried to build on Linux, but note it is only going to work on
>>>>>> a 32 bit linux box, if at all.
>>>>>>
>>>>>> Bill.
>>>>>>
>>>>>> 2009/11/29 Bill Hart <[email protected]>:
>>>>>>> 2009/11/29 Cactus <[email protected]>:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 29, 2:49 am, Bill Hart <[email protected]> wrote:
>>>>>>>>> I've just been looking at the TCC compiler.
>>>>>>>>>
>>>>>>>>> http://bellard.org/tcc/
>>>>>>>>>
>>>>>>>>> Advantages:
>>>>>>>>> =========
>>>>>>>>>
>>>>>>>>> - Cross platform - works on Windows and Linux
>>>>>>>>> - Almost C99 compliant
>>>>>>>>> - Supports GNU inline asm
>>>>>>>>> - Compiles GNU .asm files
>>>>>>>>> - Compiles and links unbelievably quickly, even on Windows
>>>>>>>>> - Very small comprehensible codebase
>>>>>>>>> - LGPL v2+
>>>>>>>>> - produces native Windows binaries
>>>>>>>>>
>>>>>>>>> Disadvantages:
>>>>>>>>> ===========
>>>>>>>>>
>>>>>>>>> - Doesn't support SSE asm instructions (probably wouldn't be hard to
>>>>>>>>> add support for these - the codebase is quite comprehensible)
>>>>>>>>> - 32 bit x86 assembly only (the latest version supports "x86_64
>>>>>>>>> targets", but I am not sure what this means)
>>>>>>>>> - probably doesn't optimise as well as gcc (though I did some basic
>>>>>>>>> loop timings and they were fine)
>>>>>>>>>
>>>>>>>>> Well I just had a play, and it assembled almost all the k8 .asm files
>>>>>>>>> in MPIR and almost all of the mpn .c files. The exceptions were the
>>>>>>>>> multifunction files, due to the fact that a couple of defines are
>>>>>>>>> missing (easily fixed and my fault) and perfsqr.c (perfsqr.h is
>>>>>>>>> missing - also not the fault of tcc). It takes about 6s total to
>>>>>>>>> assemble and compile all that stuff! That's faster than a 16 core
>>>>>>>>> parallel build on Selmer!!!!!!!!!!!!!!!!
>>>>>>>>>
>>>>>>>>> There also seems to be some issue with alloca.h which I needed to work
>>>>>>>>> around, as I know nothing about alloca.h.
>>>>>>>>>
>>>>>>>>> I'm actually really keen to build MPIR with TCC because I can also use
>>>>>>>>> TCC to build FLINT on Windows. I checked and the longlong.h I use for
>>>>>>>>> FLINT compiles fine with tcc. The only issue I can find with using it
>>>>>>>>> to compile FLINT is that for (unsigned long i = 0; i < count; i++)
>>>>>>>>> doesn't compile. It expects unsigned long i; for (i = 0; i < count;
>>>>>>>>> i++). However a very simple script could easily fix this for all files
>>>>>>>>> in FLINT. I'm sure this could also be easily fixed in TCC itself as
>>>>>>>>> they are moving towards full C99 support and quite a few GNU
>>>>>>>>> extensions.
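The rewrite such a script would perform is mechanical: move the loop variable's declaration out of the `for` header. A minimal illustration; `sum_limbs` is a hypothetical function, not FLINT code.

```c
/* C99 form reported as rejected by tcc:
 *     for (unsigned long i = 0; i < count; i++) ...
 * Equivalent form with the declaration moved before the loop: */
unsigned long sum_limbs(const unsigned long *a, unsigned long count)
{
    unsigned long i;            /* declared before the loop, C89 style */
    unsigned long total = 0;
    for (i = 0; i < count; i++)
        total += a[i];
    return total;
}
```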
>>>>>>>>>
>>>>>>>>> There seem to be some issues with tcc development stalling, but it
>>>>>>>>> isn't a dead project. The last release was May 20th.
>>>>>>>>>
>>>>>>>>> I'm kind of confused about one thing. It looks to me that it supports
>>>>>>>>> linux calling conventions. This is great if true, but maybe the
>>>>>>>>> calling conventions don't differ on x86 32?
>>>>>>>>
>>>>>>>> This is easy on x86 since there are very few differences in the
>>>>>>>> calling conventions.
>>>>>>>
>>>>>>> That explains a few things. I recall for example that the 32 bit
>>>>>>> Linux assembly code works just fine on 32 bit Windows using MinGW.
>>>>>>>
>>>>>>> I wonder how 64 bit MinGW works, whether it uses linux or Windows
>>>>>>> calling conventions.
>>>>>>>
>>>>>>> The documentation with TCC is not great, so I couldn't say what they
>>>>>>> do for their x86_64 targets.
>>>>>>>
>>>>>>>>
>>>>>>>> I think it should be possible to use Linux calling conventions on
>>>>>>>> Windows x64 as well if a compiler makes use of special libraries that
>>>>>>>> handle the differences in calling conventions before interfacing
>>>>>>>> with the Windows standard libraries and interfaces. But I might have
>>>>>>>> missed something that prevents this.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, I guess there'd need to be some kind of wrapper around each of
>>>>>>> the Windows standard functions. Callback functions would be tricky to
>>>>>>> handle. But I suppose it would be possible for the wrapper to
>>>>>>> automatically wrap such functions before handing them to Windows.
>>>>>>> Performance might suffer a bit, though most Windows standard library
>>>>>>> functions are probably fairly hefty in the first place.
>>>>>>>
>>>>>>> Anyhow, time to make this MPIR-tcc git repo. I doubt it will be a
>>>>>>> terribly credible alternative to an MSVC build on Windows, but it
>>>>>>> will have a simple non-autotools build system, it will compile
>>>>>>> extremely fast on Windows and there are the other advantages I
>>>>>>> mentioned. It could be useful for some users.
>>>>>>>
>>>>>>> Bill.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

--

You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.