Thanks John,
I now know a bit more about this. It seems the 128bit support in windows and 
linux is done differently. I am told optimized routines inside the c library 
are used on linux but on windows support for divide is provided by compiler-rt 
and the code is very old and unoptimized.
That given what you tell me I think my way forward is to try and get a better 
version of the divide routines working.
Neill.

-----Original Message-----
From: Boost-users <boost-users-boun...@lists.boost.org> On Behalf Of John 
Maddock via Boost-users
Sent: Friday, August 20, 2021 9:43 AM
To: Neill Clift via Boost-users <boost-users@lists.boost.org>
Cc: John Maddock <jz.madd...@googlemail.com>
Subject: Re: [Boost-users] [multiprecission] Large divides layered on 128 bit 
divides

On 19/08/2021 23:02, Neill Clift via Boost-users wrote:
>
> Hi,
>
> The architecture of cpp_int being build on 64 bit arithmetic using 128 
> bit double_limb_type is interesting.
>
> I have a question on the large divide (divide_unsigned_helper). It 
> uses the upper portions of the large integers to get an estimation of 
> the quotient. Subtracts out a multiple of that quotient and repeats.
>
> It does this in 128 bit values if available from the compiler:
>
>          double_limb_type a =
> (static_cast<double_limb_type>(prem[r_order]) << CppInt1::limb_bits) | 
> prem[r_order - 1];
>
>          double_limb_type b = py[y_order];
>
>          double_limb_type v = a / b;
>
> The compiler emulates this operation in the routine __udivmodti4 which 
> itself uses an iterative approach.
>
> It seems to use a pretty basic shift and subtract algorithm mind you.
>
> As a general rule for multiprecision is it OK to layer the Knuth like 
> algorithm D on top of each other this way.
>
> I have no idea myself but wonder if this is a known issue. I would 
> have guessed that it made sense to do a 64 by 64 bit divide to guess 
> the quotient and repeat.
>
Good question.

As I recall I tried both single-limb and double-limb partial-quotients and the 
double-limb (128 bit) version was slightly faster.

There is a balance here between removing as large a chunk as you can with each 
loop, compared to more expensive operations within the loop. You could for 
example, perform "Karatsuba-like" division by splitting a B-bit numerator into 
two B/2 chunks and perform schoolboy division on the two "digit" numbers.  But 
the fact that no-one seems to have done this suggests how well it must work ;)  
On the other hand, __int128, while a synthetic type, is sufficiently well 
optimised for this to be a useful chunk size.

HTH, John.


--
This email has been checked for viruses by Avast antivirus software.
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.avast.com%2Fantivirus&amp;data=04%7C01%7C%7Ca715f27c760f4cd99de008d963f9de25%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637650747094109344%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=McX5CDu%2B1%2FEK7vSQ0y1Xoofq90wvebkry3YC%2FFaD1WY%3D&amp;reserved=0

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.boost.org%2Fmailman%2Flistinfo.cgi%2Fboost-users&amp;data=04%7C01%7C%7Ca715f27c760f4cd99de008d963f9de25%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637650747094109344%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=fqn6HAcLF5mlIpp%2FoEzohPe4G6N8CpFnTf8FyNFQxEE%3D&amp;reserved=0
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
https://lists.boost.org/mailman/listinfo.cgi/boost-users

Reply via email to