Mon May 28 04:01:37 2012: Request 76368 was acted upon.
Transaction: Correspondence added by bulk88.
       Queue: Win32
     Subject: rewrote Win32
   Broken in: 0.44
    Severity: Wishlist
       Owner: Nobody
  Requestors: bul...@hotmail.com
      Status: open
 Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=76368 >


On Thu May 24 18:12:51 2012, j...@activestate.com wrote:
> On Sat, 07 Apr 2012, patcat via RT wrote:
> 
> Sorry for the long delay; I've been traveling for a month and am just
> catching up with some things.
>  
> >  Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=76368 >
> > 
> > The Win32 library's XS code looked very bloated to me. So I rewrote it.
> 
> Thank you for your effort! I would like to split up your big patch into
> several smaller ones, each with just a single purpose. Ideally you
> would create a serious of commits on github and just issue a pull
> request against:
> 
> https://github.com/jandubois/win32
Egh, I dont really know how to use revision control software. Your lucky
I figured out how to use the diff tool and didn't attach a zip to this
RT jkjk. *I* would rather not try to split up the patch I posted into
separate files. Address it as a whole. The patch file as a whole can be
changed. Its a rewrite, or a fork really. It would be alot of work for
me to split it up and a couple of the bloat reducing techniques are
intertwined in a way if they were split into different patches, the
intermediate versions wouldn't compile. Some of the code I put into this
rewrite was specifically for others to reuse in their XS modules. The
bug fixes are also in the ALIASed xsubs so they would need to be
backported to the non-ALIASed versions.

If you didn't respond, I would've eventually posted my rewrite on CPAN
under Win32::Slim/Fast/Small or something similar. My reasoning behind
the rewrite was, this module is loaded on practically every Win32 perl
process in the world, its a worthy target to make the best/smallest
possible. I didn't touch the Perl side of it except to rewrite the POD
for time reasons. Someone could argue some of those constants and
EXPORT_OK should've been created in the C code. I also did NOT test this
on cygwin perl, and I didn't try to build a perl interp from scratch
with this new Win32 mod since I think Win32 mod is what is called a dual
life. I dont think it will fail during a perl interp compile since no
modules other are than EUMM are used during the build.

> 
> However, for some parts of your patch I need some convincing that they
> are correct/necessary:
> 
> +0.45 [2012-??-??]
> +     - massive refactoring, to reduce code bloat, DLL is 24KB vs 33KB in 
> +       0.44, using 32bit Perl 5.10 on VC 2003, with default -01
> 
> I'm not concerned about "bloat" when the total size is just 33KB.
> Reducing the size is fine, but should not be done at the cost of making
> the code harder to maintain. Which part of your patch is responsible for
> the size reduction?
90% of it.
Size reduction techniques used.

-CRT removal
-PERL_CORE (read more later in the post)
-Heavy use of ALIAS: keyword (or the Perl C equivalent, Win32 mod
doesn't really use XS)
-switches statements and grouping Win32 subs into single C func based on
common prototypes in Perl and Perl to C parameter conversion
-Heavy use of gotos, -O1 VC 2003 almost never combines identical branch
blocks with jumps
-Register optimization/liveness reduction/reducing stack variables
through reorganization of calls to the Perl API
-No XS_RETURN macros, ever, they cause extremely duplicated machine
code. VS 2003 atleast in C mode doesn't realize its the same block of
code and makes a jump code. non-X PUSH and PUTBACK are the slim-est way
to return parameters in XS.
-I tried a kindda crazy way (not MS and probably not GCC documented) of
getting 2 duplicate DLL names out of the DLL by getting pointers to the
import table strings of them. Preprocessor it out if your not
comfortable with it.
-saving SV *s in C autos, macro expansion like SvPV_nolen(ST(0)) turns
to dozens of function calls in machine code on Perl 5.12, in 5.14, it
still might be bloatful (not sure didn't check), since alias rules of C
dictate the my_perl struct has to be reread between somethings. I dont
remember if it is ";"/sequence points, or between function calls, or
every all read/writes to the my_perl have to be done since VC doesn't
know if my_perl is being changed by another thread or another function
call and you didn't use the optimize pragma to say no aliasing.
-remove double copies, mallocs/Newx, and strlens whenever possible

I know adding delay loading dlls added some bloats due to the delay
handler func. The mingw delay loading code is the first in the world and
some would call it experimental. I had a talk on IRC with the author of
the delay loading lib generator in dlltool.exe from binutils/reactos, he
said he has never seen anyone do it (make dyn load lib files for gcc) in
a makefile/automated and he has never seen gcc delay load used outside
of reactos. GCC has no idea what a delay load dll is BTW. Only dlltool
does. I do alot of sanity checking in the makefile.pl for mingw's delay
loading support and it should disable if anything is wrong. I also check
the /? output of CL on VC since there is a comment in perl interp's
win32 makefile that some old free Platform SDKs had the delay loading
crippled. I didn't try compiling on those Platform SDKs but I guess the
delay flag won't be in /? if its not supported right?

When I tested my code in march, no 64bit Strawberry Perl could delay
load DLL due to a bug in dlltool for 64 bit code that has been since
fixed, but the latest S Perls in March 2012 had the buggy for 64bit
Delay loading dlltool. 

Regarding maintenance, I tossed in tons of comments.  I can add more if
you tell me where. Also the string table system (background given later)
was designed to prevent bugs. It keeps the strings exactly the same in
the .xs file as on the machine code level so you can search the source
code for them exactly as you see them everywhere else, no conversion by
hand to struct member names. The strings tables are structs. The member
names are exactly the strings they contain. The strings are converted to
C identifiers during the build process by Perl scripts. If you change a
string in the pxmacro file, and don't update it in the .xs, it wont
compile. If you change a string in the .xs and don't update the
.pxmacro, it won't compile. None of the string table offsets are
calculated by hand, they are all calculated by the compiler and
optimized to constants. The only flaw with the system I see is if the
offsets are more than a unsigned char long (256). I put in a bunch of
static asserts to prevent this. Anyone who adds more string tables in
the future has to also add static asserts. For the Wide string tables,
the offsets are in WCHARs, not bytes to keep it under 256. 16 bit
immediate movs are impossible in 32 bit x86 asm, so shorts for the
string table offsets are pointless and short offsets would offer no
savings in machine code over absolute/relative pointers to the strings
(normal way).

Another bloat reducing technique is in areas with intensive use of const
strings in Win32 mod, VS aligns/null pads strings to 16 or 8 bytes (I
couldn't figure out which, and I couldn't find any comments online about
this) on 32 bits. The offset pointers are 4 bytes in machine code. By
stripping all the NULLs to the minimum required to make a string table
fit in under 256, and the use of branch/switch trees of loading 1 byte
offsets into a register, then recombining it with the pointer to the
head string table/struct, alot of machine code AND RO static space can
be saved. I have never seen VC 2003 make a jump table unlike Mingw. I
have a gut feeling that MS's VC discriminates heavily against C code and
the C side of CL hasn't gotten any upgrades since the late 1980s. This
string table technique is used in most notably in the ALIAS funcs for
the usage croaks, and GetFolderPath(). The string structs are generated
using Perl scripts during the makefile build process. The .pxmacro files
are converted to .xmacro files which are #included into the .xs file
under the xmacro technique. The conversion from pxmacro which are just
newline separated strings to xmacro file makes it much easier to add new
string without having conform to the xmacro macro define syntax.

> +     - removed the CRT
> 
> I don't understand the point of this. Perl itself already loads in a
> CRT, so why wouldn't we use it? Having code that is part of the CRT
> repeated in the module seems like bloat to me. :)
> 
The CRT adds a KB of static linked machine code and ~ 100 bytes of r/w
statics on my VS 2003, newer VSes even much more. Its one more DLL that
the DLL loader has to process. Each VS has its own CRT, Platform SDK has
its own ("generic windows/NT" msvcrt.dll). A perl process can wind up
with multiple CRT DLLs loaded in its process. Every Win32 Perl already
always has 2 CRTs pointlessly loaded, the C compiler one, and one
LoadLibraryed for the perlhost C++ class for malloc and free. Someone
should fix that. Its Windows, the CRT isn't needed for most things.
Either perl's API provides it, or kernel32 or shlwapi (always loaded
since 5.12 because of comctrl32) or user32 does. I didn't use shlwapi on
purpose because some NT4 or Dos Windows or Perl 5.10 people might not
like it and because Win32 mod wasn't using it before (0.44 and older),
its one extra dll and you might as well use CRT then. GCC intrinsic are
terribly broken and unreliable, and behave differently between Mingw.org
and Mingw64. See https://rt.cpan.org/Public/Bug/Display.html?id=75683
for more.  VC intrinsics fill in for a couple tiny CRT funcs to get rid
of linking to the CRT. For other cases, I wrote a mini "CRT find and
loadlibrary a C std lib", not necessarily a CRT. Read the code in BOOT:.
The VC instrinics are probed and either the intrinsic or non-intrinsic
equivalent is selected  and copied and renamed to the root build folder
and is then included into the .xs file.

> 
> +     - compiles with PERL_CORE now
> 
> Compiling modules with PERL_CORE is just wrong. Only code that goes into
> perl5xx.dll should ever be compiled with PERL_CORE. I understand that
> some modules would define PERL_CORE anyways because using it gave you a
> speed advantage. This speed advantage no longer exists in 5.14 and
> later. So I don't want to support the misuse of PERL_CORE in Win32,
> given that it would only benefit unsupported legacy releases of Perl.

The default can be changed in the makefile.pl and PERL_CORE turned off.
That was intentional. I think the only thing I had to do get PERL_CORE
to work was replace all the vararg perl api with Perl_ instead of the
nocontext aTHX less versions. Win32 was already using Perl_ funcs in
some places but all the code added over the last 10 years was
un-prefixed.  Maybe a perl version check for 5.12 and older, then
PERL_CORE is on by default. Alot of the tricks I made I put in
makefile.pl as disableable on purpose. I am open to changing the
defaults in the makefile.pl.

> I'm also happy to apply your improvements to the regression tests. Let
> me know if you want to provide them in a separate patch, or if I should
> just extract them from current one!
> 
> Cheers,
> -Jan
> 
> 


> +     - mentioned 32KB Unicode path support is untested
> 
> It can be mentioned, but there are a lot of issues with Unicode filename
> support in Perl for Windows.
> 
> The general status of Unicode support in the Win32::* modules is quite
> a mess. Maybe we should discuss this in a separate thread on the
> mailing list.
> 
I just mentioned 32 KB unicode path issue in the POD since some people
might seriously want that feature (over MAX_PATH paths on NTFS). I dont
need need/try/test it. I didn't write any additional tests for unicode
support, I saw Win32 mod had a couple already. I didn't change them. The
tests still pass of course. The ALIAS subs I wrote croak error message
tests for.

I noticed you went down the changelog I wrote, I suggest you quote and
discuss parts of the patch file instead. The changelog is not patch file.

There is a part of makefile.pl you can toss if you think it looks
bad/unprofessional, the GetDebugExports sub in makefile.pl. I used it
for the strawberry perl test builds to look at them with a disassembler
since I choose not to use GCC's debugging symbols.

Reply via email to