Mon May 28 04:01:37 2012: Request 76368 was acted upon. Transaction: Correspondence added by bulk88. Queue: Win32 Subject: rewrote Win32 Broken in: 0.44 Severity: Wishlist Owner: Nobody Requestors: bul...@hotmail.com Status: open Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=76368 >
On Thu May 24 18:12:51 2012, j...@activestate.com wrote: > On Sat, 07 Apr 2012, patcat via RT wrote: > > Sorry for the long delay; I've been traveling for a month and am just > catching up with some things. > > > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=76368 > > > > > The Win32 library's XS code looked very bloated to me. So I rewrote it. > > Thank you for your effort! I would like to split up your big patch into > several smaller ones, each with just a single purpose. Ideally you > would create a serious of commits on github and just issue a pull > request against: > > https://github.com/jandubois/win32 Egh, I dont really know how to use revision control software. Your lucky I figured out how to use the diff tool and didn't attach a zip to this RT jkjk. *I* would rather not try to split up the patch I posted into separate files. Address it as a whole. The patch file as a whole can be changed. Its a rewrite, or a fork really. It would be alot of work for me to split it up and a couple of the bloat reducing techniques are intertwined in a way if they were split into different patches, the intermediate versions wouldn't compile. Some of the code I put into this rewrite was specifically for others to reuse in their XS modules. The bug fixes are also in the ALIASed xsubs so they would need to be backported to the non-ALIASed versions. If you didn't respond, I would've eventually posted my rewrite on CPAN under Win32::Slim/Fast/Small or something similar. My reasoning behind the rewrite was, this module is loaded on practically every Win32 perl process in the world, its a worthy target to make the best/smallest possible. I didn't touch the Perl side of it except to rewrite the POD for time reasons. Someone could argue some of those constants and EXPORT_OK should've been created in the C code. I also did NOT test this on cygwin perl, and I didn't try to build a perl interp from scratch with this new Win32 mod since I think Win32 mod is what is called a dual life. I dont think it will fail during a perl interp compile since no modules other are than EUMM are used during the build. > > However, for some parts of your patch I need some convincing that they > are correct/necessary: > > +0.45 [2012-??-??] > + - massive refactoring, to reduce code bloat, DLL is 24KB vs 33KB in > + 0.44, using 32bit Perl 5.10 on VC 2003, with default -01 > > I'm not concerned about "bloat" when the total size is just 33KB. > Reducing the size is fine, but should not be done at the cost of making > the code harder to maintain. Which part of your patch is responsible for > the size reduction? 90% of it. Size reduction techniques used. -CRT removal -PERL_CORE (read more later in the post) -Heavy use of ALIAS: keyword (or the Perl C equivalent, Win32 mod doesn't really use XS) -switches statements and grouping Win32 subs into single C func based on common prototypes in Perl and Perl to C parameter conversion -Heavy use of gotos, -O1 VC 2003 almost never combines identical branch blocks with jumps -Register optimization/liveness reduction/reducing stack variables through reorganization of calls to the Perl API -No XS_RETURN macros, ever, they cause extremely duplicated machine code. VS 2003 atleast in C mode doesn't realize its the same block of code and makes a jump code. non-X PUSH and PUTBACK are the slim-est way to return parameters in XS. -I tried a kindda crazy way (not MS and probably not GCC documented) of getting 2 duplicate DLL names out of the DLL by getting pointers to the import table strings of them. Preprocessor it out if your not comfortable with it. -saving SV *s in C autos, macro expansion like SvPV_nolen(ST(0)) turns to dozens of function calls in machine code on Perl 5.12, in 5.14, it still might be bloatful (not sure didn't check), since alias rules of C dictate the my_perl struct has to be reread between somethings. I dont remember if it is ";"/sequence points, or between function calls, or every all read/writes to the my_perl have to be done since VC doesn't know if my_perl is being changed by another thread or another function call and you didn't use the optimize pragma to say no aliasing. -remove double copies, mallocs/Newx, and strlens whenever possible I know adding delay loading dlls added some bloats due to the delay handler func. The mingw delay loading code is the first in the world and some would call it experimental. I had a talk on IRC with the author of the delay loading lib generator in dlltool.exe from binutils/reactos, he said he has never seen anyone do it (make dyn load lib files for gcc) in a makefile/automated and he has never seen gcc delay load used outside of reactos. GCC has no idea what a delay load dll is BTW. Only dlltool does. I do alot of sanity checking in the makefile.pl for mingw's delay loading support and it should disable if anything is wrong. I also check the /? output of CL on VC since there is a comment in perl interp's win32 makefile that some old free Platform SDKs had the delay loading crippled. I didn't try compiling on those Platform SDKs but I guess the delay flag won't be in /? if its not supported right? When I tested my code in march, no 64bit Strawberry Perl could delay load DLL due to a bug in dlltool for 64 bit code that has been since fixed, but the latest S Perls in March 2012 had the buggy for 64bit Delay loading dlltool. Regarding maintenance, I tossed in tons of comments. I can add more if you tell me where. Also the string table system (background given later) was designed to prevent bugs. It keeps the strings exactly the same in the .xs file as on the machine code level so you can search the source code for them exactly as you see them everywhere else, no conversion by hand to struct member names. The strings tables are structs. The member names are exactly the strings they contain. The strings are converted to C identifiers during the build process by Perl scripts. If you change a string in the pxmacro file, and don't update it in the .xs, it wont compile. If you change a string in the .xs and don't update the .pxmacro, it won't compile. None of the string table offsets are calculated by hand, they are all calculated by the compiler and optimized to constants. The only flaw with the system I see is if the offsets are more than a unsigned char long (256). I put in a bunch of static asserts to prevent this. Anyone who adds more string tables in the future has to also add static asserts. For the Wide string tables, the offsets are in WCHARs, not bytes to keep it under 256. 16 bit immediate movs are impossible in 32 bit x86 asm, so shorts for the string table offsets are pointless and short offsets would offer no savings in machine code over absolute/relative pointers to the strings (normal way). Another bloat reducing technique is in areas with intensive use of const strings in Win32 mod, VS aligns/null pads strings to 16 or 8 bytes (I couldn't figure out which, and I couldn't find any comments online about this) on 32 bits. The offset pointers are 4 bytes in machine code. By stripping all the NULLs to the minimum required to make a string table fit in under 256, and the use of branch/switch trees of loading 1 byte offsets into a register, then recombining it with the pointer to the head string table/struct, alot of machine code AND RO static space can be saved. I have never seen VC 2003 make a jump table unlike Mingw. I have a gut feeling that MS's VC discriminates heavily against C code and the C side of CL hasn't gotten any upgrades since the late 1980s. This string table technique is used in most notably in the ALIAS funcs for the usage croaks, and GetFolderPath(). The string structs are generated using Perl scripts during the makefile build process. The .pxmacro files are converted to .xmacro files which are #included into the .xs file under the xmacro technique. The conversion from pxmacro which are just newline separated strings to xmacro file makes it much easier to add new string without having conform to the xmacro macro define syntax. > + - removed the CRT > > I don't understand the point of this. Perl itself already loads in a > CRT, so why wouldn't we use it? Having code that is part of the CRT > repeated in the module seems like bloat to me. :) > The CRT adds a KB of static linked machine code and ~ 100 bytes of r/w statics on my VS 2003, newer VSes even much more. Its one more DLL that the DLL loader has to process. Each VS has its own CRT, Platform SDK has its own ("generic windows/NT" msvcrt.dll). A perl process can wind up with multiple CRT DLLs loaded in its process. Every Win32 Perl already always has 2 CRTs pointlessly loaded, the C compiler one, and one LoadLibraryed for the perlhost C++ class for malloc and free. Someone should fix that. Its Windows, the CRT isn't needed for most things. Either perl's API provides it, or kernel32 or shlwapi (always loaded since 5.12 because of comctrl32) or user32 does. I didn't use shlwapi on purpose because some NT4 or Dos Windows or Perl 5.10 people might not like it and because Win32 mod wasn't using it before (0.44 and older), its one extra dll and you might as well use CRT then. GCC intrinsic are terribly broken and unreliable, and behave differently between Mingw.org and Mingw64. See https://rt.cpan.org/Public/Bug/Display.html?id=75683 for more. VC intrinsics fill in for a couple tiny CRT funcs to get rid of linking to the CRT. For other cases, I wrote a mini "CRT find and loadlibrary a C std lib", not necessarily a CRT. Read the code in BOOT:. The VC instrinics are probed and either the intrinsic or non-intrinsic equivalent is selected and copied and renamed to the root build folder and is then included into the .xs file. > > + - compiles with PERL_CORE now > > Compiling modules with PERL_CORE is just wrong. Only code that goes into > perl5xx.dll should ever be compiled with PERL_CORE. I understand that > some modules would define PERL_CORE anyways because using it gave you a > speed advantage. This speed advantage no longer exists in 5.14 and > later. So I don't want to support the misuse of PERL_CORE in Win32, > given that it would only benefit unsupported legacy releases of Perl. The default can be changed in the makefile.pl and PERL_CORE turned off. That was intentional. I think the only thing I had to do get PERL_CORE to work was replace all the vararg perl api with Perl_ instead of the nocontext aTHX less versions. Win32 was already using Perl_ funcs in some places but all the code added over the last 10 years was un-prefixed. Maybe a perl version check for 5.12 and older, then PERL_CORE is on by default. Alot of the tricks I made I put in makefile.pl as disableable on purpose. I am open to changing the defaults in the makefile.pl. > I'm also happy to apply your improvements to the regression tests. Let > me know if you want to provide them in a separate patch, or if I should > just extract them from current one! > > Cheers, > -Jan > > > + - mentioned 32KB Unicode path support is untested > > It can be mentioned, but there are a lot of issues with Unicode filename > support in Perl for Windows. > > The general status of Unicode support in the Win32::* modules is quite > a mess. Maybe we should discuss this in a separate thread on the > mailing list. > I just mentioned 32 KB unicode path issue in the POD since some people might seriously want that feature (over MAX_PATH paths on NTFS). I dont need need/try/test it. I didn't write any additional tests for unicode support, I saw Win32 mod had a couple already. I didn't change them. The tests still pass of course. The ALIAS subs I wrote croak error message tests for. I noticed you went down the changelog I wrote, I suggest you quote and discuss parts of the patch file instead. The changelog is not patch file. There is a part of makefile.pl you can toss if you think it looks bad/unprofessional, the GetDebugExports sub in makefile.pl. I used it for the strawberry perl test builds to look at them with a disassembler since I choose not to use GCC's debugging symbols.