Aha. OK, then my suspicion is that your custom build isn't enabling the SIMD code at all for some reason. Pass -DREQUIRE_SIMD=1 to cmake when configuring the build, and that will cause the configuration to fail if it can't enable SIMD instructions. It could be something as simple as NASM not being installed and in your PATH.
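A rough sketch of that check, in case it helps. It assumes a POSIX-style shell (Git Bash, MSYS2, or WSL); SRC_DIR is a hypothetical variable pointing at your libjpeg-turbo source tree, and the build directory name is made up. From a plain Windows command prompt, `where nasm` answers the same PATH question.

```shell
#!/bin/sh
# Sketch only. Assumptions not taken from the thread: a POSIX-style shell,
# a hypothetical SRC_DIR variable naming the libjpeg-turbo source tree, and
# an arbitrary build directory name (build-simd-check).

# Print the full path of nasm if it is in PATH; return non-zero otherwise.
find_nasm() {
  command -v nasm
}

if nasm_path=$(find_nasm); then
  echo "NASM found: $nasm_path"
else
  echo "NASM not found in PATH; the x86 SIMD extensions may not get built" >&2
fi

# Configure only when a source directory was supplied. With REQUIRE_SIMD=1,
# configuration fails instead of silently falling back to the non-SIMD code.
if [ -n "${SRC_DIR:-}" ]; then
  mkdir -p build-simd-check &&
    (cd build-simd-check && cmake -DREQUIRE_SIMD=1 "$SRC_DIR")
fi
```

If the configure step fails, the CMake output should say why SIMD could not be enabled, which narrows things down faster than comparing project properties by hand.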
2.0.6 was compiled with VS 2010. I started using VS 2015 with libjpeg-turbo 2.1, so the 2.1 beta1 release should avoid that linkage issue.

On 2/16/21 3:43 PM, David Horman wrote:
>
> I just discovered the pre-built Windows binaries - not sure how I
> overlooked these before - so I got 2.0.6 and it gives me comparable
> results - faster, in fact, perhaps because I/O is more direct - to
> Ubuntu. I must have messed something up when compiling 2.0.4, although
> I don't know what that might have been. I think I may have followed
> this guide:
>
> https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/BUILDING.md
>
> to generate the project files (it looks like I did so twice, in
> separate directories, for x86 and x64), so maybe there was some flag I
> should have set that I didn't. I couldn't see anything amiss in the
> Project Properties though, and I made sure I was doing Release builds.
>
> CPU detection certainly sounds like a plausible culprit - I'm still
> curious to find out what the problem was so I may give it a try
> another day, and if I do I'll report back.
>
> Sorry if I've wasted your time, but thanks very much for your help!
>
> (PS To link to the pre-built 2.0.6 library I had to link in
> legacy_stdio_definitions.lib and add my own implementation of
> __iob_func. I think this is because it was compiled with VS 2015 and
> I'm using VS 2019, which has inlined and redefined some stdio stuff)
>
> David
>
> On 16/02/2021 21:25, DRC wrote:
>>
>> OK, so it's a legitimate slow-down, but unfortunately, I have no clue
>> what could be causing it. When I run Windows vs. Linux on the same
>> hardware, I observe more like a 5% slow-down under Windows.
>>
>> Are you trying to benchmark the Windows code while the Linux VM is
>> running? That might be the cause. Maybe Hyper-V is giving a higher
>> priority to the Linux guest than to user code running in the Windows
>> host.
>>
>> I'm also wondering if maybe CPU feature detection is borked somehow.
>> If you're comfortable building libjpeg-turbo from source, try adding
>> a print statement at the end of init_simd() in simd/x86_64/jsimd.c
>> and see if you get the same values for simd_support and simd_huffman
>> on both O/S platforms.
>>
>> On 2/16/21 3:09 PM, David Horman wrote:
>>>
>>> _*Ubuntu (WSL):*_
>>>
>>> >>>>> RGB (Top-down) <--> JPEG 4:2:0 Q80 <<<<<
>>>
>>> Image size: 15335 x 7991
>>> Compress --> Frame rate: 2.735202 fps
>>> Output image size: 11710674 bytes
>>> Compression ratio: 31.392382:1
>>> Throughput: 335.177027 Megapixels/sec
>>> Output bit stream: 256.248429 Megabits/sec
>>> Decompress --> Frame rate: 3.473690 fps
>>> Throughput: 425.672917 Megapixels/sec
>>>
>>> _*Windows 10 (x64):*_
>>>
>>> >>>>> RGB (Top-down) <--> JPEG 4:2:0 Q80 <<<<<
>>>
>>> Image size: 15335 x 7991
>>> Compress --> Frame rate: 0.740540 fps
>>> Output image size: 11710674 bytes
>>> Compression ratio: 31.392382:1
>>> Throughput: 90.747236 Megapixels/sec
>>> Output bit stream: 69.377776 Megabits/sec
>>> Decompress --> Frame rate: 1.230370 fps
>>> Throughput: 150.772003 Megapixels/sec
>>>
>>> ------------------------------------------------------------------
>>>
>>> And for good measure, a medium-sized image:
>>>
>>> ------------------------------------------------------------------
>>>
>>> _*Ubuntu (WSL):*_
>>>
>>> >>>>> RGB (Top-down) <--> JPEG 4:2:0 Q80 <<<<<
>>>
>>> Image size: 1024 x 1024
>>> Compress --> Frame rate: 351.023714 fps
>>> Output image size: 72448 bytes
>>> Compression ratio: 43.420495:1
>>> Throughput: 368.075042 Megapixels/sec
>>> Output bit stream: 203.447728 Megabits/sec
>>> Decompress --> Frame rate: 430.199562 fps
>>> Throughput: 451.096936 Megapixels/sec
>>>
>>> _*Windows 10 (x64):*_
>>>
>>> >>>>> RGB (Top-down) <--> JPEG 4:2:0 Q80 <<<<<
>>>
>>> Image size: 1024 x 1024
>>> Compress --> Frame rate: 101.308824 fps
>>> Output image size: 72448 bytes
>>> Compression ratio: 43.420495:1
>>> Throughput: 106.230002 Megapixels/sec
>>> Output bit stream: 58.716973 Megabits/sec
>>> Decompress --> Frame rate: 150.991629 fps
>>> Throughput: 158.326198 Megapixels/sec
>>>
>>> ---------------------------------------------------------------------------
>>>
>>> On 16/02/2021 21:00, DRC wrote:
>>>>
>>>> Please test an image that is closer to the actual size you intend
>>>> to compress in your application. The performance of the 227x149
>>>> test image in libjpeg-turbo is going to depend too heavily on
>>>> overhead to be a good comparison. You want a much larger image so
>>>> you can really test the maximum throughput.
>>>>
>>>> On 2/16/21 2:54 PM, David Horman wrote:
>>>>>
>>>>> Thanks for the suggestion. Here are my results (apt helpfully
>>>>> suggested that I install libjpeg-turbo-test):
>>>>>
>>>>> *_Ubuntu (Windows Subsystem for Linux):_*
>>>>>
>>>>> >>>>> RGB (Top-down) <--> JPEG 4:2:0 Q80 <<<<<
>>>>>
>>>>> Image size: 227 x 149
>>>>> Compress --> Frame rate: 7421.929573 fps
>>>>> Output image size: 6068 bytes
>>>>> Compression ratio: 16.721984:1
>>>>> Throughput: 251.031924 Megapixels/sec
>>>>> Output bit stream: 360.290149 Megabits/sec
>>>>> Decompress --> Frame rate: 9198.674991 fps
>>>>> Throughput: 311.126784 Megapixels/sec
>>>>>
>>>>> *_Windows 10 (same computer), x64:_*
>>>>>
>>>>> >>>>> RGB (Top-down) <--> JPEG 4:2:0 Q80 <<<<<
>>>>>
>>>>> Image size: 227 x 149
>>>>> Compress --> Frame rate: 2274.411861 fps
>>>>> Output image size: 6068 bytes
>>>>> Compression ratio: 16.721984:1
>>>>> Throughput: 76.927432 Megapixels/sec
>>>>> Output bit stream: 110.409049 Megabits/sec
>>>>> Decompress --> Frame rate: 3659.631437 fps
>>>>> Throughput: 123.779714 Megapixels/sec
>>>>>
>>>>> As you can see, still quite a big difference! x86 tjbench.exe was
>>>>> even slower at 1660 fps.
>>>>>
>>>>> I probably should have mentioned before, I'm using version 2.0.4.
>>>>>
>>>>> As for my code, it prepares and writes 64 rows at a time using
>>>>> jpeg_write_scanlines. All the image data is already in RAM; I just
>>>>> prepare it in strips because that's what libtiff expects you to do
>>>>> (it outputs TIFF, PNG, or JPEG using the same code, varying only
>>>>> in the call to the appropriate library once each strip is
>>>>> complete, and as noted before, PNG speed is the same on both Ubuntu
>>>>> and Windows). I also tried 1, 16, 512, and the full 7444 rows at a
>>>>> time, but it didn't make any difference.
>>>>>
>>>>> David
>>>>>
>>>>> On 16/02/2021 20:33, DRC wrote:
>>>>>> The quickest way to know whether libjpeg-turbo is at fault for the
>>>>>> performance difference is to run tjbench with the same input image and
>>>>>> settings on both machines. For instance:
>>>>>>
>>>>>> /opt/libjpeg-turbo/bin/tjbench image.ppm 80 -rgb -subsamp 420 -nowrite
>>>>>> or
>>>>>> c:\libjpeg-turbo64\bin\tjbench image.ppm 80 -rgb -subsamp 420 -nowrite
>>>>>>
>>>>>> will test the raw compute performance of compressing the contents of
>>>>>> image.ppm from an RGB pixel buffer into a JPEG image with quality 80 and
>>>>>> 4:2:0 subsampling.
>>>>>>
>>>>>> That will also give you an idea of the performance ceiling, excluding
>>>>>> I/O time. I suspect that the difference you're observing is due to I/O
>>>>>> time, which is out of libjpeg-turbo's control (Windows I/O is just
>>>>>> slower than Linux I/O.) However, here are some possible areas for
>>>>>> optimization:
>>>>>>
>>>>>> -- If you can spare the memory, the most efficient way to compress a
>>>>>> JPEG image is to load the entire source image into memory and use the
>>>>>> in-memory destination manager. (That's what tjbench does.) However,
>>>>>> it's understandable if this is an untenable proposition for a
>>>>>> 110-megapixel image.
>>>>>>
>>>>>> -- If you have to use buffered I/O, then try increasing the size of your
>>>>>> buffer.
>>>>>>
>>>>>> -- Check for any costly and unnecessary Extended-RGB-to-RGB color
>>>>>> conversion algorithms that could be replaced with the use of the
>>>>>> libjpeg-turbo colorspace extensions. I've seen older code, which was
>>>>>> written for libjpeg, perform really inefficient per-pixel RGBA-to-RGB or
>>>>>> BGRA-to-RGB conversion, and these algorithms are so slow that they
>>>>>> effectively hide any speedup from libjpeg-turbo.
>>>>>>
>>>>>> I'm happy to review your JPEG compression kernel if you'll post a
>>>>>> snippet of code.
>>>>>>
>>>>>> On 2/16/21 1:11 PM, David H wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm writing some software which uses libjpeg-turbo to write its output
>>>>>>> file. I managed to build the turbojpeg-static project with Visual Studio
>>>>>>> C++ to create the turbojpeg-static.lib file and linked it to my program,
>>>>>>> also built with Visual Studio C++. So far so good.
>>>>>>>
>>>>>>> In testing, writing a 14849 x 7444 JPEG takes 1.47 seconds.
>>>>>>>
>>>>>>> However, when I compile the same program in my WSL Ubuntu environment
>>>>>>> running on the same laptop, linking to libjpeg (apt-get install
>>>>>>> libjpeg-dev), writing the JPEG only takes 0.72 seconds. The other parts
>>>>>>> of the program all vary slightly in speed, as you'd expect with
>>>>>>> different compilers, but none show such a huge disparity as JPEG output.
>>>>>>> PNG output is the same speed from both builds.
>>>>>>>
>>>>>>> This seems like a pretty big difference to me, but I'm not sure where to
>>>>>>> start figuring it out. I'm pretty sure I have all the good optimisations
>>>>>>> turned on in the VC project, and I've tried it with /fp:fast, but it
>>>>>>> doesn't seem to make a difference.
>>>>>>>
>>>>>>> Are there any known speed issues with libjpeg-turbo on Windows that
>>>>>>> would explain this, or can anyone suggest some things for me to check?
-- 
You received this message because you are subscribed to the Google Groups "libjpeg-turbo User Discussion/Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to libjpeg-turbo-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/libjpeg-turbo-users/69d3eac5-8e94-6419-f26d-fdee84595ec8%40virtualgl.org.