Hi Sebastian,

This is what I get when I compile with just “python3.8 setup.py build -j 16”:
--------- BEGIN ---------
########### CLIB COMPILER OPTIMIZATION ###########
INFO: Platform      :
  Architecture: x64
  Compiler    : gcc

CPU baseline  :
  Requested   : 'min'
  Enabled     : SSE SSE2 SSE3
  Flags       : -msse -msse2 -msse3
  Extra checks: none

CPU dispatch  :
  Requested   : 'max -xop -fma4'
  Enabled     : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
  Generated   : none
--------- END ---------

I was wondering why it says “Generated : none” for the CPU dispatch?

This is the output of “np.show_runtime()”:

--------- BEGIN ---------
[{'numpy_version': '0+untagged.31149.g6d474f2',
  'python': '3.8.10 (default, Nov 14 2022, 12:59:47) \n[GCC 9.4.0]',
  'uname': uname_result(system='Linux', node='lambda', release='5.4.0-135-generic', version='#152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022', machine='x86_64', processor='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_SKX'],
                      'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL']}}]
--------- END ---------

Apparently, the CPU supports AVX512_SKX, i.e., avx512bw, avx512dq, avx512vl (bold face is mine, rest is from /proc/cpuinfo):

--------- BEGIN ---------
model name : Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req md_clear flush_l1d arch_capabilities
--------- END ---------

Am I building numpy incorrectly? I really need to be able to execute the AVX512 quicksort implementation as part of a research project, where we generate efficient sorting implementations that we would like to contribute to numpy to the degree that they improve upon existing solutions.

Any help is highly appreciated!

Cheers,
Peter

From: Sebastian Berg <sebast...@sipsolutions.net>
Date: Friday, 6 January 2023 at 08.04
To: numpy-discussion@python.org <numpy-discussion@python.org>
Subject: [Numpy-discussion] Re: ndarray.sort x86 dispatch

On Wed, 2023-01-04 at 04:06 +0000, Peter Schneider-Kamp wrote:
> Hi guys,
>
> I am trying to understand how the x86 dispatch for ndarray sort
> works. The following call in Line 137 of
> numpy/core/src/npysort/quicksort.cpp returns 0 for my test cases:
>
>     if (x86_dispatch<Tag>::quicksort(start, num))
>         return 0;
>
> I have tried to compile with --cpu-dispatch="AVX512_KNL AVX512_CLX
> AVX512_CNL AVX512_ICL AVX512_SKX" but for dtype=uint64 (or int64 or
> uint8 or float32 or float64) it is always the same result, i.e., the
> standard quicksort is used instead of the AVX512 one with bitonic
> sorting base cases.
>
> What do I have to do to be able to use the AVX512 implementation?

You can check what is found with:

    np.show_runtime()

Also just google your CPU or check `cat /proc/cpuinfo`. AVX512 currently exists only on high-end Intel CPUs (IIRC); presumably, you simply do not have the hardware. E.g.
on M1, `show_runtime()` won't even list AVX512 when compiled; on x86 the AVX512 targets would probably show up as "not found", because they exist but your hardware doesn't support AVX512.

- Sebastian

>
> I am currently compiling on a MacBook Pro with Monterey. I have all
> kinds of Linux machines available, if that should be a requirement.
>
> Thanks in advance for any insights!
>
> Cheers,
> Peter
> --
> Peter Schneider-Kamp
> Professor in Artificial Intelligence
> Department of Mathematics & Computer Science
> University of Southern Denmark
>
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
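
For readers who want to repeat the /proc/cpuinfo check from this thread, here is a minimal sketch. It is not part of NumPy; the helper names `parse_cpu_flags` and `missing_skx_flags` are hypothetical. It assumes the AVX512_SKX group corresponds to the kernel flags avx512f, avx512cd, avx512bw, avx512dq and avx512vl (the message above names the last three; the first two are assumed to be implied).

```python
# Kernel flag names assumed (not taken from NumPy's source) to make up
# the AVX512_SKX feature group discussed in the thread.
AVX512_SKX_FLAGS = {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}


def parse_cpu_flags(cpuinfo_text):
    """Return the flags on the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line, e.g. empty input or a non-x86 machine


def missing_skx_flags(cpuinfo_text):
    """Return the assumed AVX512_SKX flags that the CPU does not report."""
    return AVX512_SKX_FLAGS - parse_cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # /proc/cpuinfo is Linux-specific
    missing = sorted(missing_skx_flags(text))
    print("Missing AVX512_SKX flags:", ", ".join(missing) if missing else "none")
```

An empty result only says that the hardware reports the flags; whether NumPy was built with the matching dispatch targets still has to be checked in the build log and in `np.show_runtime()`.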