Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]
This case has timed out and has a +1. I'm closing as approved.

Thanks,
Sherry

On Wed, Aug 04, 2010 at 11:16:18AM -0700, Sherry Moore wrote:

I am sponsoring the following fast-track for Lejun Zhu and Kuriakose Kuruvilla. It requests a patch/micro binding. Man pages with change bars are available in the materials directory. Timeout is set to 8/11/2010.

Thanks,
Sherry

Template Version: @(#)sac_nextcase 1.70 03/30/10 SMI
This information is Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.

1. Introduction
    1.1. Project/Component Working Name:
         Support Intel Advanced Vector Extensions (AVX) in Solaris
    1.2. Name of Document Author/Supplier:
         Lejun Zhu
         Kuriakose Kuruvilla
    1.3. Date of This Document:
         Jul 14th, 2010
    1.4. Name of Major Document Customer(s)/Consumer(s):
        1.4.1. The Community you expect to review your project:
        1.4.2. The ARC(s) you expect to review your project:
               // Leave blank if you don't have any preference
               // This item is advisory only
    1.5. Email Aliases:
        1.5.2. Responsible Engineer:
               lejun@intel.com
               kuriakose.kuruvi...@oracle.com
        1.5.4. Interest List:
               intel-c...@sun.com

2. Project Summary
    2.1. Project Description:
         Intel Advanced Vector Extensions (AVX) introduces new instructions that accelerate vector floating point operations. AVX uses 256-bit registers, which requires extending the current Solaris interfaces that manipulate FPU registers, such as the signal stack layout, the setcontext syscall and the /proc interface.

    2.2. Risks and Assumptions:
         When extending Solaris interfaces and/or data structures to support AVX, it is very important to provide binary compatibility for existing applications. All application binaries that exist today will continue to run on the new Solaris kernel without having to be recompiled. The only restriction for existing binaries is that they must have enough space on the signal stack to hold the extra state (see 4.1.2 for details).

3. Business Summary
    3.1. Problem Area:
         Intel AVX is a new 256-bit SIMD FP vector extension of Intel Architecture. Its introduction is targeted for the next Intel Microarchitecture (code named: Sandy Bridge). Intel AVX accelerates the trend towards FP-intensive computation in general purpose applications like image, video, and audio processing, engineering applications such as 3D modeling and analysis, scientific simulation, and financial analytics.

    3.2. Market/Requester:

    3.3. Business Justification:
         Customers who use Solaris x86 will expect to run optimized applications on Sandy Bridge and future generations of Intel CPUs, and many optimizations will use AVX instructions, such as Basic Linear Algebra Subprograms (BLAS) with the DGEMM routine, or sequential and cluster FFTs. Also, the amd64 ABI already supports YMM registers. The latest GCC can generate AVX instructions, and an AVX-enabled Sun Studio compiler is being developed. All of these will require kernel changes to support AVX.

    3.4. Competitive Analysis:
         Support for XSAVE and YMM has already been implemented in the Linux kernel.

    3.5. Opportunity Window/Exposure:
         Intel will support AVX instructions in the next generation Intel Microarchitecture (code-named: Sandy Bridge). Applications optimized for Sandy Bridge will emerge soon. In order to enable these optimizations on Solaris, we need to get the OS support into ON as soon as possible.

    3.6. How will you know when you are done?:
         Applications can run correctly and use YMM registers on Intel machines that support AVX/YMM registers.

4. Technical Description:
    4.1. Details:

    4.1.1 Extending ucontext_t Structure

    ucontext_t will have the same size as its previous version and all existing fields will be at the same byte offset, except that part of its filler is used for the xregs extension.

    A new flag UC_XREGS (0x10) for the uc_flags field will be added. Any ucontext_t with this flag set is considered to have the new layout described in this PSARC case. Any ucontext_t with this flag not set in its uc_flags is considered to have the original layout and its uc_xrs field will be ignored.

    A data structure will be defined as follows for both 32-bit and 64-bit applications:

        #define XRS_ID 0x00737278 /* the string xrs */
        typedef struct {
                unsigned long   xrs_id;
                caddr_t         xrs_ptr;
        } xrs_t;

    Field xrs_id must have the value XRS_ID (little endian), and xrs_ptr will point to a prxregset_t data structure.

    Part of uc_filler in the current ucontext_t definition will be used to store xrs_t. The new definition of ucontext_t is:

        typedef struct ucontext {
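The flag-and-identifier rule in 4.1.1 can be illustrated with a minimal sketch. This is not the code from the case: mock_ucontext_t and mock_uc_xregs() are hypothetical names, the mock structure carries only the two fields the check reads, char * stands in for caddr_t, and treating a mismatched xrs_id as "no extended state" is an assumption (the case only says the field must hold XRS_ID).

```c
#include <assert.h>
#include <stddef.h>

#define UC_XREGS 0x10        /* flag value given in the case */
#define XRS_ID   0x00737278  /* bytes 'x','r','s',0 when stored little endian */

/* xrs_t as given in the case; char * stands in for caddr_t. */
typedef struct {
    unsigned long xrs_id;
    char *xrs_ptr;
} xrs_t;

/* Hypothetical stand-in holding only the fields this sketch reads. */
typedef struct {
    unsigned long uc_flags;
    xrs_t uc_xrs;
} mock_ucontext_t;

/*
 * Return the extended register area, or NULL when the context must be
 * treated as having the original layout.
 */
static char *mock_uc_xregs(const mock_ucontext_t *uc)
{
    assert(uc != NULL);
    if ((uc->uc_flags & UC_XREGS) == 0)
        return NULL;            /* flag clear: uc_xrs is ignored */
    if (uc->uc_xrs.xrs_id != XRS_ID)
        return NULL;            /* assumption: bad id means no xregs */
    return uc->uc_xrs.xrs_ptr;  /* would point to a prxregset_t */
}
```

A consumer written this way never touches uc_xrs unless UC_XREGS is set, which is what lets old binaries that leave the filler uninitialized keep working.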
Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]
Anyone have any comments or a +1 for the proposal?

/kuriakose

On 08/04/10 11:16, Sherry Moore wrote:

I am sponsoring the following fast-track for Lejun Zhu and Kuriakose Kuruvilla. It requests a patch/micro binding. Man pages with change bars are available in the materials directory. Timeout is set to 8/11/2010.

[...]

Part of uc_filler in current ucontext_t definition will be used to store xrs_t. The new definition of ucontext_t is:

    typedef struct ucontext {
            unsigned long   uc_flags;
            ucontext_t      *uc_link;
            sigset_t        uc_sigmask;
Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]
Kuriakose Kuruvilla wrote:

%xmm0 0x5f4d4d585f4d4d585f4d4d585f4d4d58
%ymm0 0x5f4d4d595f4d4d595f4d4d595f4d4d595f4d4d585f4d4d585f4d4d585f4d4d58

This is probably more of a code review comment than an architectural issue, but I would expect mdb $x to print out _either_ the xmm* set _or_ the ymm* set, and not both. The xmm* set is just a portion of the ymm* data, so there's no need to print it twice.

(This is for the same reason we don't print al, ax, eax, and rax all at once, but rather just the largest available.)

--
James Carlson 42.703N 71.076W carls...@workingcode.com
___
opensolaris-arc mailing list
opensolaris-arc@opensolaris.org
Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]
On 08/10/10 10:41 AM, James Carlson wrote:

Kuriakose Kuruvilla wrote:

%xmm0 0x5f4d4d585f4d4d585f4d4d585f4d4d58
%ymm0 0x5f4d4d595f4d4d595f4d4d595f4d4d595f4d4d585f4d4d585f4d4d585f4d4d58

This is probably more of a code review comment than an architectural issue, but I would expect mdb $x to print out _either_ the xmm* set _or_ the ymm* set, and not both. The xmm* set is just a portion of the ymm* data, so there's no need to print it twice.

(This is for the same reason we don't print al, ax, eax, and rax all at once, but rather just the largest available.)

You are right, James. We will change the mdb output to print only the YMMs when they are present.

Please let me know if you have any other comments.

Thanks
/kuriakose
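The "print only the largest available" behavior agreed on above can be sketched roughly as follows. This is not the mdb implementation: vecreg_t, VEC_BUF_LEN and format_vec_reg() are hypothetical names, and the register is modeled as eight 32-bit words with word 0 least significant, so the low four words are what would be shown as the xmm view.

```c
#include <stdint.h>
#include <stdio.h>

#define VEC_BUF_LEN 80  /* enough for "%ymm15 0x" plus 64 hex digits and NUL */

/* Hypothetical register image: 256 bits, w[0] is the least significant word. */
typedef struct {
    uint32_t w[8];
} vecreg_t;

/*
 * Format one vector register under a single name: the full 256-bit ymm
 * value when AVX state is present, otherwise only the low 128 bits as xmm.
 * idx is assumed to be in 0..15; anything else yields an empty string.
 */
static void format_vec_reg(const vecreg_t *r, int idx, int have_ymm,
    char buf[VEC_BUF_LEN])
{
    int nwords = have_ymm ? 8 : 4;
    int n;

    if (idx < 0 || idx > 15) {
        buf[0] = '\0';
        return;
    }
    n = snprintf(buf, VEC_BUF_LEN, "%%%s%d 0x", have_ymm ? "ymm" : "xmm", idx);
    /* Most significant word first, matching the output quoted in the thread. */
    for (int i = nwords - 1; i >= 0; i--)
        n += snprintf(buf + n, VEC_BUF_LEN - n, "%08x", (unsigned)r->w[i]);
}
```

Fed the sample values from the thread (low words 0x5f4d4d58, high words 0x5f4d4d59), the function emits either the %ymm0 line or the %xmm0 line, never both, because the xmm digits are simply the trailing half of the ymm digits.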