Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]

2010-08-12 Thread Sherry Moore
This case has timed out and has a +1.  I'm closing as approved.

Thanks,
Sherry

On Wed, Aug 04, 2010 at 11:16:18AM -0700, Sherry Moore wrote:
 I am sponsoring the following fast-track for Lejun Zhu and Kuriakose
 Kuruvilla.  It requests a patch/micro binding.  Man pages with change
 bars are available in the materials directory.  Timeout is set to
 8/11/2010.
 
 Thanks,
 Sherry
 
 Template Version: @(#)sac_nextcase 1.70 03/30/10 SMI
 This information is Copyright (c) 2010, Oracle and/or its affiliates. All 
 rights reserved.
 1. Introduction
1.1. Project/Component Working Name:
 Support Intel Advanced Vector Extensions (AVX) in Solaris
 
1.2. Name of Document Author/Supplier:
 Lejun Zhu
 Kuriakose Kuruvilla
 
1.3. Date of This Document:
 Jul 14th, 2010
 
1.4. Name of Major Document Customer(s)/Consumer(s):
 1.4.1. The Community you expect to review your project:
 1.4.2. The ARC(s) you expect to review your project:
 
1.5. Email Aliases:
 1.5.2. Responsible Engineer:
   lejun@intel.com
   kuriakose.kuruvi...@oracle.com
 1.5.4. Interest List: intel-c...@sun.com
 
 2. Project Summary
2.1. Project Description:
 Intel Advanced Vector Extensions (AVX) introduces new instructions 
 that accelerate vector floating-point operations. AVX uses 256-bit 
 registers, which requires extending the current Solaris interfaces that 
 manipulate FPU registers, such as the signal stack layout, the setcontext 
 system call, and the /proc interface.
 
2.2. Risks and Assumptions:
 When extending Solaris interfaces and data structures to 
 support AVX, it is essential to preserve binary compatibility for 
 existing applications. All existing application binaries will 
 continue to run on the new Solaris kernel without being recompiled. The 
 only restriction for existing binaries is that they must have enough space on 
 the signal stack to hold the extra state (see 4.1.2 for details).
 
 3. Business Summary
3.1. Problem Area:
 Intel AVX is a new 256-bit SIMD floating-point vector extension of the Intel 
 architecture. Its introduction is targeted at the next Intel 
 microarchitecture (code-named Sandy Bridge). Intel AVX accelerates the 
 trend toward FP-intensive computation in general-purpose applications: 
 image, video, and audio processing; engineering applications such as 
 3D modeling and analysis; scientific simulation; and financial analytics.
 
3.2. Market/Requester:
 
3.3. Business Justification:
 Customers who use Solaris x86 will expect to run optimized 
 applications on Sandy Bridge and future generations of Intel CPUs, and many 
 of those optimizations will use AVX instructions, for example Basic Linear 
 Algebra Subprograms (BLAS) routines such as DGEMM, or sequential and cluster 
 FFTs. In addition, the amd64 ABI already supports YMM registers, the latest 
 GCC can generate AVX instructions, and an AVX-enabled Sun Studio compiler is 
 being developed. All of these require kernel changes to support AVX.
 
3.4. Competitive Analysis:
 Support for XSAVE and YMM has already been implemented in the Linux 
 kernel.
 
3.5. Opportunity Window/Exposure:
 Intel will support AVX instructions in its next-generation 
 microarchitecture (code-named Sandy Bridge). Applications optimized for 
 Sandy Bridge will emerge soon. To enable these optimizations on 
 Solaris, we need to integrate the OS support into ON as soon as possible.
 
3.6. How will you know when you are done?:
   Applications run correctly and can use the YMM registers on Intel machines
 that support AVX.
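 An application typically confirms that it may use YMM registers by checking 
 both processor and OS support, following the detection sequence documented 
 by Intel: CPUID leaf 1 must report OSXSAVE (ECX bit 27) and AVX (ECX bit 28), 
 and XCR0, read with XGETBV, must have bits 1 and 2 set, meaning the OS 
 saves XMM and upper-YMM state. The sketch below is an illustration, not 
 part of the proposal; avx_usable() is a hypothetical name, and it takes the 
 raw register values as arguments so that it stays independent of any 
 compiler's CPUID/XGETBV intrinsics.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* processor implements AVX */
#define XCR0_SSE_STATE     (1u << 1)  /* OS saves/restores XMM state */
#define XCR0_YMM_STATE     (1u << 2)  /* OS saves/restores upper YMM halves */

/*
 * Hypothetical helper: true when both the CPU and the OS support AVX.
 * cpuid1_ecx: ECX returned by CPUID with EAX = 1.
 * xcr0: value returned by XGETBV with ECX = 0 (only meaningful if
 *       OSXSAVE is set, which is checked first).
 */
static bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t state = XCR0_SSE_STATE | XCR0_YMM_STATE;

    if ((cpuid1_ecx & need) != need)
        return false;
    return (xcr0 & state) == state;
}
```

 On a kernel without this project, OSXSAVE stays clear (or XCR0 lacks the 
 YMM bit), so a correctly written application falls back to its SSE code path.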
 
 4. Technical Description:
 4.1. Details:
 4.1.1 Extending ucontext_t
 The ucontext_t structure will keep the same size as its previous 
 version, and all existing fields will remain at the same byte offsets; 
 the only change is that part of its filler will be used for the xregs 
 extension. A new flag, UC_XREGS (0x10), will be added for the uc_flags 
 field. Any ucontext_t with this flag set is considered to have the new 
 layout described in this PSARC case. Any ucontext_t without this flag set 
 in its uc_flags is considered to have the original layout, and its uc_xrs 
 field will be ignored.
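 The compatibility rule above (same size, same offsets, new field carved 
 out of the filler, honoured only when UC_XREGS is set) can be checked at 
 compile time. This is a minimal sketch under stated assumptions: 
 demo_ctx_old, demo_ctx_new, demo_xrs_t, DEMO_FILLER_LONGS and demo_ctx_xrs() 
 are hypothetical, deliberately simplified stand-ins, not the real 
 ucontext_t; only the UC_XREGS value comes from this proposal.

```c
#include <stddef.h>

#define UC_XREGS 0x10           /* flag value from this proposal */

/* Hypothetical stand-in for xrs_t; caddr_t is char * on Solaris. */
typedef struct {
    unsigned long xrs_id;
    char *xrs_ptr;
} demo_xrs_t;

/* HYPOTHETICAL simplified layouts, not the real ucontext_t. */
#define DEMO_FILLER_LONGS 8

struct demo_ctx_old {
    unsigned long uc_flags;
    void *uc_link;
    long uc_filler[DEMO_FILLER_LONGS];
};

struct demo_ctx_new {
    unsigned long uc_flags;
    void *uc_link;
    demo_xrs_t uc_xrs;          /* carved from the front of the filler */
    long uc_filler[DEMO_FILLER_LONGS - sizeof (demo_xrs_t) / sizeof (long)];
};

/* Size and existing offsets must be unchanged. */
_Static_assert(sizeof (struct demo_ctx_old) == sizeof (struct demo_ctx_new),
    "structure size changed");
_Static_assert(offsetof(struct demo_ctx_old, uc_link) ==
    offsetof(struct demo_ctx_new, uc_link), "field offset changed");

/* A consumer looks at uc_xrs only when UC_XREGS is set. */
static const demo_xrs_t *demo_ctx_xrs(const struct demo_ctx_new *uc)
{
    return (uc->uc_flags & UC_XREGS) ? &uc->uc_xrs : NULL;
}
```

 Because old binaries never set UC_XREGS, whatever bytes they leave in the 
 former filler are never interpreted as an xrs_t.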
 
 A data structure will be defined as follows for both 32-bit 
 and 64-bit applications:
 
 #define XRS_ID  0x00737278 /* the string xrs */
 
 typedef struct {
         unsigned long   xrs_id;
         caddr_t         xrs_ptr;
 } xrs_t;
 
 The xrs_id field must contain the value XRS_ID (little-endian), and 
 xrs_ptr will point to a prxregset_t data structure.
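 XRS_ID (0x00737278) is the bytes 'x', 'r', 's', NUL taken in little-endian 
 order, so on x86 the field reads as the C string "xrs" in memory. The 
 sketch below shows that decoding and the kind of sanity check a consumer 
 might apply before following xrs_ptr. demo_xrs_t, demo_xrs_valid() and 
 xrs_id_bytes() are hypothetical names; the check itself is an assumption, 
 not something this proposal specifies.

```c
#include <stdbool.h>
#include <stddef.h>

#define XRS_ID 0x00737278       /* value from this proposal: "xrs" */

typedef struct {
    unsigned long xrs_id;
    char *xrs_ptr;              /* caddr_t on Solaris; points to a prxregset_t */
} demo_xrs_t;

/* Hypothetical check before dereferencing xrs_ptr. */
static bool demo_xrs_valid(const demo_xrs_t *xrs)
{
    return xrs->xrs_id == XRS_ID && xrs->xrs_ptr != NULL;
}

/*
 * Split XRS_ID into its four low-order bytes, least significant first,
 * which is the order they occupy in memory on a little-endian machine.
 * Uses shifts, so the result does not depend on host byte order.
 */
static void xrs_id_bytes(unsigned char out[4])
{
    unsigned long v = XRS_ID;

    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xff);
}
```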
 
 Part of uc_filler in the current ucontext_t definition will be 
 used to store xrs_t. The new definition of ucontext_t is:
 
 typedef struct  ucontext {
 

Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]

2010-08-10 Thread Kuriakose Kuruvilla

Anyone have any comments or a +1 for the proposal?

/kuriakose

On 08/04/10 11:16, Sherry Moore wrote:

[...]

 typedef struct ucontext {
         unsigned long   uc_flags;
         ucontext_t      *uc_link;
         sigset_t        uc_sigmask;

Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]

2010-08-10 Thread James Carlson
Kuriakose Kuruvilla wrote:
  %xmm0  0x5f4d4d585f4d4d585f4d4d585f4d4d58
  %ymm0  0x5f4d4d595f4d4d595f4d4d595f4d4d595f4d4d585f4d4d585f4d4d585f4d4d58

This is probably more of a code review comment than an architectural
issue, but I would expect mdb $x to print out _either_ the xmm* set _or_
the ymm* set, and not both.  The xmm* set is just a portion of the ymm*
data, so there's no need to print it twice.  (This is for the same
reason we don't print al, ax, eax, and rax all at once, but rather just
the largest available.)
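The redundancy is visible in the quoted output itself: the last 32 hex 
digits (the low 128 bits) of %ymm0 are exactly %xmm0. The test pattern 
also appears to be ASCII, with the 32-bit lane 0x5f4d4d58 reading "XMM_" 
in little-endian byte order and 0x5f4d4d59 reading "YMM_". A small check, 
with hypothetical helper names:

```c
#include <stdbool.h>
#include <string.h>

/* Register values as printed in the mdb output quoted above. */
static const char XMM0[] = "5f4d4d585f4d4d585f4d4d585f4d4d58";
static const char YMM0[] =
    "5f4d4d595f4d4d595f4d4d595f4d4d59"   /* bits 255..128 */
    "5f4d4d585f4d4d585f4d4d585f4d4d58";  /* bits 127..0 */

/* True if the xmm hex string equals the trailing (low-order) digits of ymm. */
static bool xmm_is_low_half(const char *xmm, const char *ymm)
{
    size_t xl = strlen(xmm), yl = strlen(ymm);

    return yl >= xl && strcmp(ymm + (yl - xl), xmm) == 0;
}

/* Decode a 32-bit lane as four ASCII bytes, least significant byte first. */
static void lane_ascii(unsigned long lane, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((lane >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```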

-- 
James Carlson 42.703N 71.076W carls...@workingcode.com
___
opensolaris-arc mailing list
opensolaris-arc@opensolaris.org


Re: Intel AVX Support [PSARC/2010/311 FastTrack timeout 08/11/2010]

2010-08-10 Thread Kuriakose Kuruvilla

On 08/10/10 10:41 AM, James Carlson wrote:

Kuriakose Kuruvilla wrote:

  %xmm0  0x5f4d4d585f4d4d585f4d4d585f4d4d58
  %ymm0  0x5f4d4d595f4d4d595f4d4d595f4d4d595f4d4d585f4d4d585f4d4d585f4d4d58


This is probably more of a code review comment than an architectural
issue, but I would expect mdb $x to print out _either_ the xmm* set _or_
the ymm* set, and not both.  The xmm* set is just a portion of the ymm*
data, so there's no need to print it twice.  (This is for the same
reason we don't print al, ax, eax, and rax all at once, but rather just
the largest available.)



You are right, James.

We will change the mdb output to print only the YMMs when they are present.
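
The agreed behaviour amounts to "print the widest register that exists". 
The sketch below illustrates that rule only; it is not mdb's code, and 
demo_vreg_t and demo_format_vreg() are hypothetical, with the register 
image split into 64-bit words, most significant first.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 256-bit register image, most significant word first. */
typedef struct {
    uint64_t hi[2];             /* bits 255..128 (upper YMM half) */
    uint64_t lo[2];             /* bits 127..0 (the XMM register) */
} demo_vreg_t;

/*
 * Format register n once: as %ymmN when YMM state is present,
 * otherwise as %xmmN. Returns snprintf's result.
 */
static int demo_format_vreg(char *buf, size_t len, int n,
    const demo_vreg_t *r, bool have_ymm)
{
    if (have_ymm)
        return snprintf(buf, len, "%%ymm%d  0x%016" PRIx64 "%016" PRIx64
            "%016" PRIx64 "%016" PRIx64, n,
            r->hi[0], r->hi[1], r->lo[0], r->lo[1]);
    return snprintf(buf, len, "%%xmm%d  0x%016" PRIx64 "%016" PRIx64, n,
        r->lo[0], r->lo[1]);
}
```

Since %xmmN is the low half of %ymmN, nothing is lost by omitting it when 
YMM state is available.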

Please let me know if you have any other comments.

Thanks
/kuriakose