Designing Efficient Sorting Algorithms for Manycore GPUs This
IPDPS 2009 paper by Nadathur Satish, Mark Harris, and Michael Garland
describes the design of high-performance parallel radix sort and merge
sort routines for manycore GPUs, taking advantage of the full
programmability offered by NVIDIA
CUDA.
The radix sort described is the fastest GPU sort and the merge sort
described is the fastest comparison-based GPU sort reported in the
literature. The radix sort is up to 4 times faster than the
graphics-based GPUSort and more than 2 times faster than other
CUDA-based radix sorts. It is also 23% faster, on average, than even a
very carefully optimized multicore CPU sorting routine. To achieve this
performance, the authors carefully design the algorithms to expose
substantial fine-grained parallelism and decompose the computation into
independent tasks that perform minimal global communication. They
exploit the high-speed on-chip shared memory provided by NVIDIA’s GPU
architecture and efficient data-parallel primitives, particularly
parallel scan. While targeted at GPUs, these algorithms should also be
well-suited for other manycore processors. (N. Satish, M. Harris, and
M. Garland. Designing
efficient sorting algorithms for manycore GPUs. Proc. 23rd IEEE
Int’l Parallel & Distributed Processing Symposium, May 2009. To
appear.)
Posted: 01 Mar 2009 [GPGPU /Data Parallel Algorithms] # High-Performance Graphics Call for Participation The
new High-Performance Graphics Conference is the synthesis of two
highly successful conference series.
Posted: 27 Feb 2009 [GPGPU /Conferences] # Alexander
Heusel of the University of
Frankfurt has released open source Java bindings for CUDA. The current
project state is alpha, with support for the CUDA driver API; support
for the CUBLAS and CUFFT libraries is pending. Contributions
are welcome. For more information, see the project website: http://jacuzzi.sourceforge.net
Posted: 27 Feb 2009 [GPGPU /Tools] # HotPar '09: First USENIX Workshop on Hot Topics in Parallelism To
be held March 30-31, 2009 in Berkeley,
California, HotPar '09 will bring together researchers and
practitioners doing innovative work in the area of parallel computing.
HotPar recognizes the broad impact of multicore computing and seeks
relevant contributions from all fields, including application design,
languages and compilers, systems, and architecture. (http://www.usenix.org/events/hotpar09/)
Posted: 27 Feb 2009 [GPGPU /Conferences] # gDEBugger V4.5 Adds the ability to view Texture Mipmap levels and Texture Arrays The
new gDEBugger V4.5 adds the ability to
view texture MIP-map levels. Each texture MIP-map level’s parameters
and data (as an image or raw data) can be displayed in the gDEBugger
Textures and Buffers viewer. Browse the different MIP-map levels using
the Texture MIP-map Level slider. gDEBugger V4.5 also introduces support
for 1D and 2D texture arrays. The new Textures and Buffers viewer
Texture Layer slider enables viewing the contents of different texture
layers. This version also introduces notable performance and stability
improvements.
gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces
application activity on top of the OpenGL API and lets programmers see
what is happening within the graphics system implementation to find
bugs and optimize OpenGL application performance. gDEBugger runs on
Windows and Linux operating systems, and is currently in Beta phase on
Mac OS X.
http://www.gremedy.com
Posted: 27 Feb 2009 [GPGPU /Tools] # OpenMM Molecular Dynamics Simulation Software with GPU Acceleration Released by Stanford University OpenMM
is a freely downloadable, high
performance, extensible library that allows molecular dynamics (MD)
simulations to run on high performance computer architectures, such as
graphics processing units (GPUs). Speedups of up to 100 times over
CPU execution were achieved in some cases by running OpenMM on GPUs in
desktop PCs. The new release includes a version of the widely
used MD package GROMACS that integrates the OpenMM library, enabling
acceleration on high-end NVIDIA and AMD/ATI GPUs. OpenMM is a
collaborative project between Vijay Pande's lab at Stanford University
and Simbios, the National Center for Physics-based Simulation of
Biological Structures at Stanford, which is supported by the National
Institutes of Health. For more information on OpenMM, go to http://simtk.org/home/openmm. (Full
press release.)
Posted: 27 Feb 2009 [GPGPU /Scientific Computing] # CUDA.NET 2.1
has been released with support for the NVIDIA CUDA
2.1 API. This version supports DirectX 10 interoperability and the new
JIT compilation API. The library is supported on Windows and Linux
operating systems.
(CUDA.NET)
Posted: 27 Feb 2009 [GPGPU /Tools] # WORKSHOP on GPU Supercomputing 2009, National Taiwan University The
first NTU workshop
on GPU supercomputing
was held at NTU on January 16, 2009. Organized by the Center for
Quantum Science and Engineering (CQSE) at National Taiwan University,
the workshop consisted of seminars on applications of GPU/CUDA in
high-performance computation in science and engineering, as well as other
fields. Slides from
the presentations are now online.
Posted: 03 Feb 2009 [GPGPU /Conferences] # February is "Fold For Stephanie Month" (fold...@home) Scott
Sherman from Bjorn3D
is holding a "Fold for Stephanie" month in support of his 13-year-old
daughter, who has stage 4B Hodgkin's disease. He is even giving away an
XFX NVIDIA GeForce GTX 285 GPU to the highest folder for Stephanie. For
more information, see the Bjorn
3D Forums.
Posted: 03 Feb 2009 [GPGPU /Contests] # The Need for Speed Seminar Series: David Kirk Keynote The University of Illinois at Urbana-Champaign is launching a 13-week seminar series that will focus on emerging applications for parallel computing. The Need for Speed Seminar Series will feature world-class applications experts and researchers who will discuss what increased computing performance means for their fields. The series will bring together hardware engineers and software developers who require parallel processing to create faster and superior applications. Speakers will help forecast breakthroughs enabled by the rapid advances in computing performance per dollar, performance per watt, or storage capacity provided by Moore's Law. David Kirk, NVIDIA Fellow, will kick off the series with a special keynote on January 28. Following that, the Need for Speed series will be held at 4pm CT every Wednesday until April 29 at the UI's Coordinated Science Laboratory. Seminars will also stream live over the internet and speakers will take questions from both in-house and online audience members. To learn more about the series, or to view the live seminars, please visit the Need for Speed seminar web page. (Editor's Note: this news was submitted after the talk occurred.) Posted: 03 Feb 2009 [GPGPU /Miscellaneous/Talks] # Webinar: Jacket: Accelerating MATLAB using CUDA-Enabled GPUs February
5, 2009, 11am PST / 2pm EST
Are you looking for ways to improve your productivity by accelerating MATLAB functions? Now you can, with the performance of GPU computing. Attend this webinar to learn how.
Date: Thursday, February 5, 2009 Posted: 03 Feb 2009 [GPGPU /Miscellaneous/Courses] # National Taiwan University Becomes World's First Asia-Pacific CUDA Center of Excellence NVIDIA
announced that National Taiwan
University has been named as Asia's first CUDA Center of Excellence
(press release below). The university earned this title by formally
adopting NVIDIA GPU Computing solutions across its research facilities
and integrating a class to teach parallel computing based on the CUDA
architecture into its educational curriculum. As the computing industry
rapidly moves toward parallel processing and many-core architectures,
NVIDIA has worked over the past year to offer tomorrow's developers
and engineers education on the best tools and methodologies for
parallel computing. In addition to working with over 50 universities
worldwide that are actively using CUDA in their courses, NVIDIA
developed the CUDA Center of Excellence Program to further assist
universities that are devoted to educating tomorrow's software
developers about parallel computing. (Press Release)
Posted: 22 Jan 2009 [GPGPU /Press] # Wipro to Offer CUDA Software Services to Global Customer Base From
a press
release:
SANTA CLARA, CA—JANUARY 15, 2009—NVIDIA today announced it is now working closely with Wipro to provide CUDA™ professional services to their joint customers worldwide. CUDA, NVIDIA’s parallel computing architecture accessible through an industry-standard C language programming environment, has already delivered major leaps in performance across many industries. Wipro’s Product Engineering Services group will accelerate the development efforts of companies with vast software portfolios seeking to exploit parallel computing with the GPU. (Read More) Posted: 22 Jan 2009 [GPGPU /Press] # Symposium on Application Accelerators in High Performance Computing (SAAHPC’09) What do GPUs, FPGAs, vector processors and other special-purpose chips have in common? They are examples of advanced processor architectures that the scientific community is using to accelerate computationally demanding applications. While high-performance computing systems that use application accelerators are still rare, they will be the norm rather than the exception in the near future. The 2009 Symposium on Application Accelerators in High-Performance Computing aims to bring together developers of computing accelerators and end-users of the technology to exchange ideas and learn about the latest developments in the field. The Symposium will focus on the use of application accelerators in high-performance and scientific computing and the issues that surround it.
Presentations from technology developers and the academic user community are invited. Researchers interested in presenting at the Symposium should submit extended abstracts of 2-3 pages to [email protected] by April 20, 2009. All submissions will be reviewed by the Technical Program Committee and accepted submissions will be presented as either oral presentations or posters. Presentation materials will be made available online at www.saahpc.org. (2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC’09). July 27-31, 2009, University of Illinois, Urbana, IL) Posted: 22 Jan 2009 [GPGPU /Conferences] # gDEBugger for Apple Mac OS X - Beta Program Graphic Remedy
is proud to announce the upcoming release of gDEBugger for Mac OS X.
This new product brings all of gDEBugger's Debugging and Profiling
abilities to the Mac OpenGL developer's world. Using gDEBugger Mac will
help OS X OpenGL developers optimize their application performance:
find graphics pipeline bottlenecks, improve application graphics memory
consumption, locate and remove redundant OpenGL calls and graphics
memory leaks, and much more. Visit the gDEBugger Mac home page
to join the Beta Program, see screenshots and get more details.
Posted: 22 Jan 2009 [GPGPU /Tools] # Experience with the GPU and the Cell Processor This
workshop,
to be held at TU Delft on Friday January 30, 2009, presents
state-of-the-art performance results for engineering applications on
parallel machines, based on either the Cell Processor or on GPUs. In
addition to iterative solvers, finite element applications, tomography
and visualization applications, some background information on
computation on these platforms and on coupling of the processors will
be presented. Attendance is free, but registration is required. (Workshop: Experience
with the GPU and the Cell Processor)
Posted: 22 Jan 2009 [GPGPU /Conferences] # Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted Methods (EPHAM 2009) This
workshop will focus on compilation
techniques for exploiting parallelism in emerging massively
multi-threaded and multi-core architectures, with particular attention
to the use of general-purpose
GPU computing techniques to overcome traditional barriers to
parallelization. Recently, GPUs have evolved to address programming of
general-purpose computations, especially those exemplified by
data-parallel models. This change will have long-term implications for
languages, compilers, and programming models. Development of
higher-level programming languages, models and compilers that exploit
such processors will be important. Clearly, the economics and
performance of applications are affected by a transition to
general-purpose GPU computing. This will require new ideas and
directions as well as
recasting some older techniques to the new paradigm.
EPHAM 2009 invites papers across this emerging discipline and related areas of interest.
Posted: 11 Jan 2009 [GPGPU /Conferences] # "Parallel Computing for Graphics: Beyond Programmable Shading" SIGGRAPH Asia 2008 Course The
complete course notes from the "Parallel Computing for Graphics: Beyond
Programmable Shading" SIGGRAPH
Asia 2008
course are available online. The course gives an introduction to
parallel programming architectures and environments for interactive
graphics and explores case studies of combining traditional rendering
API usage with advanced parallel computation from game developers,
researchers, and graphics hardware vendors. There are strong
indications that the future of interactive graphics involves a
programming model more flexible than today's OpenGL and Direct3D
pipelines. As such, graphics developers need a basic understanding of
how to combine emerging parallel programming techniques with the
traditional interactive rendering pipeline. This course gives an
introduction to several parallel graphics architectures and programming
environments, and introduces the new types of graphics algorithms that
will be possible. The case studies in the class discuss the mix of
parallel programming constructs used, details of the graphics
algorithms, and how the rendering pipeline and computation interact to
achieve the technical goals. The course speakers are Jason Yang and
Justin Hensley (AMD), Tim Foley (Intel), Mark Harris (NVIDIA), Kun Zhou
(Zhejiang University), Anjul Patney (UC Davis), Pedro Sander (HKUST),
and Christopher Oat (AMD). (Complete
course notes.)
Posted: 23 Dec 2008 [GPGPU /Miscellaneous/Courses] # NVIDIA Releases Version 2.1 Beta of the CUDA Toolkit and SDK DECEMBER 19, 2008 - NVIDIA has announced the availability of version 2.1 beta of its CUDA toolkit and SDK. This is the latest version of the C compiler and software development tools for accessing the massively parallel CUDA compute architecture of NVIDIA GPUs. In response to overwhelming demand from the developer community, this latest version of the CUDA software suite includes support for NVIDIA® Tesla™ GPUs on Windows Vista and 32-bit debugger support for CUDA on Red Hat Enterprise Linux 5.x (separate download). The CUDA Toolkit and SDK 2.1 beta includes support for Visual Studio 2008 on Windows XP and Vista and Just-In-Time (JIT) compilation for applications that dynamically generate CUDA kernels. Several new interoperability APIs have been added for Direct3D 9 and Direct3D 10 that accelerate communication with DirectX applications, along with a series of improvements to OpenGL interoperability. The CUDA Toolkit and SDK 2.1 beta also features support for using a GPU that is not driving a display on Vista, a beta of Linux Profiler 1.1 (separate download), and support for recent releases of Linux including Fedora 9, OpenSUSE 11 and Ubuntu 8.04. The CUDA Toolkit and SDK 2.1 beta is available today for free download from www.nvidia.com/object/cuda_get. Posted: 23 Dec 2008 [GPGPU /High-Level Languages] # Wait-free programming for general purpose computations on graphics processors Abstract:
This paper aims at bridging the gap between the lack of synchronization mechanisms in recent graphics processor (GPU) architectures and the need for synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, the authors construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the new wait-free objects have time complexity O(N), where N is the number of concurrent processes. The wait-free objects have space complexity O(N²), which is optimal. The result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware, and that wait-free programming is possible for GPUs. (Wait-free programming for general purpose computations on graphics processors. Phuong Hoai Ha, Philippas Tsigas, and Otto J. Anshus. ACM Symposium on Principles of Distributed Computing, 2008.)
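For readers new to wait-free design, a classic and much simpler textbook illustration of the core idea (this is NOT the paper's construction, whose general read-modify-write objects are far stronger) is an increment-only counter built from per-process slots, using only reads and writes:

```python
# Sketch of the simplest wait-free pattern: each process owns a slot it
# alone writes, so no operation ever waits on another process. A read
# collects all N slots in O(N) steps, echoing the paper's O(N) time
# bound for accesses to its (much more general) wait-free objects.
# The class name and shape are this sketch's own invention.

class WaitFreeCounter:
    def __init__(self, num_processes):
        # one slot per process; the paper's general RMW objects
        # require O(N^2) space, which it proves optimal
        self.slots = [0] * num_processes

    def increment(self, pid):
        # a process only ever writes its own slot, so the operation
        # completes in a bounded number of steps regardless of others
        self.slots[pid] += 1

    def read(self):
        # collect every slot: O(N) steps for N processes
        return sum(self.slots)
```

The design choice worth noting: wait-freedom here comes from eliminating contention on shared locations entirely, whereas the paper shows how to achieve it for objects that genuinely must be modified by all processes, without hardware test-and-set or compare-and-swap.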
