Designing Efficient Sorting Algorithms for Manycore GPUs This
IPDPS 2009 paper by Nadathur Satish, Mark Harris, and Michael Garland
describes the design of high-performance parallel radix sort and merge
sort routines for manycore GPUs, taking advantage of the full
programmability offered by NVIDIA
CUDA.
The radix sort described is the fastest GPU sort and the merge sort
described is the fastest comparison-based GPU sort reported in the
literature. The radix sort is up to 4 times faster than the
graphics-based GPUSort and more than 2 times faster than other
CUDA-based radix sorts. It is also 23% faster, on average, than even a
very carefully optimized multicore CPU sorting routine. To achieve this
performance, the authors carefully design the algorithms to expose
substantial fine-grained parallelism and decompose the computation into
independent tasks that perform minimal global communication. They
exploit the high-speed on-chip shared memory provided by NVIDIA’s GPU
architecture and efficient data-parallel primitives, particularly
parallel scan. While targeted at GPUs, these algorithms should also be
well-suited for other manycore processors. (N. Satish, M. Harris, and
M. Garland. Designing
efficient sorting algorithms for manycore GPUs. Proc. 23rd IEEE
Int’l Parallel & Distributed Processing Symposium, May 2009. To
appear.)
Posted: 01 Mar 2009 [GPGPU /Data Parallel Algorithms] # High-Performance Graphics Call for Participation The
new High-Performance Graphics Conference is the synthesis of two
highly successful conference series.
Posted: 27 Feb 2009 [GPGPU /Conferences] # Alexander
Heusel of the University of
Frankfurt has released open source Java bindings for CUDA. The current
project state is alpha, with support for the CUDA driver API; support
for the CUBLAS and CUFFT libraries is pending. Contributions
are welcome. For more information, see the project website: http://jacuzzi.sourceforge.net
Posted: 27 Feb 2009 [GPGPU /Tools] # HotPar '09: First USENIX Workshop on Hot Topics in Parallelism To
be held March 30-31, 2009 in Berkeley,
California, HotPar '09 will bring together researchers and
practitioners doing innovative work in the area of parallel computing.
HotPar recognizes the broad impact of multicore computing and seeks
relevant contributions from all fields, including application design,
languages and compilers, systems, and architecture. (http://www.usenix.org/events/hotpar09/)
Posted: 27 Feb 2009 [GPGPU /Conferences] # gDEBugger V4.5 Adds the ability to view Texture Mipmap levels and Texture Arrays The
new gDEBugger V4.5 adds the ability to
view texture MIP-map levels. Each texture MIP-map level’s parameters
and data (as an image or raw data) can be displayed in the gDEBugger
Textures and Buffers viewer. Browse the different MIP-map levels using
the Texture MIP-map Level slider. gDEBugger V4.5 also introduces support
for 1D and 2D texture arrays. The new Textures and Buffers viewer
Texture Layer slider enables viewing the contents of different texture
layers. This version also introduces notable performance and stability
improvements.
gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces
application activity on top of the OpenGL API and lets programmers see
what is happening within the graphics system implementation to find
bugs and optimize OpenGL application performance. gDEBugger runs on
Windows and Linux operating systems, and is currently in Beta phase on
Mac OS X.
http://www.gremedy.com
Posted: 27 Feb 2009 [GPGPU /Tools] # OpenMM Molecular Dynamics Simulation Software with GPU Acceleration Released by Stanford University OpenMM
is a freely downloadable, high
performance, extensible library that allows molecular dynamics (MD)
simulations to run on high performance computer architectures, such as
graphics processing units (GPUs). Speedups of up to 100 times over
CPU execution were achieved in some cases by running OpenMM on GPUs in
desktop PCs. The new release includes a version of the widely
used MD package GROMACS that integrates the OpenMM library, enabling
acceleration on high-end NVIDIA and AMD/ATI GPUs. OpenMM is a
collaborative project between Vijay Pande's lab at Stanford University
and Simbios, the National Center for Physics-based Simulation of
Biological Structures at Stanford, which is supported by the National
Institutes of Health. For more information on OpenMM, go to http://simtk.org/home/openmm. (Full
press release.)
Posted: 27 Feb 2009 [GPGPU /Scientific Computing] # CUDA.NET 2.1
has been released with support for the NVIDIA CUDA
2.1 API. This version supports DirectX 10 interoperability and the new
JIT compilation API. The library is supported on Windows and Linux
operating systems.
(CUDA.NET)
Posted: 27 Feb 2009 [GPGPU /Tools] # WORKSHOP on GPU Supercomputing 2009, National Taiwan University The
first NTU workshop
on GPU supercomputing
was held at NTU on January 16, 2009. Organized by the Center for
Quantum Science and Engineering (CQSE) at National Taiwan University,
the workshop consisted of seminars on applications of GPU/CUDA in
high-performance computation in science and engineering, as well as other
fields. Slides from
the presentations are now online.
Posted: 03 Feb 2009 [GPGPU /Conferences] # February is "Fold For Stephanie Month" (fold...@home) Scott
Sherman from Bjorn3D
is holding a "Fold for Stephanie" month in support of his 13-year-old
daughter, who has stage 4B Hodgkin's disease. He is even giving away an
XFX NVIDIA GeForce GTX 285 GPU to the highest folder for Stephanie. For
more information, see the Bjorn
3D Forums.
Posted: 03 Feb 2009 [GPGPU /Contests] # The Need for Speed Seminar Series: David Kirk Keynote The University of Illinois at Urbana-Champaign is launching a 13-week seminar series that will focus on emerging applications for parallel computing. The Need for Speed Seminar Series will feature world-class applications experts and researchers who will discuss what increased computing performance means for their fields. The series will bring together hardware engineers and software developers who require parallel processing to create faster and superior applications. Speakers will help forecast breakthroughs enabled by the rapid advances in computing performance per dollar, performance per watt, or storage capacity provided by Moore's Law. David Kirk, NVIDIA Fellow, will kick off the series with a special keynote on January 28. Following that, the Need for Speed series will be held at 4pm CT every Wednesday until April 29 at the UI's Coordinated Science Laboratory. Seminars will also stream live over the internet and speakers will take questions from both in-house and online audience members. To learn more about the series, or to view the live seminars, please visit the Need for Speed seminar web page. (Editor's Note: this news was submitted after the talk occurred.) Posted: 03 Feb 2009 [GPGPU /Miscellaneous/Talks] # Webinar: Jacket: Accelerating MATLAB using CUDA-Enabled GPUs February
5, 2009, 11am PST / 2pm EST
Are you looking for ways to improve your productivity by accelerating MATLAB functions? Now you can, with the performance of GPU computing. Attend this webinar to learn how.
Date: Thursday, February 5, 2009 Posted: 03 Feb 2009 [GPGPU /Miscellaneous/Courses] # National Taiwan University Becomes World's First Asia-Pacific CUDA Center of Excellence NVIDIA
announced that National Taiwan
University has been named as Asia's first CUDA Center of Excellence
(press release below). The university earned this title by formally
adopting NVIDIA GPU Computing solutions across its research facilities
and integrating a class to teach parallel computing based on the CUDA
architecture into its educational curriculum. As the computing industry
rapidly moves toward parallel processing and many-core architectures,
NVIDIA has worked over the past year to offer tomorrow's developers
and engineers education on the best tools and methodologies for
parallel computing. In addition to working with over 50 universities
worldwide that are actively using CUDA in their courses, NVIDIA
developed the CUDA Center of Excellence Program to further assist
universities that are devoted to educating tomorrow's software
developers about parallel computing. (Press Release)
Posted: 22 Jan 2009 [GPGPU /Press] # Wipro to Offer CUDA Software Services to Global Customer Base From
a press
release:
SANTA CLARA, CA—JANUARY 15, 2009—NVIDIA today announced it is now working closely with Wipro to provide CUDA™ professional services to their joint customers worldwide. CUDA, NVIDIA’s parallel computing architecture accessible through an industry-standard C language programming environment, has already delivered major leaps in performance across many industries. Wipro’s Product Engineering Services group will accelerate the development efforts of companies with vast software portfolios seeking to exploit parallel computing with the GPU. (Read More) Posted: 22 Jan 2009 [GPGPU /Press] # Symposium on Application Accelerators in High Performance Computing (SAAHPC’09) What do GPUs, FPGAs, vector processors and other special-purpose chips have in common? They are examples of advanced processor architectures that the scientific community is using to accelerate computationally demanding applications. While high-performance computing systems that use application accelerators are still rare, they will be the norm rather than the exception in the near future. The 2009 Symposium on Application Accelerators in High-Performance Computing aims to bring together developers of computing accelerators and end-users of the technology to exchange ideas and learn about the latest developments in the field. The Symposium will focus on the use of application accelerators in high-performance and scientific computing and the issues that surround it.
Presentations from technology developers and the academic user community are invited. Researchers interested in presenting at the Symposium should submit extended abstracts of 2-3 pages to [email protected] by April 20, 2009. All submissions will be reviewed by the Technical Program Committee and accepted submissions will be presented as either oral presentations or posters. Presentation materials will be made available online at www.saahpc.org. (2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC’09). July 27-31, 2009, University of Illinois, Urbana, IL) Posted: 22 Jan 2009 [GPGPU /Conferences] # gDEBugger for Apple Mac OS X - Beta Program Graphic Remedy
is proud to announce the upcoming release of gDEBugger for Mac OS X.
This new product brings all of gDEBugger's Debugging and Profiling
abilities to the Mac OpenGL developer's world. Using gDEBugger Mac will
help OS X OpenGL developers optimize their application performance:
find graphics pipeline bottlenecks, improve application graphics memory
consumption, locate and remove redundant OpenGL calls and graphics
memory leaks, and much more. Visit the gDEBugger Mac home page
to join the Beta Program, see screenshots and get more details.
Posted: 22 Jan 2009 [GPGPU /Tools] # Experience with the GPU and the Cell Processor This
workshop,
to be held at TU Delft on Friday January 30, 2009, presents
state-of-the-art performance results for engineering applications on
parallel machines, based on either the Cell Processor or on GPUs. In
addition to iterative solvers, finite element applications, tomography
and visualization applications, some background information on
computation on these platforms and on coupling of the processors will
be presented. Attendance is free, but registration is required. (Workshop: Experience
with the GPU and the Cell Processor)
Posted: 22 Jan 2009 [GPGPU /Conferences] # Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted Methods (EPHAM 2009) This
workshop will focus on compilation
techniques for exploiting parallelism in emerging massively
multi-threaded and multi-core architectures, with particular attention
to the use of general-purpose
GPU computing techniques to overcome traditional barriers to
parallelization. Recently, GPUs have evolved to address programming of
general-purpose computations, especially those exemplified by
data-parallel models. This change will have long-term implications for
languages, compilers, and programming models. Development of
higher-level programming languages, models and compilers that exploit
such processors will be important. Clearly, the economics and
performance of applications are affected by a transition to
general-purpose GPU computing. This will require new ideas and
directions as well as
recasting some older techniques to the new paradigm.
EPHAM 2009 invites papers across this emerging discipline and related areas of interest.
Posted: 11 Jan 2009 [GPGPU /Conferences] # "Parallel Computing for Graphics: Beyond Programmable Shading" SIGGRAPH Asia 2008 Course The
complete course notes from the "Parallel Computing for Graphics: Beyond
Programmable Shading" SIGGRAPH
Asia 2008
course are available online. The course gives an introduction to
parallel programming architectures and environments for interactive
graphics and explores case studies of combining traditional rendering
API usage with advanced parallel computation from game developers,
researchers, and graphics hardware vendors. There are strong
indications that the future of interactive graphics involves a
programming model more flexible than today's OpenGL and Direct3D
pipelines. As such, graphics developers need a basic understanding of
how to combine emerging parallel programming techniques with the
traditional interactive rendering pipeline. This course gives an
introduction to several parallel graphics architectures and programming
environments, and introduces the new types of graphics algorithms that
will be possible. The case studies in the class discuss the mix of
parallel programming constructs used, details of the graphics
algorithms, and how the rendering pipeline and computation interact to
achieve the technical goals. The course speakers are Jason Yang and
Justin Hensley (AMD), Tim Foley (Intel), Mark Harris (NVIDIA), Kun Zhou
(Zhejiang University), Anjul Patney (UC Davis), Pedro Sander (HKUST),
and Christopher Oat (AMD). (Complete
course notes.)
Posted: 23 Dec 2008 [GPGPU /Miscellaneous/Courses] # NVIDIA Releases Version 2.1 Beta of the CUDA Toolkit and SDK DECEMBER 19, 2008 - NVIDIA has announced the availability of version 2.1 beta of its CUDA toolkit and SDK. This is the latest version of the C compiler and software development tools for accessing the massively parallel CUDA compute architecture of NVIDIA GPUs. In response to overwhelming demand from the developer community, this latest version of the CUDA software suite includes support for NVIDIA® Tesla™ GPUs on Windows Vista and 32-bit debugger support for CUDA on Red Hat Enterprise Linux 5.x (separate download). The CUDA Toolkit and SDK 2.1 beta includes support for Visual Studio 2008 on Windows XP and Vista and Just-In-Time (JIT) compilation for applications that dynamically generate CUDA kernels. Several new interoperability APIs have been added for Direct3D 9 and Direct3D 10 that accelerate communication with DirectX applications, along with a series of improvements to OpenGL interoperability. The CUDA Toolkit and SDK 2.1 beta also features support for using a GPU that is not driving a display on Vista, a beta of Linux Profiler 1.1 (separate download), and support for recent releases of Linux including Fedora 9, OpenSUSE 11 and Ubuntu 8.04. The CUDA Toolkit and SDK 2.1 beta is available today for free download from www.nvidia.com/object/cuda_get. Posted: 23 Dec 2008 [GPGPU /High-Level Languages] # Wait-free programming for general purpose computations on graphics processors Abstract:
This paper aims at bridging the gap between the lack of synchronization mechanisms in recent graphics processor (GPU) architectures and the need for synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, the authors construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the new wait-free objects have time complexity O(N), where N is the number of concurrent processes. The wait-free objects have space complexity O(N²), which is optimal. The result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware, and that wait-free programming is possible for GPUs. (Wait-free programming for general purpose computations on graphics processors. Phuong Hoai Ha, Philippas Tsigas, and Otto J. Anshus. ACM Symposium on Principles of Distributed Computing, 2008.)
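For readers new to wait-free design, a classic and much simpler textbook illustration of the core idea (this is NOT the paper's construction, whose general read-modify-write objects are far stronger) is an increment-only counter built from per-process slots, using only reads and writes:

```python
# Sketch of the simplest wait-free pattern: each process owns a slot it
# alone writes, so no operation ever waits on another process. A read
# collects all N slots in O(N) steps, echoing the paper's O(N) time
# bound for accesses to its (much more general) wait-free objects.
# The class name and shape are this sketch's own invention.

class WaitFreeCounter:
    def __init__(self, num_processes):
        # one slot per process; the paper's general RMW objects
        # require O(N^2) space, which it proves optimal
        self.slots = [0] * num_processes

    def increment(self, pid):
        # a process only ever writes its own slot, so the operation
        # completes in a bounded number of steps regardless of others
        self.slots[pid] += 1

    def read(self):
        # collect every slot: O(N) steps for N processes
        return sum(self.slots)
```

The design choice worth noting: wait-freedom here comes from eliminating contention on shared locations entirely, whereas the paper shows how to achieve it for objects that genuinely must be modified by all processes, without hardware test-and-set or compare-and-swap.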
