Re: [Mesa-dev] [PATCH 1/2] R600: handle loops to self in the structurizer v2

2013-01-22 Thread Aaron Watry
 On 22.01.2013 13:08, Michel Dänzer wrote:
 On Mon, 2013-01-21 at 22:28 +0100, Christian König wrote: 
 v2: don't mess up other loops 
 
 Signed-off-by: Christian König deathsimple at vodafone.de
 
 Series is
 Tested-by: Michel Dänzer michel.daenzer at amd.com
 
 No piglit regressions with radeonsi. :)
  
 P.S. A couple of piglit tests still crash in control flow related code 
 though, e.g. glean/glsl1-do-loop or shaders/glsl-fs-discard-04. Have you 
 looked at these before, or would you mind taking a quick look? 
 Oh, thanks. I already wanted to ask what's next for radeonsi. I
hadn't realized that those still don't work; the last time I
checked, glsl1-do-loop was missing something else besides
flow control. Going to dig into it, Christian.
Not sure if it's related, but there's a clover for-loop test that triggers GPU
resets on my HD6850...  Looks like an infinite loop.  Fixing glsl1-do-loop
might also fix the clover test.
--Aaron Watry
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] R600: Add support for SET*_DX10 instructions

2013-01-31 Thread Aaron Watry
From: Tom Stellard thomas.stellard at amd.com

These instructions compare two floating point values and return an
integer true (-1) or false (0) value.

When compiling code generated by the Mesa GLSL frontend, the SET*_DX10
instructions save us four instructions for most branch decisions that
use floating-point comparisons.
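
For readers following along, here is a rough sketch (Python, not part of the patch) of why the two forms are interchangeable. It models select_cc as a plain function; the helper names are made up for illustration, and it only mirrors the arithmetic described in the patch comments, not the actual LLVM code.

```python
import operator


def select_cc(lhs, rhs, true_val, false_val, cond):
    """Model of select_cc: pick true_val when cond(lhs, rhs) holds."""
    return true_val if cond(lhs, rhs) else false_val


def old_pattern(lhs, rhs, cond):
    # (i32 fp_to_sint (fneg (select_cc f32, f32, 1.0, 0.0, cc)))
    as_float = select_cc(lhs, rhs, 1.0, 0.0, cond)
    return int(-as_float)


def new_pattern(lhs, rhs, cond):
    # (i32 select_cc f32, f32, -1, 0, cc)
    return select_cc(lhs, rhs, -1, 0, cond)


def int_bool_to_float(boolean):
    # The opposite direction kept by the patch: AND with 1, then uint_to_fp.
    return float(boolean & 1)


if __name__ == "__main__":
    for a, b in [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]:
        print(a, b, old_pattern(a, b, operator.lt), new_pattern(a, b, operator.lt))
```

Both patterns produce the integer true (-1) or false (0), which is why the combine can drop the fneg and fp_to_sint nodes.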
---
 lib/Target/R600/R600ISelLowering.cpp |  108 +++---
 lib/Target/R600/R600Instructions.td  |   52 +
 test/CodeGen/R600/fcmp.ll            |    4 +-
 test/CodeGen/R600/set-dx10.ll        |  137 ++
 test/CodeGen/R600/unsuported-cc.ll   |   24 +++---
 5 files changed, 281 insertions(+), 44 deletions(-)
 create mode 100644 test/CodeGen/R600/set-dx10.ll

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index abfee16..c4aa172 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -90,7 +90,9 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::FrameIndex, MVT::i32, Custom);

   setTargetDAGCombine(ISD::FP_ROUND);
+  setTargetDAGCombine(ISD::FP_TO_SINT);
   setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
+  setTargetDAGCombine(ISD::SELECT_CC);

   setSchedulingPreference(Sched::VLIW);
 }
@@ -663,9 +665,12 @@ SDValue R600TargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const
   }

   // Try to lower to a SET* instruction:
-  // We need all the operands of SELECT_CC to have the same value type, so if
-  // necessary we need to change True and False to be the same type as LHS and
-  // RHS, and then convert the result of the select_cc back to the correct type.
+  //
+  // CompareVT == MVT::f32 and VT == MVT::i32 is supported by the hardware,
+  // but for the other case where CompareVT != VT, all operands of
+  // SELECT_CC to have the same value type, so we need to change True and False

"all operands of SELECT_CC to have"?  Maybe "need to have"?

+  // to be the same type as LHS and RHS, and then convert the result of the
+  // select_cc back to the correct type.

   // Move hardware True/False values to the correct operand.
   if (isHWTrueValue(False) && isHWFalseValue(True)) {
@@ -675,32 +680,17 @@ SDValue R600TargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const
   }

   if (isHWTrueValue(True) && isHWFalseValue(False)) {
-    if (CompareVT != VT) {
-      if (VT == MVT::f32 && CompareVT == MVT::i32) {
-        SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
-            LHS, RHS,
-            DAG.getConstant(-1, MVT::i32),
-            DAG.getConstant(0, MVT::i32),
-            CC);
-        // Convert integer values of true (-1) and false (0) to fp values of
-        // true (1.0f) and false (0.0f).
-        SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
-                                  DAG.getConstant(1, MVT::i32));
-        return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
-      } else if (VT == MVT::i32 && CompareVT == MVT::f32) {
-        SDValue BoolAsFlt = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
-            LHS, RHS,
-            DAG.getConstantFP(1.0f, MVT::f32),
-            DAG.getConstantFP(0.0f, MVT::f32),
-            CC);
-        // Convert fp values of true (1.0f) and false (0.0f) to integer values
-        // of true (-1) and false (0).
-        SDValue Neg = DAG.getNode(ISD::FNEG, DL, MVT::f32, BoolAsFlt);
-        return DAG.getNode(ISD::FP_TO_SINT, DL, VT, Neg);
-      } else {
-        // I don't think there will be any other type pairings.
-        assert(!"Unhandled operand type parings in SELECT_CC");
-      }
+    if (CompareVT != VT && VT == MVT::f32 && CompareVT == MVT::i32) {
+      SDValue Boolean = DAG.getNode(ISD::SELECT_CC, DL, CompareVT,
+          LHS, RHS,
+          DAG.getConstant(-1, MVT::i32),
+          DAG.getConstant(0, MVT::i32),
+          CC);
+      // Convert integer values of true (-1) and false (0) to fp values of
+      // true (1.0f) and false (0.0f).
+      SDValue LSB = DAG.getNode(ISD::AND, DL, MVT::i32, Boolean,
+                                DAG.getConstant(1, MVT::i32));
+      return DAG.getNode(ISD::UINT_TO_FP, DL, VT, LSB);
     } else {
       // This SELECT_CC is already legal.
       return DAG.getNode(ISD::SELECT_CC, DL, VT, LHS, RHS, True, False, CC);
@@ -1121,6 +,35 @@ SDValue R600TargetLowering::PerformDAGCombine(SDNode *N,
     }
     break;
   }
+
+  // (i32 fp_to_sint (fneg (select_cc f32, f32, 1.0, 0.0 cc))) ->
+  // (i32 select_cc f32, f32, -1, 0 cc)
+  //
+  // Mesa's GLSL frontend generates the above pattern a lot and we can lower
+  // this to one of the SET*_DX10 instructions.
+  case ISD::FP_TO_SINT: {
+    SDValue FNeg = N->getOperand(0);
+    if (FNeg.getOpcode() != ISD::FNEG) {
+      return SDValue();
+    }
+    SDValue SelectCC = FNeg.getOperand(0);
+    if (SelectCC.getOpcode() != ISD::SELECT_CC ||
+        SelectCC.getOperand(0).getValueType() !=

Re: [Mesa-dev] [PATCH (9.1)] Revert r600g: re-enable handling of DISCARD_RANGE, improving performance

2013-02-20 Thread Aaron Watry
I've managed to capture a trace that loads TF2 to the menu and reproduces
some of the flickering.  I haven't managed to capture any gameplay yet due
to an error in CD Key authentication due to how I'm launching the game.

URL:
http://www.watrys.net/tf2_menu.trace.xz

--Aaron


On Wed, Feb 20, 2013 at 1:33 PM, Marek Olšák mar...@gmail.com wrote:

 Well, I really wonder why it doesn't work with TF2. We have a lot of
 piglit tests for DISCARD_RANGE (AKA INVALIDATE_RANGE in GL) and the
 copy-buffer functionality and they all pass. There are actually 3
 different implementations of copy-buffer: streamout, CP DMA, and async
 DMA. It's highly unlikely that all 3 would be broken in the same way.
 Other apps like openra also draw their GUI while using DISCARD_RANGE
 to discard old data and there is no issue.

 I need an apitrace to debug this. I wouldn't like to disable
 DISCARD_RANGE just because of TF2, because everything else seems to be
 working without any issues and because the performance regression can
 turn out to be a show-stopper for some other apps.

 Marek

 On Sun, Feb 17, 2013 at 6:25 PM, Andreas Boll
 andreas.boll@gmail.com wrote:
  This reverts commit 1eedebc65b02130ef7a27062a1ed67972a317a08.
 
  Until we have a proper fix disable DISCARD_RANGE for now.
  It fixes the garbled ui in TF2.
 
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58042
 
  Cc: Marek Olšák mar...@gmail.com
  ---
   src/gallium/drivers/r600/r600_buffer.c |2 ++
   1 file changed, 2 insertions(+)
 
  diff --git a/src/gallium/drivers/r600/r600_buffer.c b/src/gallium/drivers/r600/r600_buffer.c
  index 6df0d91..82630af 100644
  --- a/src/gallium/drivers/r600/r600_buffer.c
  +++ b/src/gallium/drivers/r600/r600_buffer.c
  @@ -137,6 +137,7 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
  r600_set_constants_dirty_if_bound(rctx, rbuffer);
  }
  }
  +#if 0 /* this is broken (see Bug 53130) */
  else if ((usage & PIPE_TRANSFER_DISCARD_RANGE) &&
   !(usage & PIPE_TRANSFER_UNSYNCHRONIZED) &&
   rctx->screen->has_streamout &&
  @@ -161,6 +162,7 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
  }
  }
  }
  +#endif
 
  /* mmap and synchronize with rings */
  data = r600_buffer_mmap_sync_with_rings(rctx, rbuffer, usage);
  --
  1.7.10.4
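
As a reading aid, the visible part of the condition that the revert compiles out can be sketched like this (Python, not driver code). The bit values below are hypothetical placeholders, not the real PIPE_TRANSFER_* constants, and the remainder of the condition is cut off by the hunk boundary, so this only covers the three terms shown in the diff.

```python
# Hypothetical bit positions, for illustration only.
PIPE_TRANSFER_UNSYNCHRONIZED = 1 << 0
PIPE_TRANSFER_DISCARD_RANGE = 1 << 1


def takes_discard_range_path(usage, has_streamout):
    """True when the three visible terms of the condition all hold."""
    return bool((usage & PIPE_TRANSFER_DISCARD_RANGE)
                and not (usage & PIPE_TRANSFER_UNSYNCHRONIZED)
                and has_streamout)


if __name__ == "__main__":
    print(takes_discard_range_path(PIPE_TRANSFER_DISCARD_RANGE, True))
```

With the #if 0 in place, a map that would have matched this check falls through to the plain mmap-and-synchronize path instead.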
 


Re: [Mesa-dev] [PATCH] clover: Fix build with LLVM 3.3

2013-02-21 Thread Aaron Watry
Hi Tom,

Mesa+Clover does indeed build against master llvm/clang, but I'm having
trouble building against it when I try to do a clean build of Piglit.

Error received:

[ 18%] Built target piglitutil_cl
Linking C executable ../../../../../bin/cl-custom-run-simple-kernel
/usr/local/lib/libOpenCL.so: undefined reference to
`clang::PPConditionalDirectiveRecord::rangeIntersectsConditionalDirective(clang::SourceRange)
const'
/usr/local/lib/libOpenCL.so: undefined reference to
`clang::PPConditionalDirectiveRecord::findConditionalDirectiveRegionLoc(clang::SourceLocation)
const'
collect2: error: ld returned 1 exit status
make[2]: *** [bin/cl-custom-run-simple-kernel] Error 1
make[1]: ***
[target_api/cl/tests/cl/custom/CMakeFiles/cl-custom-run-simple-kernel.dir/all]
Error 2
make: *** [all] Error 2

Maybe I've done something wrong, but I've tested this on two machines now
and both times I've wiped my llvm/clang/mesa/clover installs in /usr/local
and rebuilt from scratch.

--Aaron

On Wed, Feb 20, 2013 at 4:27 PM, Tom Stellard t...@stellard.net wrote:

 From: Tom Stellard thomas.stell...@amd.com

 ---
  .../state_trackers/clover/llvm/invocation.cpp  |   47
 ---
  1 files changed, 39 insertions(+), 8 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 index 0bd8e22..2785d10 100644
 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
 +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 @@ -28,10 +28,17 @@
  #include <clang/CodeGen/CodeGenAction.h>
  #include <llvm/Bitcode/BitstreamWriter.h>
  #include <llvm/Bitcode/ReaderWriter.h>
 -#include <llvm/DerivedTypes.h>
  #include <llvm/Linker.h>
 +#if HAVE_LLVM < 0x0303
 +#include <llvm/DerivedTypes.h>
  #include <llvm/LLVMContext.h>
  #include <llvm/Module.h>
 +#else
 +#include <llvm/IR/DerivedTypes.h>
 +#include <llvm/IR/LLVMContext.h>
 +#include <llvm/IR/Module.h>
 +#include <llvm/Support/IRReader.h>
 +#endif
  #include <llvm/PassManager.h>
  #include <llvm/Support/TargetSelect.h>
  #include <llvm/Support/MemoryBuffer.h>
 @@ -41,8 +48,10 @@

  #if HAVE_LLVM < 0x0302
  #include <llvm/Target/TargetData.h>
 -#else
 +#elif HAVE_LLVM < 0x0303
  #include <llvm/DataLayout.h>
 +#else
 +#include <llvm/IR/DataLayout.h>
  #endif

  #include "pipe/p_state.h"
 @@ -151,7 +160,11 @@ namespace {
        // Add libclc generic search path
        c.getHeaderSearchOpts().AddPath(LIBCLC_INCLUDEDIR,
                                        clang::frontend::Angled,
 -                                      false, false, false);
 +                                      false, false
 +#if HAVE_LLVM < 0x0303
 +                                      , false
 +#endif
 +                                      );

        // Add libclc include
        c.getPreprocessorOpts().Includes.push_back("clc/clc.h");
 @@ -167,8 +180,12 @@ namespace {
        c.getInvocation().setLangDefaults(c.getLangOpts(), clang::IK_OpenCL,
                                          clang::LangStandard::lang_opencl11);
  #endif
 -      c.createDiagnostics(0, NULL, new clang::TextDiagnosticPrinter(
 -                          s_log,
 +      c.createDiagnostics(
 +#if HAVE_LLVM < 0x0303
 +                          0, NULL,
 +#endif
 +                          new clang::TextDiagnosticPrinter(
 +                                 s_log,
  #if HAVE_LLVM = 0x0301
   c.getDiagnosticOpts()));
  #else
 @@ -201,12 +218,26 @@ namespace {

        llvm::PassManager PM;
        llvm::PassManagerBuilder Builder;
 -      bool isNative;
 -      llvm::Linker linker("clover", mod);
 +      llvm::sys::Path libclc_path =
 +                            llvm::sys::Path(LIBCLC_LIBEXECDIR + triple + ".bc");

        // Link the kernel with libclc
 -      linker.LinkInFile(llvm::sys::Path(LIBCLC_LIBEXECDIR + triple + ".bc"), isNative);
 +#if HAVE_LLVM < 0x0303
 +      bool isNative;
 +      llvm::Linker linker("clover", mod);
 +      linker.LinkInFile(libclc_path, isNative);
        mod = linker.releaseModule();
 +#else
 +      std::string err_str;
 +      llvm::SMDiagnostic err;
 +      llvm::Module *libclc_mod = llvm::ParseIRFile(libclc_path.str(), err,
 +                                                   mod->getContext());
 +      if (llvm::Linker::LinkModules(mod, libclc_mod,
 +                                    llvm::Linker::DestroySource,
 +                                    &err_str)) {
 +         throw build_error(err_str);
 +      }
 +#endif

// Add a function internalizer pass.
//
 --
 1.7.8.6



Re: [Mesa-dev] [PATCH] clover: Fix build with LLVM 3.3

2013-02-21 Thread Aaron Watry
On Thu, Feb 21, 2013 at 8:33 AM, Tom Stellard t...@stellard.net wrote:


 Which revisions of Clang and LLVM are you using?


I'm not at home at the moment, so I don't have access to those machines,
but from memory:

LLVM was git master as of sometime around 6-8pm CST last night.  One such
revision would be
commit ffbe432595c78ba28c8a9d200bf92996eed5e5d9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175718 91177308-0d3


Clang was somewhere between 3bc7b6bef96 and dc84cd5efdd3430efb.

I just retested on another machine with the following versions:
LLVM: git: 0514595b9b20c9d80, svn: 175739
Clang: 7d81281fc39f6d, svn: 175741
Mesa: b63b3012c91 with your clover patch

Result:
Linking C executable ../../../../../bin/cl-custom-run-simple-kernel
/usr/local/lib/libOpenCL.so: undefined reference to
`clang::PPConditionalDirectiveRecord::rangeIntersectsConditionalDirective(clang::SourceRange)
const'
/usr/local/lib/libOpenCL.so: undefined reference to
`clang::PPConditionalDirectiveRecord::findConditionalDirectiveRegionLoc(clang::SourceLocation)
const'
collect2: error: ld returned 1 exit status
make[2]: *** [bin/cl-custom-run-simple-kernel] Error 1
make[1]: ***
[target_api/cl/tests/cl/custom/CMakeFiles/cl-custom-run-simple-kernel.dir/all]
Error 2
make: *** [all] Error 2

--Aaron



Re: [Mesa-dev] [PATCH] clover: Fix build with LLVM 3.3

2013-02-21 Thread Aaron Watry
On Thu, Feb 21, 2013 at 10:06 AM, Tom Stellard t...@stellard.net wrote:


 Did you re-configure and run make clean for piglit?


I did a full git clean -fdx on LLVM, Clang, libclc, Mesa, and Piglit after
removing all existing LLVM/Clang/Mesa includes/libraries from /usr/local.
I also did a full search for other copies of libclang on my system and the
only copies were in /usr/local/lib.

Note: libclc's prepare-builtins.cpp needed to be updated for LLVM 3.3 as
well (moving 4 headers into llvm/IR/*.h).

I then did a clean configure/make/make install on LLVM/libclc/Mesa, and
then a clean rebuild of piglit.

LLVM Configure: CC=gcc CXX=g++ ./configure --enable-optimized
--enable-assertions=no --enable-experimental-targets=R600
--enable-targets=x86 --enable-shared --prefix=/usr/local

Mesa Configure: ./configure --with-dri-drivers=radeon
--with-gallium-drivers=r600 --enable-texture-float --enable-opencl

Piglit was rebuilt clean with only the CL tests enabled. No GL, GLX,
Waffle, etc.

libOpenCL is linked against the 3.3 build of LLVM, but I did notice that
there's no linking to clang. I also noticed that while LLVM builds a shared
library, clang only installs a static archive. Are we statically linking
Clang into Mesa while using the shared libLLVM?

ldd /usr/local/lib/libOpenCL.so:
linux-vdso.so.1 =>  (0x7dce1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7effa3e21000)
libxcb-dri2.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri2.so.0 (0x7effa3c1c000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x7effa39fd000)
libdrm.so.2 => /usr/local/lib/libdrm.so.2 (0x7effa37f1000)
libudev.so.0 => /lib/x86_64-linux-gnu/libudev.so.0 (0x7effa35e4000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7effa33db000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7effa31d7000)
libLLVM-3.3svn.so => /usr/local/lib/libLLVM-3.3svn.so (0x7effa22d8000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x7effa1fd4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7effa1cd8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7effa1919000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x7effa1702000)
/lib64/ld-linux-x86-64.so.2 (0x7effa511b000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x7effa14fe000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu

Re: [Mesa-dev] [PATCH] clover: Fix build with LLVM 3.3

2013-02-22 Thread Aaron Watry
On Fri, Feb 22, 2013 at 12:21 PM, Tom Stellard t...@stellard.net wrote:


 I've sent a v2 of this patch that should fix this.  The dependencies between
 clang libraries changed so I had to change the order that they were passed to
 the linker.
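
For anyone wondering why the order matters: with static archives, a single-pass linker generally wants a library to appear before the libraries it depends on. A minimal sketch of computing such an order follows (Python, using the standard graphlib module). The dependency edges are hypothetical and only illustrate the idea; they are not the actual clang library graph or the list used in the v2 patch.

```python
from graphlib import TopologicalSorter


def static_link_order(deps):
    """deps maps each library to the set of libraries whose symbols it uses.

    static_order() yields dependencies before their dependents; a single-pass
    linker wants the opposite, so the result is reversed.
    """
    return list(reversed(list(TopologicalSorter(deps).static_order())))


# Hypothetical dependency edges, for illustration only.
example_deps = {
    "clangFrontend": {"clangLex", "clangBasic"},
    "clangLex": {"clangBasic"},
    "clangBasic": set(),
}

if __name__ == "__main__":
    print(" ".join("-l" + lib for lib in static_link_order(example_deps)))
```

If a new dependency edge appears between two archives, the previously working order can leave symbols unresolved, which matches the kind of undefined reference seen with v1.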


That seems to have done the trick.  Piglit now builds correctly on the
machine that I have available here to test with (one that was failing
yesterday with v1).  I can do a full piglit GL/CL test run once I get home
if needed.

--Aaron



Re: [Mesa-dev] [PATCH 0/9] remove mfeatures.h file

2013-02-26 Thread Aaron Watry
Same error here.

Configuration: ./autogen.sh --enable-texture-float --enable-opencl
--with-gallium-drivers=r600 --with-dri-drivers=radeon --prefix=/usr/local

--Aaron


On Tue, Feb 26, 2013 at 11:09 AM, Jordan Justen jljus...@gmail.com wrote:

 On Sat, Feb 23, 2013 at 7:29 AM, Brian Paul bri...@vmware.com wrote:
  This series removes the dependencies on the mfeatures.h file and the file
  itself.
 
  I'd appreciate someone doing a test build of this series to double-check my
  work.

 I'm getting a build error:
   GEN    main/get_hash.h
 updating main/git_sha1.h
 get_hash_generator.py: need at least a single enabled API

 when testing your branch at
 git://people.freedesktop.org/~brianp/mesa.git remove-mfeatures

 Here is the config I was using:
 ./autogen.sh \
   --enable-gles2 --enable-gles1 \
   --enable-egl \
   --with-dri-drivers=i965,swrast \
   --enable-debug \
   --enable-shared-glapi --enable-glx-tls --enable-texture-float \
   --with-egl-drivers=dri2,glx --with-egl-platforms=x11,drm,wayland \
   --enable-gbm \
   --disable-glu --with-gallium-drivers='' \
   --disable-gallium-egl

 -Jordan


[Mesa-dev] [PATCH] libclc: Fix libclc build for LLVM 3.3

2013-03-08 Thread Aaron Watry
LLVM moved a bunch of IR-related headers for version 3.3.

This fixes the libclc build to follow suit.

---
 utils/prepare-builtins.cpp |   12 
 1 file changed, 12 insertions(+)

diff --git a/utils/prepare-builtins.cpp b/utils/prepare-builtins.cpp
index ae7731b..0141484 100644
--- a/utils/prepare-builtins.cpp
+++ b/utils/prepare-builtins.cpp
@@ -1,9 +1,21 @@
 #include "llvm/ADT/OwningPtr.h"
 #include "llvm/Bitcode/ReaderWriter.h"
+
+#ifndef HAVE_LLVM
+#include "llvm/Config/config.h"
+#define HAVE_LLVM ((LLVM_VERSION_MAJOR << 8)|LLVM_VERSION_MINOR)
+#endif
+#if HAVE_LLVM < 0x0303
 #include "llvm/Function.h"
 #include "llvm/GlobalVariable.h"
 #include "llvm/LLVMContext.h"
 #include "llvm/Module.h"
+#else
+#include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#endif
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ManagedStatic.h"
 #include "llvm/Support/MemoryBuffer.h"
--
1.7.10.4


Re: [Mesa-dev] [PATCH] [libclc] configure: Enable building separate libraries for target variants

2013-03-13 Thread Aaron Watry
The python changes in this file look good to me. I haven't done a
line-by-line review of the SI changes.
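
For reference, my reading of the per-device naming in the patch is roughly the following (a Python sketch, not a copy of configure.py; device_build_names is a name I made up, and the device entries are abbreviated from the patch's available_targets table):

```python
# Abbreviated from the available_targets table in the patch.
available_targets = {
    'r600--': {'devices': [
        {'gpu': 'cedar', 'aliases': ['palm', 'sumo', 'sumo2', 'redwood', 'juniper']},
        {'gpu': 'cypress', 'aliases': ['hemlock']},
    ]},
}


def device_build_names(target, device):
    """Return (full_target_name, obj_suffix) for one device entry."""
    gpu = device['gpu']
    if gpu == '':
        # No specific GPU: build the generic library for the target.
        return target, ''
    return gpu + '-' + target, '.' + gpu


if __name__ == "__main__":
    for dev in available_targets['r600--']['devices']:
        print(device_build_names('r600--', dev))
```

So each device gets its own library name and object suffix, while an empty 'gpu' keeps the old single-library behaviour.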

I tested this patch and v2 of the related mesa series on r600g (radeon
6850) with a recent LLVM and fresh mesa master as of this evening. No real
change in the piglit CL test success/failure rate.

Do you have any interest in trying to merge your changes to date back into
the upstream libclc codebase?  If you think it's a good idea, but don't
have time to do it yourself, let me know and I'll try to re-base the series
of patches.

--Aaron


On Tue, Mar 12, 2013 at 3:20 PM, Tom Stellard t...@stellard.net wrote:

 From: Tom Stellard thomas.stell...@amd.com

 ---
  configure.py |  119
 -
  1 files changed, 75 insertions(+), 44 deletions(-)

 diff --git a/configure.py b/configure.py
 index d861c24..dfd9a8f 100755
 --- a/configure.py
 +++ b/configure.py
 @@ -68,6 +68,15 @@ llvm_clang = os.path.join(llvm_bindir, 'clang')
  llvm_link = os.path.join(llvm_bindir, 'llvm-link')
  llvm_opt = os.path.join(llvm_bindir, 'opt')

 +available_targets = {
 +  'r600--' : { 'devices' :
 +               [{'gpu' : 'cedar',   'aliases' : ['palm', 'sumo', 'sumo2', 'redwood', 'juniper']},
 +                {'gpu' : 'cypress', 'aliases' : ['hemlock']},
 +                {'gpu' : 'barts',   'aliases' : ['turks', 'caicos']},
 +                {'gpu' : 'cayman',  'aliases' : ['aruba']},
 +                {'gpu' : 'tahiti',  'aliases' : ['pitcairn', 'verde', 'oland']}]}
 +}
 +
  default_targets = ['r600--']

  targets = args
 @@ -127,50 +136,72 @@ for target in targets:

    clang_cl_includes = ' '.join(["-I%s" % incdir for incdir in incdirs])

 -  # The rule for building a .bc file for the specified architecture using clang.
 -  clang_bc_flags = "-target %s -I`dirname $in` %s " \
 -                   "-Dcl_clang_storage_class_specifiers " \
 -                   "-Dcl_khr_fp64 " \
 -                   "-emit-llvm" % (target, clang_cl_includes)
 -  clang_bc_rule = "CLANG_CL_BC_" + target
 -  c_compiler_rule(b, clang_bc_rule, "LLVM-CC", llvm_clang, clang_bc_flags)
 -
 -  objects = []
 -  sources_seen = set()
 -
 -  for libdir in libdirs:
 -    subdir_list_file = os.path.join(libdir, 'SOURCES')
 -    manifest_deps.add(subdir_list_file)
 -    override_list_file = os.path.join(libdir, 'OVERRIDES')
 -
 -    # Add target overrides
 -    if os.path.exists(override_list_file):
 -      for override in open(override_list_file).readlines():
 -        override = override.rstrip()
 -        sources_seen.add(override)
 -
 -    for src in open(subdir_list_file).readlines():
 -      src = src.rstrip()
 -      if src not in sources_seen:
 -        sources_seen.add(src)
 -        obj = os.path.join(target, 'lib', src + '.bc')
 -        objects.append(obj)
 -        src_file = os.path.join(libdir, src)
 -        ext = os.path.splitext(src)[1]
 -        if ext == '.ll':
 -          b.build(obj, 'LLVM_AS', src_file)
 -        else:
 -          b.build(obj, clang_bc_rule, src_file)
 -
 -  builtins_link_bc = os.path.join(target, 'lib', 'builtins.link.bc')
 -  builtins_opt_bc = os.path.join(target, 'lib', 'builtins.opt.bc')
 -  builtins_bc = os.path.join('built_libs', target + '.bc')
 -  b.build(builtins_link_bc, "LLVM_LINK", objects)
 -  b.build(builtins_opt_bc, "OPT", builtins_link_bc)
 -  b.build(builtins_bc, "PREPARE_BUILTINS", builtins_opt_bc, prepare_builtins)
 -  install_files_bc.append((builtins_bc, builtins_bc))
 -  install_deps.append(builtins_bc)
 -  b.default(builtins_bc)
 +  for device in available_targets[target]['devices']:
 +    # The rule for building a .bc file for the specified architecture using clang.
 +    clang_bc_flags = "-target %s -I`dirname $in` %s " \
 +                     "-Dcl_clang_storage_class_specifiers " \
 +                     "-Dcl_khr_fp64 " \
 +                     "-emit-llvm" % (target, clang_cl_includes)
 +    if device['gpu'] != '':
 +      clang_bc_flags += ' -mcpu=' + device['gpu']
 +    clang_bc_rule = "CLANG_CL_BC_" + target
 +    c_compiler_rule(b, clang_bc_rule, "LLVM-CC", llvm_clang, clang_bc_flags)
 +
 +    objects = []
 +    sources_seen = set()
 +
 +    if device['gpu'] == '':
 +      full_target_name = target
 +      obj_suffix = ''
 +    else:
 +      full_target_name = device['gpu'] + '-' + target
 +      obj_suffix = '.' + device['gpu']
 +
 +    for libdir in libdirs:
 +      subdir_list_file = os.path.join(libdir, 'SOURCES')
 +      manifest_deps.add(subdir_list_file)
 +      override_list_file = os.path.join(libdir, 'OVERRIDES')
 +
 +      # Add target overrides
 +      if os.path.exists(override_list_file):
 +        for override in open(override_list_file).readlines():
 +          override = override.rstrip()
 +          sources_seen.add(override)
 +
 +      for src in open(subdir_list_file).readlines():
 +        src = src.rstrip()
 +        # Only add the base filename (e.g. Add get_global_id instead of
 +        # get_global_id.cl) to sources_seen.
 +# 
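The per-device loop in the patch above derives a target name and an object suffix for each GPU. A minimal sketch of that naming step follows, assuming only what the visible part of the diff shows. The device table is copied from the patch; the helper name device_build_names is hypothetical and is not part of configure.py, and the aliases are carried along unused because the truncated diff does not show how they are consumed.

```python
# Device table copied from the patch; everything else is illustrative.
available_targets = {
    'r600--': {'devices': [
        {'gpu': 'cedar',   'aliases': ['palm', 'sumo', 'sumo2', 'redwood', 'juniper']},
        {'gpu': 'cypress', 'aliases': ['hemlock']},
        {'gpu': 'barts',   'aliases': ['turks', 'caicos']},
        {'gpu': 'cayman',  'aliases': ['aruba']},
        {'gpu': 'tahiti',  'aliases': ['pitcairn', 'verde', 'oland']},
    ]},
}


def device_build_names(target):
    """Hypothetical helper: (gpu, full_target_name, obj_suffix, extra clang flag) per device."""
    names = []
    for device in available_targets[target]['devices']:
        gpu = device['gpu']
        if gpu == '':
            # Generic build: no -mcpu flag, no suffix on the object files.
            names.append((gpu, target, '', ''))
        else:
            names.append((gpu, gpu + '-' + target, '.' + gpu, ' -mcpu=' + gpu))
    return names


for row in device_build_names('r600--'):
    print(row)
```

Each tuple corresponds to one pass through the loop body, so the r600 target is built five times, once per listed GPU.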

[Mesa-dev] [PATCH] libclc: Add max() builtin function

2013-03-14 Thread Aaron Watry
Adds this function for both int and floating data types.
---
 generic/include/clc/clc.h   |2 ++
 generic/include/clc/integer/max.h   |2 ++
 generic/include/clc/integer/max.inc |1 +
 generic/include/clc/math/max.h  |2 ++
 generic/include/clc/math/max.inc|1 +
 generic/lib/SOURCES |2 ++
 generic/lib/integer/max.cl  |4 
 generic/lib/integer/max.inc |3 +++
 generic/lib/math/max.cl |8 
 generic/lib/math/max.inc|3 +++
 10 files changed, 28 insertions(+)
 create mode 100644 generic/include/clc/integer/max.h
 create mode 100644 generic/include/clc/integer/max.inc
 create mode 100644 generic/include/clc/math/max.h
 create mode 100644 generic/include/clc/math/max.inc
 create mode 100644 generic/lib/integer/max.cl
 create mode 100644 generic/lib/integer/max.inc
 create mode 100644 generic/lib/math/max.cl
 create mode 100644 generic/lib/math/max.inc

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index 4394c9e..f6668a3 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -45,6 +45,7 @@
 #include <clc/math/log.h>
 #include <clc/math/log2.h>
 #include <clc/math/mad.h>
+#include <clc/math/max.h>
 #include <clc/math/pow.h>
 #include <clc/math/sin.h>
 #include <clc/math/sqrt.h>
@@ -63,6 +64,7 @@
 #include <clc/integer/abs.h>
 #include <clc/integer/abs_diff.h>
 #include <clc/integer/add_sat.h>
+#include <clc/integer/max.h>
 #include <clc/integer/sub_sat.h>
 
 /* 6.11.5 Geometric Functions */
diff --git a/generic/include/clc/integer/max.h b/generic/include/clc/integer/max.h
new file mode 100644
index 000..e74a459
--- /dev/null
+++ b/generic/include/clc/integer/max.h
@@ -0,0 +1,2 @@
+#define BODY <clc/integer/max.inc>
+#include <clc/integer/gentype.inc>
diff --git a/generic/include/clc/integer/max.inc b/generic/include/clc/integer/max.inc
new file mode 100644
index 000..ce6c6d0
--- /dev/null
+++ b/generic/include/clc/integer/max.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, GENTYPE b);
diff --git a/generic/include/clc/math/max.h b/generic/include/clc/math/max.h
new file mode 100644
index 000..3d158f1
--- /dev/null
+++ b/generic/include/clc/math/max.h
@@ -0,0 +1,2 @@
+#define BODY <clc/math/max.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/include/clc/math/max.inc b/generic/include/clc/math/max.inc
new file mode 100644
index 000..ce6c6d0
--- /dev/null
+++ b/generic/include/clc/math/max.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, GENTYPE b);
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 86c008b..b593941 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -7,6 +7,7 @@ integer/abs.cl
 integer/add_sat.cl
 integer/add_sat.ll
 integer/add_sat_impl.ll
+integer/max.cl
 integer/sub_sat.cl
 integer/sub_sat.ll
 integer/sub_sat_impl.ll
@@ -14,6 +15,7 @@ math/fmax.cl
 math/fmin.cl
 math/hypot.cl
 math/mad.cl
+math/max.cl
 relational/any.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/integer/max.cl b/generic/lib/integer/max.cl
new file mode 100644
index 000..89fec7c
--- /dev/null
+++ b/generic/lib/integer/max.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define BODY <max.inc>
+#include <clc/integer/gentype.inc>
diff --git a/generic/lib/integer/max.inc b/generic/lib/integer/max.inc
new file mode 100644
index 000..37409fc
--- /dev/null
+++ b/generic/lib/integer/max.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF GENTYPE max(GENTYPE a, GENTYPE b) {
+  return (a > b ? a : b);
+}
diff --git a/generic/lib/math/max.cl b/generic/lib/math/max.cl
new file mode 100644
index 000..d1254a7
--- /dev/null
+++ b/generic/lib/math/max.cl
@@ -0,0 +1,8 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define BODY <max.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/lib/math/max.inc b/generic/lib/math/max.inc
new file mode 100644
index 000..37409fc
--- /dev/null
+++ b/generic/lib/math/max.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF GENTYPE max(GENTYPE a, GENTYPE b) {
+  return (a > b ? a : b);
+}
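The patch above relies on the gentype.inc pattern: a file defines BODY, and gentype.inc includes that body once per GENTYPE. A rough Python model of the expansion for the integer declaration is sketched below. The base types are the ones named in the gentype.inc hunks elsewhere in this thread; the list of vector widths and the helper name expand_body are assumptions made for illustration only.

```python
# Base integer types as they appear in the integer gentype.inc hunks.
INTEGER_BASE_TYPES = ['char', 'uchar', 'short', 'ushort',
                      'int', 'uint', 'long', 'ulong']

# Assumed widths: scalar plus the usual OpenCL vector sizes.
VECTOR_WIDTHS = ['', '2', '3', '4', '8', '16']

# Stand-in for the one-line body in clc/integer/max.inc.
BODY = '_CLC_OVERLOAD _CLC_DECL {GENTYPE} max({GENTYPE} a, {GENTYPE} b);'


def expand_body(body, base_types, widths):
    """Hypothetical model of '#include BODY' being repeated for every GENTYPE."""
    return [body.format(GENTYPE=base + width)
            for base in base_types
            for width in widths]


decls = expand_body(BODY, INTEGER_BASE_TYPES, VECTOR_WIDTHS)
print(len(decls))
print(decls[0])
print(decls[-1])
```

Under these assumptions a single one-line .inc file yields 48 overloads, which is why each new builtin in this series only needs a short body file plus a two-line header.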
-- 
1.7.10.4



[Mesa-dev] [PATCH 1/3] libclc: Fix abs_diff builtin integer function

2013-03-14 Thread Aaron Watry
---
 generic/lib/SOURCES  |1 +
 generic/lib/integer/abs_diff.inc |2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index b593941..a97213b 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -4,6 +4,7 @@ geometric/dot.cl
 geometric/length.cl
 geometric/normalize.cl
 integer/abs.cl
+integer/abs_diff.cl
 integer/add_sat.cl
 integer/add_sat.ll
 integer/add_sat_impl.ll
diff --git a/generic/lib/integer/abs_diff.inc b/generic/lib/integer/abs_diff.inc
index 93efdba..6ad57ee 100644
--- a/generic/lib/integer/abs_diff.inc
+++ b/generic/lib/integer/abs_diff.inc
@@ -1,3 +1,3 @@
-_CLC_OVERLOAD _CLC_DEF UGENTYPE abs_diff(GENTYPE x) {
+_CLC_OVERLOAD _CLC_DEF UGENTYPE abs_diff(GENTYPE x, GENTYPE y) {
   return __builtin_astype((GENTYPE)(x > y ? x-y : y-x), UGENTYPE);
 }
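The fix above adds the missing second parameter, so the body's reference to y now resolves. The return type is unsigned because the distance between two signed values can exceed the signed range. A small Python model of that behaviour for an 8-bit type is sketched below; the bits parameter and the masking step are illustrative stand-ins for the GENTYPE cast and the __builtin_astype reinterpretation, not libclc code.

```python
def abs_diff(x, y, bits=8):
    """Model of abs_diff for a signed `bits`-wide type, returning the unsigned result."""
    diff = x - y if x > y else y - x
    # Keep only the low `bits` bits, i.e. read the result as an unsigned value.
    return diff & ((1 << bits) - 1)


print(abs_diff(127, -128))
print(abs_diff(-3, 4))
```

For char inputs 127 and -128 the distance is 255, which fits in uchar but not in char.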
-- 
1.7.10.4



[Mesa-dev] libclc: Improve libclc handling of built-in functions

2013-03-14 Thread Aaron Watry
This series depends on the one-off patch I just sent to add max().

1) Fix the broken abs_diff integer built-in.
2) Add clamp for both integer and floating types in a new shared/ dir in order
   to reduce code duplication and improve maintainability.
3) Move the max() function into the shared/ directory. 



[Mesa-dev] [PATCH 2/3] libclc: Add clamp() builtin for integer/floating point

2013-03-14 Thread Aaron Watry
Created under a new shared/ directory for functions which are available for
both integer and floating point types.
---
 generic/include/clc/clc.h|3 +++
 generic/include/clc/shared/clamp.h   |5 +
 generic/include/clc/shared/clamp.inc |1 +
 generic/lib/SOURCES  |1 +
 generic/lib/shared/clamp.cl  |   11 +++
 generic/lib/shared/clamp.inc |3 +++
 6 files changed, 24 insertions(+)
 create mode 100644 generic/include/clc/shared/clamp.h
 create mode 100644 generic/include/clc/shared/clamp.inc
 create mode 100644 generic/lib/shared/clamp.cl
 create mode 100644 generic/lib/shared/clamp.inc

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index f6668a3..80ecd01 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -67,6 +67,9 @@
 #include <clc/integer/max.h>
 #include <clc/integer/sub_sat.h>
 
+/* 6.11.2 and 6.11.3 Shared Integer/Math Functions */
+#include <clc/shared/clamp.h>
+
 /* 6.11.5 Geometric Functions */
 #include <clc/geometric/cross.h>
 #include <clc/geometric/dot.h>
diff --git a/generic/include/clc/shared/clamp.h b/generic/include/clc/shared/clamp.h
new file mode 100644
index 000..5c2ebd0
--- /dev/null
+++ b/generic/include/clc/shared/clamp.h
@@ -0,0 +1,5 @@
+#define BODY <clc/shared/clamp.inc>
+#include <clc/integer/gentype.inc>
+
+#define BODY <clc/shared/clamp.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/include/clc/shared/clamp.inc b/generic/include/clc/shared/clamp.inc
new file mode 100644
index 000..3e3a435
--- /dev/null
+++ b/generic/include/clc/shared/clamp.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL GENTYPE clamp(GENTYPE x, GENTYPE y, GENTYPE z);
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index a97213b..0d477ba 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -18,5 +18,6 @@ math/hypot.cl
 math/mad.cl
 math/max.cl
 relational/any.cl
+shared/clamp.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/shared/clamp.cl b/generic/lib/shared/clamp.cl
new file mode 100644
index 000..0e8d223
--- /dev/null
+++ b/generic/lib/shared/clamp.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#define BODY <clamp.inc>
+#include <clc/integer/gentype.inc>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define BODY <clamp.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/lib/shared/clamp.inc b/generic/lib/shared/clamp.inc
new file mode 100644
index 000..ed49b8e
--- /dev/null
+++ b/generic/lib/shared/clamp.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF GENTYPE clamp(GENTYPE x, GENTYPE y, GENTYPE z) {
+  return (x > z ? z : (x < y ? y : x));
+}
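The clamp body above is a nested conditional: values above the upper bound become the upper bound, values below the lower bound become the lower bound, and everything else passes through. A scalar Python sketch of the same expression is given below. The parameter names lo and hi are substitutes for the patch's y and z, chosen here for readability.

```python
def clamp(x, lo, hi):
    """Scalar model of: x > z ? z : (x < y ? y : x), with y = lo and z = hi."""
    return hi if x > hi else (lo if x < lo else x)


print(clamp(5, 0, 3))
print(clamp(-1, 0, 3))
print(clamp(2, 0, 3))
```

Because the upper bound is tested first, a call with lo greater than hi returns hi for any x above hi and lo otherwise; the sketch does not try to define that case further.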
-- 
1.7.10.4



[Mesa-dev] [PATCH 3/3] libclc: Move max builtin to shared/

2013-03-14 Thread Aaron Watry
Max(x,y) is available for all integer/floating types.
---
 generic/include/clc/clc.h   |3 +--
 generic/include/clc/integer/max.h   |2 --
 generic/include/clc/integer/max.inc |1 -
 generic/include/clc/math/max.h  |2 --
 generic/include/clc/math/max.inc|1 -
 generic/include/clc/shared/max.h|5 +
 generic/include/clc/shared/max.inc  |1 +
 generic/lib/SOURCES |3 +--
 generic/lib/integer/max.cl  |4 
 generic/lib/integer/max.inc |3 ---
 generic/lib/math/max.cl |8 
 generic/lib/math/max.inc|3 ---
 generic/lib/shared/max.cl   |   11 +++
 generic/lib/shared/max.inc  |3 +++
 14 files changed, 22 insertions(+), 28 deletions(-)
 delete mode 100644 generic/include/clc/integer/max.h
 delete mode 100644 generic/include/clc/integer/max.inc
 delete mode 100644 generic/include/clc/math/max.h
 delete mode 100644 generic/include/clc/math/max.inc
 create mode 100644 generic/include/clc/shared/max.h
 create mode 100644 generic/include/clc/shared/max.inc
 delete mode 100644 generic/lib/integer/max.cl
 delete mode 100644 generic/lib/integer/max.inc
 delete mode 100644 generic/lib/math/max.cl
 delete mode 100644 generic/lib/math/max.inc
 create mode 100644 generic/lib/shared/max.cl
 create mode 100644 generic/lib/shared/max.inc

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index 80ecd01..c3d7d59 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -45,7 +45,6 @@
 #include <clc/math/log.h>
 #include <clc/math/log2.h>
 #include <clc/math/mad.h>
-#include <clc/math/max.h>
 #include <clc/math/pow.h>
 #include <clc/math/sin.h>
 #include <clc/math/sqrt.h>
@@ -64,11 +63,11 @@
 #include <clc/integer/abs.h>
 #include <clc/integer/abs_diff.h>
 #include <clc/integer/add_sat.h>
-#include <clc/integer/max.h>
 #include <clc/integer/sub_sat.h>
 
 /* 6.11.2 and 6.11.3 Shared Integer/Math Functions */
 #include <clc/shared/clamp.h>
+#include <clc/shared/max.h>
 
 /* 6.11.5 Geometric Functions */
 #include <clc/geometric/cross.h>
diff --git a/generic/include/clc/integer/max.h b/generic/include/clc/integer/max.h
deleted file mode 100644
index e74a459..000
--- a/generic/include/clc/integer/max.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define BODY <clc/integer/max.inc>
-#include <clc/integer/gentype.inc>
diff --git a/generic/include/clc/integer/max.inc b/generic/include/clc/integer/max.inc
deleted file mode 100644
index ce6c6d0..000
--- a/generic/include/clc/integer/max.inc
+++ /dev/null
@@ -1 +0,0 @@
-_CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, GENTYPE b);
diff --git a/generic/include/clc/math/max.h b/generic/include/clc/math/max.h
deleted file mode 100644
index 3d158f1..000
--- a/generic/include/clc/math/max.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#define BODY <clc/math/max.inc>
-#include <clc/math/gentype.inc>
diff --git a/generic/include/clc/math/max.inc b/generic/include/clc/math/max.inc
deleted file mode 100644
index ce6c6d0..000
--- a/generic/include/clc/math/max.inc
+++ /dev/null
@@ -1 +0,0 @@
-_CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, GENTYPE b);
diff --git a/generic/include/clc/shared/max.h b/generic/include/clc/shared/max.h
new file mode 100644
index 000..7967d4a
--- /dev/null
+++ b/generic/include/clc/shared/max.h
@@ -0,0 +1,5 @@
+#define BODY <clc/shared/max.inc>
+#include <clc/integer/gentype.inc>
+
+#define BODY <clc/shared/max.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/include/clc/shared/max.inc b/generic/include/clc/shared/max.inc
new file mode 100644
index 000..ce6c6d0
--- /dev/null
+++ b/generic/include/clc/shared/max.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, GENTYPE b);
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 0d477ba..f639c83 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -8,7 +8,6 @@ integer/abs_diff.cl
 integer/add_sat.cl
 integer/add_sat.ll
 integer/add_sat_impl.ll
-integer/max.cl
 integer/sub_sat.cl
 integer/sub_sat.ll
 integer/sub_sat_impl.ll
@@ -16,8 +15,8 @@ math/fmax.cl
 math/fmin.cl
 math/hypot.cl
 math/mad.cl
-math/max.cl
 relational/any.cl
 shared/clamp.cl
+shared/max.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/integer/max.cl b/generic/lib/integer/max.cl
deleted file mode 100644
index 89fec7c..000
--- a/generic/lib/integer/max.cl
+++ /dev/null
@@ -1,4 +0,0 @@
-#include <clc/clc.h>
-
-#define BODY <max.inc>
-#include <clc/integer/gentype.inc>
diff --git a/generic/lib/integer/max.inc b/generic/lib/integer/max.inc
deleted file mode 100644
index 37409fc..000
--- a/generic/lib/integer/max.inc
+++ /dev/null
@@ -1,3 +0,0 @@
-_CLC_OVERLOAD _CLC_DEF GENTYPE max(GENTYPE a, GENTYPE b) {
-  return (a > b ? a : b);
-}
diff --git a/generic/lib/math/max.cl b/generic/lib/math/max.cl
deleted file mode 100644
index d1254a7..000
--- a/generic/lib/math/max.cl
+++ /dev/null
@@ -1,8 +0,0 @@
-#include <clc/clc.h>
-
-#ifdef cl_khr_fp64
-#pragma 

Re: [Mesa-dev] libclc: Improve libclc handling of built-in functions

2013-03-14 Thread Aaron Watry
Note: I have tested all of these with the 32-bit signed integer data type
for scalar kernels only... r600g chokes on almost anything else due to a
missing vload/vstore implementation and buggy/incomplete handling of
char/short/long data types in CL kernels.

--Aaron


On Thu, Mar 14, 2013 at 10:01 PM, Aaron Watry awa...@gmail.com wrote:

> This series depends on the one-off patch I just sent to add max().
>
> 1) Fix the broken abs_diff integer built-in.
> 2) Add clamp for both integer and floating types in a new shared/ dir in order
>    to reduce code duplication and improve maintainability.
> 3) Move the max() function into the shared/ directory.




[Mesa-dev] [PATCH] libclc: implement rotate builtin

2013-03-23 Thread Aaron Watry
This implementation does a lot of bit shifting and masking. Suffice to say,
this is somewhat suboptimal... but it does look to produce correct results
(after the piglit tests were corrected for sign extension issues).

Someone who knows LLVM better than I could re-write this more efficiently.
---
 generic/include/clc/clc.h   |1 +
 generic/include/clc/integer/gentype.inc |   11 ++
 generic/include/clc/integer/rotate.h|2 ++
 generic/include/clc/integer/rotate.inc  |1 +
 generic/lib/SOURCES |1 +
 generic/lib/integer/rotate.cl   |4 
 generic/lib/integer/rotate.inc  |   35 +++
 7 files changed, 55 insertions(+)
 create mode 100644 generic/include/clc/integer/rotate.h
 create mode 100644 generic/include/clc/integer/rotate.inc
 create mode 100644 generic/lib/integer/rotate.cl
 create mode 100644 generic/lib/integer/rotate.inc

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index c3d7d59..72f518a 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -63,6 +63,7 @@
 #include <clc/integer/abs.h>
 #include <clc/integer/abs_diff.h>
 #include <clc/integer/add_sat.h>
+#include <clc/integer/rotate.h>
 #include <clc/integer/sub_sat.h>
 
 /* 6.11.2 and 6.11.3 Shared Integer/Math Functions */
diff --git a/generic/include/clc/integer/gentype.inc b/generic/include/clc/integer/gentype.inc
index 0b32efd..005b9af 100644
--- a/generic/include/clc/integer/gentype.inc
+++ b/generic/include/clc/integer/gentype.inc
@@ -1,3 +1,4 @@
+#define GENSIZE 8
 #define GENTYPE char
 #define UGENTYPE uchar
 #define SGENTYPE char
@@ -94,6 +95,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef GENSIZE
+#define GENSIZE 16
+
 #define GENTYPE short
 #define UGENTYPE ushort
 #define SGENTYPE short
@@ -190,6 +194,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef GENSIZE
+#define GENSIZE 32
+
 #define GENTYPE int
 #define UGENTYPE uint
 #define SGENTYPE int
@@ -286,6 +293,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef GENSIZE
+#define GENSIZE 64
+
 #define GENTYPE long
 #define UGENTYPE ulong
 #define SGENTYPE long
@@ -382,4 +392,5 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef GENSIZE
 #undef BODY
diff --git a/generic/include/clc/integer/rotate.h b/generic/include/clc/integer/rotate.h
new file mode 100644
index 000..e163bc8
--- /dev/null
+++ b/generic/include/clc/integer/rotate.h
@@ -0,0 +1,2 @@
+#define BODY <clc/integer/rotate.inc>
+#include <clc/integer/gentype.inc>
diff --git a/generic/include/clc/integer/rotate.inc b/generic/include/clc/integer/rotate.inc
new file mode 100644
index 000..5720e1c
--- /dev/null
+++ b/generic/include/clc/integer/rotate.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL GENTYPE rotate(GENTYPE x, GENTYPE y);
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index f639c83..495b3e7 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -8,6 +8,7 @@ integer/abs_diff.cl
 integer/add_sat.cl
 integer/add_sat.ll
 integer/add_sat_impl.ll
+integer/rotate.cl
 integer/sub_sat.cl
 integer/sub_sat.ll
 integer/sub_sat_impl.ll
diff --git a/generic/lib/integer/rotate.cl b/generic/lib/integer/rotate.cl
new file mode 100644
index 000..d7eff2b
--- /dev/null
+++ b/generic/lib/integer/rotate.cl
@@ -0,0 +1,4 @@
+#include <clc/clc.h>
+
+#define BODY <rotate.inc>
+#include <clc/integer/gentype.inc>
diff --git a/generic/lib/integer/rotate.inc b/generic/lib/integer/rotate.inc
new file mode 100644
index 000..e83dd51
--- /dev/null
+++ b/generic/lib/integer/rotate.inc
@@ -0,0 +1,35 @@
+/**
+ * Not necessarily optimal... but it produces correct results (at least for int)
+ * If we're lucky, LLVM will recognize the pattern and produce rotate
+ * instructions:
+ * http://llvm.1065342.n5.nabble.com/rotate-td47679.html
+ * 
+ * Eventually, someone should feel free to implement an llvm-specific version
+ */
+
+_CLC_OVERLOAD _CLC_DEF GENTYPE rotate(GENTYPE x, GENTYPE n){
+    //Try to avoid extra work if someone's spinning the value through multiple
+    //full rotations
+    n = n % (GENTYPE)GENSIZE;
+
+    //Determine if we're doing a right or left shift on each component
+    //The actual shift algorithm is based on a rotate right
+    //e.g. a rotate of int by 5 bits becomes rotate right by 27 bits
+    // and a rotate of int by -4 bits becomes rotate right by 4
+    GENTYPE amt = (n > (GENTYPE)0 ? (GENTYPE)GENSIZE - n : (GENTYPE)0 - n );
+
+    //Calculate the bits that will wrap
+    GENTYPE mask = ( (GENTYPE)1 << amt ) - (GENTYPE)1;
+    GENTYPE wrapped_bits = x & mask;
+
+    //Shift the input value right and then AND a mask that eliminates
+    //sign-extension interference
+    //if the rotate amount is 0, just use ~0 for a mask
+    GENTYPE se_mask = !amt ? ~((GENTYPE)0) :
+        ( ( (GENTYPE)1 << ((GENTYPE)GENSIZE - amt) ) - (GENTYPE)1 );
+    GENTYPE unwrapped_bits = x >> amt;
+    unwrapped_bits &= se_mask;
+
+//Finally shift the input right after moving the wrapped bits 
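The rotate patch above converts every request into a rotate right, splits the value into the bits that wrap and the bits that do not, and recombines them. A Python model of those steps for a 32-bit value is sketched below. Two points are assumptions: the final recombination, because the patch text is cut off before that step, and the use of an unsigned bit pattern, which makes the patch's sign-extension mask unnecessary in this model.

```python
def rotate(x, n, bits=32):
    """Rotate the `bits`-wide pattern of x left by n bits (negative n rotates right)."""
    full = (1 << bits) - 1
    x &= full                  # work on the unsigned bit pattern
    # Equivalent right-rotate amount in [0, bits): n = 5 gives 27, n = -4 gives 4.
    amt = (-n) % bits
    mask = (1 << amt) - 1
    wrapped_bits = x & mask    # low bits that wrap around to the top
    unwrapped_bits = x >> amt  # no sign extension: x is non-negative here
    # Assumed final step: move the wrapped bits to the top and merge.
    return ((wrapped_bits << (bits - amt)) | unwrapped_bits) & full


print(hex(rotate(0x12345678, 8)))
print(hex(rotate(1, -4)))
```

In OpenCL C the same code runs on signed types, where the right shift is arithmetic, which is why the patch masks unwrapped_bits with se_mask before combining.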

[Mesa-dev] (no subject)

2013-04-13 Thread Aaron Watry
Implements the min() OpenCL built-in in 2 stages.
1) Implement min() where the two argument types match
2) Make changes to support min(vec,scalar)



[Mesa-dev] [PATCH 1/2] libclc: implement initial version of min()

2013-04-13 Thread Aaron Watry
This doesn't handle the integer cases for min(vector, scalar).
---
 generic/include/clc/clc.h  |1 +
 generic/include/clc/shared/min.h   |5 +
 generic/include/clc/shared/min.inc |1 +
 generic/lib/SOURCES|1 +
 generic/lib/shared/min.cl  |   11 +++
 generic/lib/shared/min.inc |3 +++
 6 files changed, 22 insertions(+)
 create mode 100644 generic/include/clc/shared/min.h
 create mode 100644 generic/include/clc/shared/min.inc
 create mode 100644 generic/lib/shared/min.cl
 create mode 100644 generic/lib/shared/min.inc

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index 72f518a..74f1126 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -69,6 +69,7 @@
 /* 6.11.2 and 6.11.3 Shared Integer/Math Functions */
 #include <clc/shared/clamp.h>
 #include <clc/shared/max.h>
+#include <clc/shared/min.h>
 
 /* 6.11.5 Geometric Functions */
 #include <clc/geometric/cross.h>
diff --git a/generic/include/clc/shared/min.h b/generic/include/clc/shared/min.h
new file mode 100644
index 000..e16b45d
--- /dev/null
+++ b/generic/include/clc/shared/min.h
@@ -0,0 +1,5 @@
+#define BODY <clc/shared/min.inc>
+#include <clc/integer/gentype.inc>
+
+#define BODY <clc/shared/min.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/include/clc/shared/min.inc b/generic/include/clc/shared/min.inc
new file mode 100644
index 000..3bc9880
--- /dev/null
+++ b/generic/include/clc/shared/min.inc
@@ -0,0 +1 @@
+_CLC_OVERLOAD _CLC_DECL GENTYPE min(GENTYPE a, GENTYPE b);
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 495b3e7..18c8afb 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -19,5 +19,6 @@ math/mad.cl
 relational/any.cl
 shared/clamp.cl
 shared/max.cl
+shared/min.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/shared/min.cl b/generic/lib/shared/min.cl
new file mode 100644
index 000..49481cb
--- /dev/null
+++ b/generic/lib/shared/min.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+#define BODY <min.inc>
+#include <clc/integer/gentype.inc>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define BODY <min.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/lib/shared/min.inc b/generic/lib/shared/min.inc
new file mode 100644
index 000..b99bc35
--- /dev/null
+++ b/generic/lib/shared/min.inc
@@ -0,0 +1,3 @@
+_CLC_OVERLOAD _CLC_DEF GENTYPE min(GENTYPE a, GENTYPE b) {
+  return (a < b ? a : b);
+}
-- 
1.7.10.4



[Mesa-dev] [PATCH 2/2] libclc: Implement the min(vec, scalar) version of the min builtin.

2013-04-13 Thread Aaron Watry
Checks if the current GENTYPE is scalar, and if not, then defines a separate
implementation of the function which casts the second arg to vector before
proceeding.
---
 generic/include/clc/integer/gentype.inc |   23 +++
 generic/include/clc/math/gentype.inc|8 
 generic/include/clc/shared/min.inc  |4 
 generic/lib/shared/min.inc  |6 ++
 4 files changed, 41 insertions(+)

diff --git a/generic/include/clc/integer/gentype.inc b/generic/include/clc/integer/gentype.inc
index dd7d061..95a37d5 100644
--- a/generic/include/clc/integer/gentype.inc
+++ b/generic/include/clc/integer/gentype.inc
@@ -1,4 +1,8 @@
+//These 2 defines only change when switching between data sizes or base types to
+//keep this file manageable.
 #define GENSIZE 8
+#define SCALAR_GENTYPE char
+
 #define GENTYPE char
 #define UGENTYPE uchar
 #define SGENTYPE char
@@ -49,6 +53,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE uchar
+
 #define GENTYPE uchar
 #define UGENTYPE uchar
 #define SGENTYPE char
@@ -101,6 +108,8 @@
 
 #undef GENSIZE
 #define GENSIZE 16
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE short
 
 #define GENTYPE short
 #define UGENTYPE ushort
@@ -152,6 +161,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE ushort
+
 #define GENTYPE ushort
 #define UGENTYPE ushort
 #define SGENTYPE short
@@ -204,6 +216,8 @@
 
 #undef GENSIZE
 #define GENSIZE 32
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE int
 
 #define GENTYPE int
 #define UGENTYPE uint
@@ -255,6 +269,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE uint
+
 #define GENTYPE uint
 #define UGENTYPE uint
 #define SGENTYPE int
@@ -307,6 +324,8 @@
 
 #undef GENSIZE
 #define GENSIZE 64
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE long
 
 #define GENTYPE long
 #define UGENTYPE ulong
@@ -358,6 +377,9 @@
 #undef UGENTYPE
 #undef SGENTYPE
 
+#undef SCALAR_GENTYPE
+#define SCALAR_GENTYPE ulong
+
 #define GENTYPE ulong
 #define UGENTYPE ulong
 #define SGENTYPE long
@@ -409,4 +431,5 @@
 #undef SGENTYPE
 
 #undef GENSIZE
+#undef SCALAR_GENTYPE
 #undef BODY
diff --git a/generic/include/clc/math/gentype.inc b/generic/include/clc/math/gentype.inc
index b525c4b..4ed2151 100644
--- a/generic/include/clc/math/gentype.inc
+++ b/generic/include/clc/math/gentype.inc
@@ -1,3 +1,5 @@
+#define SCALAR_GENTYPE float
+
 #define GENTYPE float
 #define SCALAR
 #include BODY
@@ -24,7 +26,11 @@
 #include BODY
 #undef GENTYPE
 
+#undef SCALAR_GENTYPE
+
 #ifdef cl_khr_fp64
+#define SCALAR_GENTYPE double
+
 #define SCALAR
 #define GENTYPE double
 #include BODY
@@ -50,6 +56,8 @@
 #define GENTYPE double16
 #include BODY
 #undef GENTYPE
+
+#undef SCALAR_GENTYPE
 #endif
 
 #undef BODY
diff --git a/generic/include/clc/shared/min.inc b/generic/include/clc/shared/min.inc
index 3bc9880..cf3afaf 100644
--- a/generic/include/clc/shared/min.inc
+++ b/generic/include/clc/shared/min.inc
@@ -1 +1,5 @@
 _CLC_OVERLOAD _CLC_DECL GENTYPE min(GENTYPE a, GENTYPE b);
+
+#ifndef SCALAR
+_CLC_OVERLOAD _CLC_DECL GENTYPE min(GENTYPE a, SCALAR_GENTYPE b);
+#endif
\ No newline at end of file
diff --git a/generic/lib/shared/min.inc b/generic/lib/shared/min.inc
index b99bc35..58a22e1 100644
--- a/generic/lib/shared/min.inc
+++ b/generic/lib/shared/min.inc
@@ -1,3 +1,9 @@
 _CLC_OVERLOAD _CLC_DEF GENTYPE min(GENTYPE a, GENTYPE b) {
   return (a < b ? a : b);
 }
+
+#ifndef SCALAR
+_CLC_OVERLOAD _CLC_DEF GENTYPE min(GENTYPE a, SCALAR_GENTYPE b) {
+  return (a < (GENTYPE)b ? a : (GENTYPE)b);
+}
+#endif
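The mixed overload added above casts the scalar second argument to the vector type and then applies the same component-wise selection as the matching-type version. A Python model using lists as stand-ins for OpenCL vectors is sketched below; the function name min_vec_scalar is hypothetical and only serves the illustration.

```python
def min_vec_scalar(a, b):
    """Model of min(GENTYPE a, SCALAR_GENTYPE b) for a vector GENTYPE."""
    # (GENTYPE)b in OpenCL C replicates the scalar into every component.
    b_vec = [b] * len(a)
    # a < b ? a : b is then evaluated per component.
    return [x if x < y else y for x, y in zip(a, b_vec)]


print(min_vec_scalar([1, 5, 3, 2], 2))
```

The #ifndef SCALAR guard in the patch matters because for scalar GENTYPE the two signatures would be identical and the second definition would be a redefinition.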
-- 
1.7.10.4



[Mesa-dev] [PATCH] libclc: Add clamp(vec, scalar, scalar) and max(vec, scalar)

2013-04-13 Thread Aaron Watry
For any GENTYPE that isn't scalar, we need to implement a mixed
vector/scalar version of clamp/max.

This depends on the min() patches I sent to the list a few minutes ago.
---
 generic/include/clc/shared/clamp.inc |4 
 generic/include/clc/shared/max.inc   |4 
 generic/lib/shared/clamp.inc |6 ++
 generic/lib/shared/max.inc   |6 ++
 4 files changed, 20 insertions(+)

diff --git a/generic/include/clc/shared/clamp.inc b/generic/include/clc/shared/clamp.inc
index 3e3a435..67c8142 100644
--- a/generic/include/clc/shared/clamp.inc
+++ b/generic/include/clc/shared/clamp.inc
@@ -1 +1,5 @@
 _CLC_OVERLOAD _CLC_DECL GENTYPE clamp(GENTYPE x, GENTYPE y, GENTYPE z);
+
+#ifndef SCALAR
+_CLC_OVERLOAD _CLC_DECL GENTYPE clamp(GENTYPE x, SCALAR_GENTYPE y, SCALAR_GENTYPE z);
+#endif
diff --git a/generic/include/clc/shared/max.inc b/generic/include/clc/shared/max.inc
index ce6c6d0..9fe73c4 100644
--- a/generic/include/clc/shared/max.inc
+++ b/generic/include/clc/shared/max.inc
@@ -1 +1,5 @@
 _CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, GENTYPE b);
+
+#ifndef SCALAR
+_CLC_OVERLOAD _CLC_DECL GENTYPE max(GENTYPE a, SCALAR_GENTYPE b);
+#endif
diff --git a/generic/lib/shared/clamp.inc b/generic/lib/shared/clamp.inc
index ed49b8e..58370d3 100644
--- a/generic/lib/shared/clamp.inc
+++ b/generic/lib/shared/clamp.inc
@@ -1,3 +1,9 @@
 _CLC_OVERLOAD _CLC_DEF GENTYPE clamp(GENTYPE x, GENTYPE y, GENTYPE z) {
   return (x > z ? z : (x < y ? y : x));
 }
+
+#ifndef SCALAR
+_CLC_OVERLOAD _CLC_DEF GENTYPE clamp(GENTYPE x, SCALAR_GENTYPE y, SCALAR_GENTYPE z) {
+  return (x > (GENTYPE)z ? (GENTYPE)z : (x < (GENTYPE)y ? (GENTYPE)y : x));
+}
+#endif
\ No newline at end of file
diff --git a/generic/lib/shared/max.inc b/generic/lib/shared/max.inc
index 37409fc..6a12b6f 100644
--- a/generic/lib/shared/max.inc
+++ b/generic/lib/shared/max.inc
@@ -1,3 +1,9 @@
 _CLC_OVERLOAD _CLC_DEF GENTYPE max(GENTYPE a, GENTYPE b) {
   return (a > b ? a : b);
 }
+
+#ifndef SCALAR
+_CLC_OVERLOAD _CLC_DEF GENTYPE max(GENTYPE a, SCALAR_GENTYPE b) {
+  return (a > (GENTYPE)b ? a : (GENTYPE)b);
+}
+#endif
\ No newline at end of file
-- 
1.7.10.4



[Mesa-dev] [PATCH] libclc: Rename [add|sub]_sat.ll to [add|sub]_sat_if.ll

2013-04-15 Thread Aaron Watry
configure.py allows overloading *.cl with *.ll, but will only ever build
the first file listed in SOURCES of ${file}.cl and ${file}.ll

add_sat, sub_sat, (and the soon to be submitted clz) all define interfaces in
${function_name}.ll which are implemented in ${function_name}_impl.ll.

Renaming the interface files is enough to get them to build again, fixing
CL usage of these functions.

Tested on clover/r600g.
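
The collision described above can be sketched in a few lines of Python. This is a model of the behaviour as stated in this message (only the first of ${file}.cl and ${file}.ll listed in SOURCES is built), not the actual configure.py code; the helper name select_sources is hypothetical.

```python
import os


def select_sources(sources):
    """Keep only the first SOURCES entry for each base filename."""
    seen = set()
    selected = []
    for src in sources:
        base = os.path.splitext(src)[0]   # 'integer/add_sat.ll' -> 'integer/add_sat'
        if base not in seen:
            seen.add(base)
            selected.append(src)
    return selected


before = ['integer/add_sat.cl', 'integer/add_sat.ll', 'integer/add_sat_impl.ll']
after = ['integer/add_sat.cl', 'integer/add_sat_if.ll', 'integer/add_sat_impl.ll']
print(select_sources(before))
print(select_sources(after))
```

With the old names, add_sat.ll shares a base name with add_sat.cl and is skipped, so the interface functions it defines never reach the library. With the _if suffix all three files have distinct base names and are built.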
---
 generic/lib/SOURCES   |4 +--
 generic/lib/integer/add_sat.ll|   55 -
 generic/lib/integer/add_sat_if.ll |   55 +
 generic/lib/integer/sub_sat.ll|   55 -
 generic/lib/integer/sub_sat_if.ll |   55 +
 5 files changed, 112 insertions(+), 112 deletions(-)
 delete mode 100644 generic/lib/integer/add_sat.ll
 create mode 100644 generic/lib/integer/add_sat_if.ll
 delete mode 100644 generic/lib/integer/sub_sat.ll
 create mode 100644 generic/lib/integer/sub_sat_if.ll

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 18c8afb..eac6c60 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -6,11 +6,11 @@ geometric/normalize.cl
 integer/abs.cl
 integer/abs_diff.cl
 integer/add_sat.cl
-integer/add_sat.ll
+integer/add_sat_if.ll
 integer/add_sat_impl.ll
 integer/rotate.cl
 integer/sub_sat.cl
-integer/sub_sat.ll
+integer/sub_sat_if.ll
 integer/sub_sat_impl.ll
 math/fmax.cl
 math/fmin.cl
diff --git a/generic/lib/integer/add_sat.ll b/generic/lib/integer/add_sat.ll
deleted file mode 100644
index bcbe4c0..000
--- a/generic/lib/integer/add_sat.ll
+++ /dev/null
@@ -1,55 +0,0 @@
-declare i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
-
-define i8 @__clc_add_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
-  %call = call i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
-  ret i8 %call
-}
-
-declare i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
-
-define i8 @__clc_add_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
-  %call = call i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
-  ret i8 %call
-}
-
-declare i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
-
-define i16 @__clc_add_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
-  %call = call i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
-  ret i16 %call
-}
-
-declare i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
-
-define i16 @__clc_add_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
-  %call = call i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
-  ret i16 %call
-}
-
-declare i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
-
-define i32 @__clc_add_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
-  %call = call i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
-  ret i32 %call
-}
-
-declare i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
-
-define i32 @__clc_add_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
-  %call = call i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
-  ret i32 %call
-}
-
-declare i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
-
-define i64 @__clc_add_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {
-  %call = call i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
-  ret i64 %call
-}
-
-declare i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y)
-
-define i64 @__clc_add_sat_u64(i64 %x, i64 %y) nounwind readnone alwaysinline {
-  %call = call i64 @__clc_add_sat_impl_u64(i64 %x, i64 %y)
-  ret i64 %call
-}
diff --git a/generic/lib/integer/add_sat_if.ll b/generic/lib/integer/add_sat_if.ll
new file mode 100644
index 000..bcbe4c0
--- /dev/null
+++ b/generic/lib/integer/add_sat_if.ll
@@ -0,0 +1,55 @@
+declare i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
+
+define i8 @__clc_add_sat_s8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_add_sat_impl_s8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
+
+define i8 @__clc_add_sat_u8(i8 %x, i8 %y) nounwind readnone alwaysinline {
+  %call = call i8 @__clc_add_sat_impl_u8(i8 %x, i8 %y)
+  ret i8 %call
+}
+
+declare i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
+
+define i16 @__clc_add_sat_s16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_add_sat_impl_s16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
+
+define i16 @__clc_add_sat_u16(i16 %x, i16 %y) nounwind readnone alwaysinline {
+  %call = call i16 @__clc_add_sat_impl_u16(i16 %x, i16 %y)
+  ret i16 %call
+}
+
+declare i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
+
+define i32 @__clc_add_sat_s32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_add_sat_impl_s32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
+
+define i32 @__clc_add_sat_u32(i32 %x, i32 %y) nounwind readnone alwaysinline {
+  %call = call i32 @__clc_add_sat_impl_u32(i32 %x, i32 %y)
+  ret i32 %call
+}
+
+declare i64 @__clc_add_sat_impl_s64(i64 %x, i64 %y)
+
+define i64 @__clc_add_sat_s64(i64 %x, i64 %y) nounwind readnone alwaysinline {

Re: [Mesa-dev] [PATCH 1/6] configure.ac: Remove unused HAVE_PIPE_LOADER_XLIB macro.

2013-04-26 Thread Aaron Watry
For the series:
Tested-by: Aaron Watry awa...@gmail.com

Config:
./configure --with-dri-drivers=radeon --with-gallium-drivers=r600
--enable-texture-float --enable-opencl --enable-gles1 --enable-gles2
--enable-xvmc --enable-vdpau --enable-r600-llvm-compiler
--with-egl-platforms=x11,drm --enable-glx-tls



On Thu, Apr 25, 2013 at 2:02 PM, Matt Turner matts...@gmail.com wrote:

 Added in e1364530 but never used.
 ---
  configure.ac |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/configure.ac b/configure.ac
 index 50e60f6..55ea13d 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1899,7 +1899,6 @@ if test x$enable_gallium_loader = xyes; then
  GALLIUM_PIPE_LOADER_LIBS=$GALLIUM_PIPE_LOADER_LIBS
 \$(top_builddir)/src/gallium/winsys/sw/null/libws_null.la

  if test x$NEED_WINSYS_XLIB = xyes; then
 -GALLIUM_PIPE_LOADER_DEFINES=$GALLIUM_PIPE_LOADER_DEFINES
 -DHAVE_PIPE_LOADER_XLIB
  GALLIUM_PIPE_LOADER_LIBS=$GALLIUM_PIPE_LOADER_LIBS
 \$(top_builddir)/src/gallium/winsys/sw/xlib/libws_xlib.la
  fi

 --
 1.7.8.6



Re: [Mesa-dev] R600 Patchset: Optimizations for bfgminer

2013-04-29 Thread Aaron Watry
Hi Tom,

I'm not too qualified to review the llvm code changes, but the changes
looked sane. I did want to point out a few piglit changes/regressions as a
result of this set of patches.

For my HD6850, running latest llvm from git:
gegl-rgb-gamma-u8-to-ragabaf: pass -> fail
v3i32-stack: pass -> fail
v3i32-stack-array(All Tests): skip -> fail

Dumps attached for each of these tests using the following environment var:
R600_DEBUG=cs,compute

Also, I ran make check in llvm, and test/CodeGen/R600/setcc.ll failed
with the error below. I also hit this same error for the CL abs(int2)
builtin, but that test was already failing before this series, so I haven't
included it above.  I'm assuming that we just need to expand ISD::SRA for
v2i32/v4i32 (just as we already expand SHL and SRL for those types).
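
As a rough illustration of what expanding a vector SRA amounts to, the C
sketch below applies a scalar arithmetic shift right to each lane of a
two-element vector. This is only a model of the idea, not LLVM's legalizer
code; the v2i32 struct and the function names are invented for the example,
and two's complement integers are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the v2i32 vector type. */
typedef struct { int32_t lane[2]; } v2i32;

/* Arithmetic shift right of one 32-bit lane, written without relying on
 * the implementation-defined behaviour of >> on negative signed values.
 * Requires 0 <= s <= 31. */
static int32_t ashr32(int32_t x, uint32_t s) {
    assert(s < 32);
    uint32_t r = (uint32_t)x >> s;      /* logical shift of the bit pattern */
    if (x < 0)
        r |= ~(UINT32_MAX >> s);        /* refill the vacated bits with the sign */
    return (int32_t)r;
}

/* Scalarized vector shift: one scalar shift per lane. */
static v2i32 sra_v2i32(v2i32 a, v2i32 s) {
    v2i32 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = ashr32(a.lane[i], (uint32_t)s.lane[i]);
    return r;
}
```

The shl-by-31 followed by sra-by-31 pattern in the failing DAG corresponds to
ashr32 applied to a value whose only set bit is bit 31, which yields -1.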



FAIL: LLVM :: CodeGen/R600/setcc.ll (2104 of 7693)
 TEST 'LLVM :: CodeGen/R600/setcc.ll' FAILED

Script:
--
/home/awatry/src/llvm-build/Debug+Asserts/bin/llc < /home/awatry/src/llvm/test/CodeGen/R600/setcc.ll -march=r600 -mcpu=redwood | /home/awatry/src/llvm-build/Debug+Asserts/bin/FileCheck /home/awatry/src/llvm/test/CodeGen/R600/setcc.ll
--
Exit Code: 2
Command Output (stderr):
--
LLVM ERROR: Cannot select: 0x20dce30: v2i32 = sra 0x20dd310, 0x20dcc30 [ID=26]
  0x20dd310: v2i32 = BUILD_VECTOR 0x20dcf30, 0x20dd210 [ID=25]
    0x20dcf30: i32 = shl 0x20dc830, 0x20dcb30 [ID=24]
      0x20dc830: i32 = select_cc 0x20d9d60, 0x20dc130, 0x20dc330, 0x20d9c60, 0x20d9b60 [ID=22]
        0x20d9d60: i32 = extract_vector_elt 0x20d9860, 0x20d9c60 [ID=17]
          0x20d9860: v2i32,ch = load 0x20a7828, 0x20d9760, 0x20d9560<LD8[undef]> [ORD=1] [ID=12]
            0x20d9760: i32 = Constant<40> [ORD=1] [ID=3]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20d9c60: i32 = Constant<0> [ID=6]
        0x20dc130: i32 = extract_vector_elt 0x20d9a60, 0x20d9c60 [ID=19]
          0x20d9a60: v2i32,ch = load 0x20a7828, 0x20d9960, 0x20d9560<LD8[undef]> [ORD=1] [ID=13]
            0x20d9960: i32 = Constant<48> [ORD=1] [ID=4]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20d9c60: i32 = Constant<0> [ID=6]
        0x20dc330: i32 = Constant<-1> [ID=7]
        0x20d9c60: i32 = Constant<0> [ID=6]
      0x20dcb30: i32 = Constant<31> [ID=9]
    0x20dd210: i32 = shl 0x20d9e60, 0x20dcb30 [ID=23]
      0x20d9e60: i32 = select_cc 0x20dc630, 0x20dc730, 0x20dc330, 0x20d9c60, 0x20d9b60 [ID=21]
        0x20dc630: i32 = extract_vector_elt 0x20d9860, 0x20dc530 [ID=16]
          0x20d9860: v2i32,ch = load 0x20a7828, 0x20d9760, 0x20d9560<LD8[undef]> [ORD=1] [ID=12]
            0x20d9760: i32 = Constant<40> [ORD=1] [ID=3]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20dc530: i32 = Constant<1> [ID=8]
        0x20dc730: i32 = extract_vector_elt 0x20d9a60, 0x20dc530 [ID=18]
          0x20d9a60: v2i32,ch = load 0x20a7828, 0x20d9960, 0x20d9560<LD8[undef]> [ORD=1] [ID=13]
            0x20d9960: i32 = Constant<48> [ORD=1] [ID=4]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20dc530: i32 = Constant<1> [ID=8]
        0x20dc330: i32 = Constant<-1> [ID=7]
        0x20d9c60: i32 = Constant<0> [ID=6]
      0x20dcb30: i32 = Constant<31> [ID=9]
  0x20dcc30: v2i32 = BUILD_VECTOR 0x20dcb30, 0x20dcb30 [ID=14]
    0x20dcb30: i32 = Constant<31> [ID=9]
    0x20dcb30: i32 = Constant<31> [ID=9]
In function: setcc_v2i32
FileCheck error: '-' is empty.
--


Testing Time: 48.76s

Failing Tests (1):
LLVM :: CodeGen/R600/setcc.ll

  Expected Passes: 5543
  Expected Failures  : 29
  Unsupported Tests  : 2120
  Unexpected Failures: 1
make[1]: *** [check-local] Error 1
make[1]: Leaving directory `/home/awatry/src/llvm-build/test'
make: *** [check] Error 2



--Aaron





On Mon, Apr 29, 2013 at 3:24 PM, Tom Stellard t...@stellard.net wrote:

 Hi,

 The attached patchset implements a few optimizations for the bfgminer
 bitcoin mining program.

 Please Review.

 -Tom





gegl-rgb-gamma-u8-to-ragabaf.cl.dump
Description: Binary data


v3i32-stack.cl.dump
Description: Binary data


v3i32-stack-array.cl.dump
Description: Binary data


Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Aaron Watry
Not sure if this helps much, but...

With gentype being one of:
char, uchar, short, ushort, int, uint, long, ulong, in scalar form or as
2-, 3-, 4-, 8-, or 16-component vectors.

From the OpenCL 1.1 spec:
gentype mul_hi(gentype x, gentype y):
Computes x * y and returns the high half of the product of x and y.

gentype mad_hi(gentype a, gentype b, gentype c):
Returns mul_hi(a, b) + c.
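
For the 32-bit case, the semantics quoted above can be sketched in C by
widening to 64 bits and keeping the upper half. The function names below are
made up for illustration (they are not TGSI, libclc, or driver APIs), and two's
complement integers are assumed.

```c
#include <stdint.h>

/* High 32 bits of an unsigned 32x32 -> 64 bit multiply. */
static uint32_t umul_hi32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* High 32 bits of a signed 32x32 -> 64 bit multiply. The shift is done on
 * the unsigned bit pattern so that no signed right shift is needed. */
static int32_t imul_hi32(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)p >> 32);
}

/* mad_hi(a, b, c) = mul_hi(a, b) + c, with wrap-around addition. */
static int32_t mad_hi32(int32_t a, int32_t b, int32_t c) {
    return (int32_t)((uint32_t)imul_hi32(a, b) + (uint32_t)c);
}
```

The same input bit patterns give different high halves depending on
signedness: umul_hi32(0xFFFFFFFF, 1) is 0, while imul_hi32(-1, 1) is -1.
The low 32 bits are identical in both cases, which is why only the high-half
operation needs separate signed and unsigned variants.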

--Aaron


On Fri, May 3, 2013 at 5:31 AM, Marek Olšák mar...@gmail.com wrote:
 FWIW, this maps nicely to r600, which also has separate instructions
 for the low and high 32 bits. As to what option is better, it really
 depends on whether shading languages and OpenCL expose the
 instructions directly through functions, or whether they just have
 64-bit integers.

 Marek

 On Fri, May 3, 2013 at 1:29 AM, Roland Scheidegger srol...@vmware.com wrote:
 Currently, there's no way to get the high bits of a 32x32
 signed/unsigned integer multiplication with tgsi.
 However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
 well.
 There's essentially two ways how it could be done:
 - a 2-destination instruction returning both high and low bits (this is
 how it looks like in d3d10 and glsl)
 - use the existing umul for the low bits and have another instruction
 for the high bits (this is how it looks like in opencl)

 Well there's other possibilities but these looked like they'd match both
 APIs and HW reasonably (well with the exception of things like sse2
 which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).

 Actually it's two new instructions because unlike for the low bits it
 matters for the high bits if the source operands are signed or unsigned.

 Personally I'm favoring two separate instructions for low and high bits
 to not have to deal with multi-destination instructions, but if someone
 makes a strong case for one returning both low and high bits I could be
 convinced otherwise. I think though two instructions matches most hw
 very well (with the exception of software renderers and possibly intel
 graphics but then a good backend could certainly recognize this).

 So here's what the docs would say about these instructions:


 .. opcode:: IMUL_HI - Signed Integer Multiply High Bits

The high 32bits of the multiplication of 2 signed integers is returned.

 .. math::

   dst.x = (src0.x \times src1.x) >> 32

   dst.y = (src0.y \times src1.y) >> 32

   dst.z = (src0.z \times src1.z) >> 32

   dst.w = (src0.w \times src1.w) >> 32


 .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits

The high 32bits of the multiplication of 2 unsigned integers is returned.

 .. math::

   dst.x = (src0.x \times src1.x) >> 32

   dst.y = (src0.y \times src1.y) >> 32

   dst.z = (src0.z \times src1.z) >> 32

   dst.w = (src0.w \times src1.w) >> 32


 Roland


Re: [Mesa-dev] [PATCH] r600g: don't emit surface_sync after FLUSH_AND_INV_EVENT

2013-05-03 Thread Aaron Watry
I know it's been pushed already, but this also fixes some lockups that
I was seeing on Barts (HD6850) when running piglit's OpenCL tests.

Thanks for fixing this.

--Aaron

On Fri, May 3, 2013 at 9:47 AM, Marek Olšák mar...@gmail.com wrote:
 Reviewed-by: Marek Olšák mar...@gmail.com

 Marek

 On Fri, May 3, 2013 at 4:01 PM,  alexdeuc...@gmail.com wrote:
 From: Alex Deucher alexander.deuc...@amd.com

 It shouldn't be needed since the FLUSH_AND_INV_EVENT has already
 made sure the destination caches are flushed.  Additionally,
 we didn't previously emit the surface_sync until this commit:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=e5e4c07e7964a3258ed02b530bcdc24c0650204b
 Emitting them together causes hangs in compute on cayman/TN
 and hangs in Heaven on evergreen.

 Note: this patch is a candidate for the 9.1 branch, but requires:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=156bcca62c9f4e79e78929f72bc085757f36a65a
 as well.

 Signed-off-by: Alex Deucher alexander.deuc...@amd.com
 ---
  src/gallium/drivers/r600/r600_hw_context.c |   26 --
  1 files changed, 0 insertions(+), 26 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
 index 6d8b2cf..944b666 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
 if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
 cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
 cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
 -   if (rctx->chip_class >= EVERGREEN) {
 -   /* We were previously setting the CB and DB bits on
 -* cp_coher_cntl, but this is unnecessary since
 -* we are emitting the
 -* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
 -* Setting the CB bits was causing lockups when using
 -* compute on cayman.
 -*
 -* XXX: Do even need to emit a surface sync packet 
 here?
 -* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
 -* surface sync was not being emitted with the
 -* R600_CONTEXT_FLUSH_AND_INV flag.
 -*/
 -   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
 -   S_0085F0_DB_ACTION_ENA(1) |
 -   S_0085F0_SH_ACTION_ENA(1) |
 -   S_0085F0_SMX_ACTION_ENA(1) |
 -   S_0085F0_FULL_CACHE_ENA(1);
 -   } else {
 -   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
 -   S_0085F0_SH_ACTION_ENA(1) |
 -   S_0085F0_VC_ACTION_ENA(1) |
 -   S_0085F0_TC_ACTION_ENA(1) |
 -   S_0085F0_FULL_CACHE_ENA(1);
 -   }
 -   emit_flush = 1;
 }

 if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
 --
 1.7.7.5



Re: [Mesa-dev] R600 Patchset: Emit true ISA

2013-05-04 Thread Aaron Watry
This series, and the associated mesa changes are all:
Tested-By: Aaron Watry awa...@gmail.com

--Aaron

On Fri, May 3, 2013 at 5:53 PM, Tom Stellard t...@stellard.net wrote:
 Hi,

 The attached patches modify the CodeEmitter to emit true ISA.
 Previously, we were prefixing all instructions with an instruction type
 byte.

 Vincent did most of the work to convert the CodeEmitter to true ISA,
 these patches are just the last few cleanups that are needed to finish
 the project.

 Please test/review.

 Thanks,
 Tom



[Mesa-dev] R600: Expand vselect and SRA for v2i32 and v4i32

2013-05-06 Thread Aaron Watry
These two patches fix a number of piglit OpenCL test failures on my
HD6850 (Barts).

There are no piglit CL test regressions and the llvm make check runs
without any unexpected failures.
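
For readers unfamiliar with the operation, here is a hedged C sketch of what a
v4i32 vselect computes once it has been broken into per-lane scalar selects.
It only models the semantics; the v4i32 struct and vselect_v4i32 are names
invented for this example, not LLVM or R600 backend code.

```c
#include <stdint.h>

/* Stand-in for the v4i32 vector type. */
typedef struct { int32_t lane[4]; } v4i32;

/* Per-lane select: lane i of the result is a.lane[i] when cond[i] is
 * non-zero and b.lane[i] otherwise. */
static v4i32 vselect_v4i32(const int cond[4], v4i32 a, v4i32 b) {
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = cond[i] ? a.lane[i] : b.lane[i];
    return r;
}
```

Each loop iteration corresponds to one scalar select, which is consistent with
a four-lane vselect producing four scalar conditional instructions.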



[Mesa-dev] [PATCH 1/2] R600: Expand vselect for v4i32 and v2i32

2013-05-06 Thread Aaron Watry
Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600ISelLowering.cpp |3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index c6e2136..6dec4d1 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -78,6 +78,9 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SELECT, MVT::i32, Custom);
   setOperationAction(ISD::SELECT, MVT::f32, Custom);
 
+  setOperationAction(ISD::VSELECT, MVT::v4i32, Expand);
+  setOperationAction(ISD::VSELECT, MVT::v2i32, Expand);
+
   // Legalize loads and stores to the private address space.
   setOperationAction(ISD::LOAD, MVT::i32, Custom);
   setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
-- 
1.7.10.4



[Mesa-dev] [PATCH 2/2] R600: Expand SRA for v4i32/v2i32

2013-05-06 Thread Aaron Watry
Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600ISelLowering.cpp |2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index 6dec4d1..ac56ed8 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -50,6 +50,8 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SHL, MVT::v2i32, Expand);
   setOperationAction(ISD::SRL, MVT::v4i32, Expand);
   setOperationAction(ISD::SRL, MVT::v2i32, Expand);
+  setOperationAction(ISD::SRA, MVT::v4i32, Expand);
+  setOperationAction(ISD::SRA, MVT::v2i32, Expand);
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i32, Expand);
   setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
   setOperationAction(ISD::UREM, MVT::v4i32, Expand);
-- 
1.7.10.4



[Mesa-dev] R600: Expand vselect and SRA for v2i32 and v4i32 (v2)

2013-05-08 Thread Aaron Watry
These two patches fix a number of piglit OpenCL test failures on my
HD6850 (Barts).

There are no piglit CL test regressions and the llvm make check runs
without any unexpected failures.

v2: Add tests for v4i32 data type.



[Mesa-dev] [PATCH 1/2] R600: Expand vselect for v4i32 and v2i32

2013-05-08 Thread Aaron Watry
Signed-off-by: Aaron Watry awa...@gmail.com

v2: Add vselect v4i32 test
---
 lib/Target/R600/R600ISelLowering.cpp |3 +++
 test/CodeGen/R600/vselect.ll |   17 +
 2 files changed, 20 insertions(+)
 create mode 100644 test/CodeGen/R600/vselect.ll

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index c6e2136..6dec4d1 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -78,6 +78,9 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SELECT, MVT::i32, Custom);
   setOperationAction(ISD::SELECT, MVT::f32, Custom);
 
+  setOperationAction(ISD::VSELECT, MVT::v4i32, Expand);
+  setOperationAction(ISD::VSELECT, MVT::v2i32, Expand);
+
   // Legalize loads and stores to the private address space.
   setOperationAction(ISD::LOAD, MVT::i32, Custom);
   setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
diff --git a/test/CodeGen/R600/vselect.ll b/test/CodeGen/R600/vselect.ll
new file mode 100644
index 000..6e459df
--- /dev/null
+++ b/test/CodeGen/R600/vselect.ll
@@ -0,0 +1,17 @@
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
+; CHECK: @test_select_v4i32
+; CHECK: CNDE_INT T{{[0-9]+\.[XYZW], PV\.[xyzw], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: CNDE_INT * T{{[0-9]+\.[XYZW], PV\.[xyzw], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: CNDE_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: CNDE_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test_select_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in0, <4 x i32> addrspace(1)* %in1) {
+entry:
+  %0 = load <4 x i32> addrspace(1)* %in0
+  %1 = load <4 x i32> addrspace(1)* %in1
+  %cmp = icmp ne <4 x i32> %0, %1
+  %result = select <4 x i1> %cmp, <4 x i32> %0, <4 x i32> %1
+  store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+  ret void
+}
-- 
1.7.10.4



[Mesa-dev] [PATCH 2/2] R600: Expand SRA for v4i32/v2i32

2013-05-08 Thread Aaron Watry
Signed-off-by: Aaron Watry awa...@gmail.com

v2: Add v4i32 test
---
 lib/Target/R600/R600ISelLowering.cpp |2 ++
 test/CodeGen/R600/sra.ll |   13 +
 2 files changed, 15 insertions(+)
 create mode 100644 test/CodeGen/R600/sra.ll

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index 6dec4d1..ac56ed8 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -50,6 +50,8 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SHL, MVT::v2i32, Expand);
   setOperationAction(ISD::SRL, MVT::v4i32, Expand);
   setOperationAction(ISD::SRL, MVT::v2i32, Expand);
+  setOperationAction(ISD::SRA, MVT::v4i32, Expand);
+  setOperationAction(ISD::SRA, MVT::v2i32, Expand);
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i32, Expand);
   setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
   setOperationAction(ISD::UREM, MVT::v4i32, Expand);
diff --git a/test/CodeGen/R600/sra.ll b/test/CodeGen/R600/sra.ll
new file mode 100644
index 000..972542d
--- /dev/null
+++ b/test/CodeGen/R600/sra.ll
@@ -0,0 +1,13 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
+; CHECK: @ashr_v4i32
+; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
+  %result = ashr <4 x i32> %a, %b
+  store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+  ret void
+}
-- 
1.7.10.4



[Mesa-dev] [PATCH] R600: Expand MUL for v4i32/v2i32

2013-05-08 Thread Aaron Watry
Fixes piglit test for OpenCL builtin mul24, and allows mad24 to run.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600ISelLowering.cpp |2 ++
 test/CodeGen/R600/mul.ll |   16 
 2 files changed, 18 insertions(+)
 create mode 100644 test/CodeGen/R600/mul.ll

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index ac56ed8..b982279 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -43,6 +43,8 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::AND,  MVT::v4i32, Expand);
   setOperationAction(ISD::FP_TO_SINT, MVT::v4i32, Expand);
   setOperationAction(ISD::FP_TO_UINT, MVT::v4i32, Expand);
+  setOperationAction(ISD::MUL,  MVT::v2i32, Expand);
+  setOperationAction(ISD::MUL,  MVT::v4i32, Expand);
   setOperationAction(ISD::OR, MVT::v4i32, Expand);
   setOperationAction(ISD::OR, MVT::v2i32, Expand);
   setOperationAction(ISD::SINT_TO_FP, MVT::v4i32, Expand);
diff --git a/test/CodeGen/R600/mul.ll b/test/CodeGen/R600/mul.ll
new file mode 100644
index 000..7278e90
--- /dev/null
+++ b/test/CodeGen/R600/mul.ll
@@ -0,0 +1,16 @@
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
+; mul24 and mad24 are affected
+;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
+  %a = load <4 x i32> addrspace(1) * %in
+  %b = load <4 x i32> addrspace(1) * %b_ptr
+  %result = mul <4 x i32> %a, %b
+  store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+  ret void
+}
-- 
1.7.10.4



[Mesa-dev] [PATCH] R600: Expand SUB for v2i32/v4i32

2013-05-08 Thread Aaron Watry
Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600ISelLowering.cpp |2 ++
 test/CodeGen/R600/sub.ll |   15 +++
 2 files changed, 17 insertions(+)
 create mode 100644 test/CodeGen/R600/sub.ll

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index b982279..7252235 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -54,6 +54,8 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SRL, MVT::v2i32, Expand);
   setOperationAction(ISD::SRA, MVT::v4i32, Expand);
   setOperationAction(ISD::SRA, MVT::v2i32, Expand);
+  setOperationAction(ISD::SUB, MVT::v4i32, Expand);
+  setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i32, Expand);
   setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
   setOperationAction(ISD::UREM, MVT::v4i32, Expand);
diff --git a/test/CodeGen/R600/sub.ll b/test/CodeGen/R600/sub.ll
new file mode 100644
index 000..12bfba3
--- /dev/null
+++ b/test/CodeGen/R600/sub.ll
@@ -0,0 +1,15 @@
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+
+;CHECK: SUB_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
+  %a = load <4 x i32> addrspace(1) * %in
+  %b = load <4 x i32> addrspace(1) * %b_ptr
+  %result = sub <4 x i32> %a, %b
+  store <4 x i32> %result, <4 x i32> addrspace(1)* %out
+  ret void
+}
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH] scons: Use LLVM shared library if found.

2013-05-17 Thread Aaron Watry
On Fri, May 17, 2013 at 2:31 PM, Jose Fonseca jfons...@vmware.com wrote:


 - Original Message -
 On Fri, May 17, 2013 at 7:44 AM, Jose Fonseca jfons...@vmware.com wrote:
  Vinson,
 
  Why is this necessary?
 
  (I'd prefer that LLVM is statically linked by default. )
 
  Jose

 The SCons build fails on systems that only provide a LLVM shared
 library. 'llvm-config --libs' always enumerates the libraries instead
 of providing the shared library on such systems.

 $ cat /etc/redhat-release
 Fedora release 18 (Spherical Cow)
 $ llvm-config --libs
 -lLLVMAsmParser -lLLVMInstrumentation -lLLVMLinker -lLLVMArchive
 -lLLVMBitReader -lLLVMDebugInfo -lLLVMJIT -lLLVMipo -lLLVMVectorize
 -lLLVMBitWriter -lLLVMTableGen -lLLVMHexagonCodeGen -lLLVMHexagonDesc
 -lLLVMHexagonAsmPrinter -lLLVMHexagonInfo -lLLVMNVPTXCodeGen
 -lLLVMNVPTXDesc -lLLVMNVPTXInfo -lLLVMNVPTXAsmPrinter
 -lLLVMMBlazeDisassembler -lLLVMMBlazeAsmParser -lLLVMMBlazeCodeGen
 -lLLVMMBlazeDesc -lLLVMMBlazeInfo -lLLVMMBlazeAsmPrinter
 -lLLVMCppBackendCodeGen -lLLVMCppBackendInfo -lLLVMMSP430CodeGen
 -lLLVMMSP430Desc -lLLVMMSP430Info -lLLVMMSP430AsmPrinter
 -lLLVMXCoreCodeGen -lLLVMXCoreDesc -lLLVMXCoreInfo
 -lLLVMCellSPUCodeGen -lLLVMCellSPUDesc -lLLVMCellSPUInfo
 -lLLVMMipsDisassembler -lLLVMMipsAsmParser -lLLVMMipsCodeGen
 -lLLVMMipsDesc -lLLVMMipsInfo -lLLVMMipsAsmPrinter
 -lLLVMARMDisassembler -lLLVMARMAsmParser -lLLVMARMCodeGen
 -lLLVMARMDesc -lLLVMARMInfo -lLLVMARMAsmPrinter -lLLVMPowerPCCodeGen
 -lLLVMPowerPCDesc -lLLVMPowerPCInfo -lLLVMPowerPCAsmPrinter
 -lLLVMSparcCodeGen -lLLVMSparcDesc -lLLVMSparcInfo -lLLVMX86AsmParser
 -lLLVMX86Disassembler -lLLVMX86CodeGen -lLLVMX86Desc -lLLVMX86Info
 -lLLVMX86AsmPrinter -lLLVMX86Utils -lLLVMR600CodeGen
 -lLLVMSelectionDAG -lLLVMAsmPrinter -lLLVMR600Desc -lLLVMR600Info
 -lLLVMR600AsmPrinter -lLLVMMCDisassembler -lLLVMMCParser
 -lLLVMInterpreter -lLLVMCodeGen -lLLVMScalarOpts -lLLVMInstCombine
 -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMMCJIT
 -lLLVMRuntimeDyld -lLLVMExecutionEngine -lLLVMTarget -lLLVMMC
 -lLLVMObject -lLLVMCore -lLLVMSupport
 $ ls `llvm-config --libdir`
 BugpointPasses.so  libclang.so  libLLVM-3.2svn.so  libLTO.so
 libprofile_rt.so  LLVMgold.so

 Then Fedora 18's llvm-config is busted, as `llvm-config --libs` should return 
 libLLVM-3.2svn.so


I'm using upstream llvm git master, and the shared library isn't
listed in llvm-config here either.


~/src/llvm$ llvm-config --libs
-lLLVMR600CodeGen -lLLVMR600Desc -lLLVMR600Info -lLLVMR600AsmPrinter
-lLLVMTableGen -lLLVMDebugInfo -lLLVMOption -lLLVMX86Disassembler
-lLLVMX86AsmParser -lLLVMX86CodeGen -lLLVMSelectionDAG
-lLLVMAsmPrinter -lLLVMX86Desc -lLLVMX86Info -lLLVMX86AsmPrinter
-lLLVMX86Utils -lLLVMIRReader -lLLVMAsmParser -lLLVMMCDisassembler
-lLLVMMCParser -lLLVMInstrumentation -lLLVMArchive -lLLVMBitReader
-lLLVMInterpreter -lLLVMipo -lLLVMVectorize -lLLVMLinker
-lLLVMBitWriter -lLLVMMCJIT -lLLVMJIT -lLLVMCodeGen -lLLVMObjCARCOpts
-lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils -lLLVMipa
-lLLVMAnalysis -lLLVMRuntimeDyld -lLLVMExecutionEngine -lLLVMTarget
-lLLVMMC -lLLVMObject -lLLVMCore -lLLVMSupport

~/src/llvm$ which llvm-config
/usr/local/bin/llvm-config

~/src/llvm$ ls /usr/local/lib/libLLVM*
/usr/local/lib/libLLVM-3.4svn.so
/usr/local/lib/libLLVMAnalysis.a
/usr/local/lib/libLLVMArchive.a
/usr/local/lib/libLLVMAsmParser.a
<snip>

I'm guessing that 'llvm-config --libs' only lists static libraries.

--Aaron


 So I believe this issue should be filed against Fedora, not worked around 
 here.   Honestly, only shipping LLVM in a .so is already a bad idea, but 
 breaking llvm-config is even worse -- what's the point of scripts like 
 llvm-config if their output can't be trusted?

 BTW, configure.ac doesn't have this hack.  Does it fail the same way too?

 Jose


Re: [Mesa-dev] [PATCH libclc] Add bitselect builtin

2013-05-23 Thread Aaron Watry
Reviewed-by: Aaron Watry awa...@gmail.com

Please also send the attached test patch (or an expanded version of
it) to the piglit list.

On Thu, May 23, 2013 at 12:48 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  generic/include/clc/clc.h  | 1 +
  generic/include/clc/relational/bitselect.h | 1 +
  2 files changed, 2 insertions(+)
  create mode 100644 generic/include/clc/relational/bitselect.h

 diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
 index d2858a8..b53a217 100644
 --- a/generic/include/clc/clc.h
 +++ b/generic/include/clc/clc.h
 @@ -80,6 +80,7 @@

  /* 6.11.6 Relational Functions */
  #include <clc/relational/any.h>
 +#include <clc/relational/bitselect.h>
  #include <clc/relational/select.h>

  /* 6.11.8 Synchronization Functions */
 diff --git a/generic/include/clc/relational/bitselect.h b/generic/include/clc/relational/bitselect.h
 new file mode 100644
 index 000..e91cbfd
 --- /dev/null
 +++ b/generic/include/clc/relational/bitselect.h
 @@ -0,0 +1 @@
 +#define bitselect(x, y, z) ((x) ^ ((z) & ((y) ^ (x))))
 --
 1.8.1.5



0001-CL-Basic-test-of-bitselect-builtin.patch
Description: Binary data


[Mesa-dev] libclc: vload/vstore initial implementation

2013-05-23 Thread Aaron Watry
I've implemented the OpenCL vload/vstore builtin functions in two parts.
1) Pure CL C implementation. No Assembly
2) Add assembly optimizations for 32-bit int/uint loads/stores of 4+ component
   vectors

Note: The vstore implementation assumes that the hardware back end supports
byte-addressable stores.  This may not always be optimal.
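
For reference, a plain-C sketch of the vloadn/vstoren addressing as the OpenCL C specification describes it: the n-element vector starts at element offset * n of the pointer. This models only the n = 4 case for int; sketch_int4, sketch_vload4 and sketch_vstore4 are illustrative stand-ins and not libclc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int4; not a libclc type. */
typedef struct { int s0, s1, s2, s3; } sketch_int4;

/* Model of vload4: read four consecutive elements starting at p + offset * 4. */
static sketch_int4 sketch_vload4(size_t offset, const int *p)
{
    const int *base = p + offset * 4;
    sketch_int4 v = { base[0], base[1], base[2], base[3] };
    return v;
}

/* Model of vstore4: write four consecutive elements starting at p + offset * 4. */
static void sketch_vstore4(sketch_int4 v, size_t offset, int *p)
{
    int *base = p + offset * 4;
    base[0] = v.s0;
    base[1] = v.s1;
    base[2] = v.s2;
    base[3] = v.s3;
}
```

So with an 8-element buffer, offset 1 addresses elements 4 through 7, not 1 through 4.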



[Mesa-dev] [PATCH 1/4] libclc: Initial vload implementation

2013-05-23 Thread Aaron Watry
Should work for all targets and data types.  Completely unoptimized.
---
 generic/include/clc/clc.h  |  1 +
 generic/include/clc/shared/vload.h | 37 ++
 generic/lib/SOURCES|  1 +
 generic/lib/shared/vload.cl| 47 ++
 4 files changed, 86 insertions(+)
 create mode 100644 generic/include/clc/shared/vload.h
 create mode 100644 generic/lib/shared/vload.cl

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index d2858a8..7937003 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -71,6 +71,7 @@
 #include <clc/shared/clamp.h>
 #include <clc/shared/max.h>
 #include <clc/shared/min.h>
+#include <clc/shared/vload.h>
 
 /* 6.11.5 Geometric Functions */
 #include <clc/geometric/cross.h>
diff --git a/generic/include/clc/shared/vload.h 
b/generic/include/clc/shared/vload.h
new file mode 100644
index 000..93d0750
--- /dev/null
+++ b/generic/include/clc/shared/vload.h
@@ -0,0 +1,37 @@
+#define _CLC_VLOAD_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \
+  _CLC_OVERLOAD _CLC_DECL VEC_TYPE vload##WIDTH(size_t offset, const ADDR_SPACE PRIM_TYPE *x);
+
+#define _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \
+  _CLC_VLOAD_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE)
+
+#define _CLC_VECTOR_VLOAD_PRIM1(PRIM_TYPE) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __private) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __local) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __constant) \
+  _CLC_VECTOR_VLOAD_DECL(PRIM_TYPE, __global) \
+
+#define _CLC_VECTOR_VLOAD_PRIM() \
+_CLC_VECTOR_VLOAD_PRIM1(char) \
+_CLC_VECTOR_VLOAD_PRIM1(uchar) \
+_CLC_VECTOR_VLOAD_PRIM1(short) \
+_CLC_VECTOR_VLOAD_PRIM1(ushort) \
+_CLC_VECTOR_VLOAD_PRIM1(int) \
+_CLC_VECTOR_VLOAD_PRIM1(uint) \
+_CLC_VECTOR_VLOAD_PRIM1(long) \
+_CLC_VECTOR_VLOAD_PRIM1(ulong) \
+_CLC_VECTOR_VLOAD_PRIM1(float) \
+
+#ifdef cl_khr_fp64
+#define _CLC_VECTOR_VLOAD() \
+  _CLC_VECTOR_VLOAD_PRIM1(double) \
+  _CLC_VECTOR_VLOAD_PRIM()
+#else
+#define _CLC_VECTOR_VLOAD() \
+  _CLC_VECTOR_VLOAD_PRIM()
+#endif
+
+_CLC_VECTOR_VLOAD()
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 59eb9bb..5d9e3fa 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -23,5 +23,6 @@ relational/any.cl
 shared/clamp.cl
 shared/max.cl
 shared/min.cl
+shared/vload.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/shared/vload.cl b/generic/lib/shared/vload.cl
new file mode 100644
index 000..24d8240
--- /dev/null
+++ b/generic/lib/shared/vload.cl
@@ -0,0 +1,47 @@
+#include <clc/clc.h>
+
+#define VLOAD_VECTORIZE(PRIM_TYPE, ADDR_SPACE) \
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##2 vload2(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return (PRIM_TYPE##2)(x[offset] , x[offset+1]); \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##3 vload3(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return (PRIM_TYPE##3)(x[offset] , x[offset+1], x[offset+2]); \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##4 vload4(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return (PRIM_TYPE##4)(x[offset], x[offset+1], x[offset+2], x[offset+3]); \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##8 vload8(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return (PRIM_TYPE##8)(vload4(offset, x), vload4(offset+4, x)); \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF PRIM_TYPE##16 vload16(size_t offset, const ADDR_SPACE PRIM_TYPE *x) { \
+    return (PRIM_TYPE##16)(vload8(offset, x), vload8(offset+8, x)); \
+  } \
+
+#define VLOAD_ADDR_SPACES(SCALAR_GENTYPE) \
+VLOAD_VECTORIZE(SCALAR_GENTYPE, __private) \
+VLOAD_VECTORIZE(SCALAR_GENTYPE, __local) \
+VLOAD_VECTORIZE(SCALAR_GENTYPE, __constant) \
+VLOAD_VECTORIZE(SCALAR_GENTYPE, __global) \
+
+#define VLOAD_TYPES() \
+VLOAD_ADDR_SPACES(char) \
+VLOAD_ADDR_SPACES(uchar) \
+VLOAD_ADDR_SPACES(short) \
+VLOAD_ADDR_SPACES(ushort) \
+VLOAD_ADDR_SPACES(int) \
+VLOAD_ADDR_SPACES(uint) \
+VLOAD_ADDR_SPACES(long) \
+VLOAD_ADDR_SPACES(ulong) \
+VLOAD_ADDR_SPACES(float) \
+
+VLOAD_TYPES()
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+VLOAD_ADDR_SPACES(double)
+#endif
+
-- 
1.8.1.2



[Mesa-dev] [PATCH 2/4] libclc: Initial vstore implementation

2013-05-23 Thread Aaron Watry
Assumes that the target supports byte-addressable stores.

Completely unoptimized.
---
 generic/include/clc/clc.h   |  1 +
 generic/include/clc/shared/vstore.h | 36 
 generic/lib/SOURCES |  1 +
 generic/lib/shared/vstore.cl| 56 +
 4 files changed, 94 insertions(+)
 create mode 100644 generic/include/clc/shared/vstore.h
 create mode 100644 generic/lib/shared/vstore.cl

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index 7937003..10d30e0 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -72,6 +72,7 @@
 #include <clc/shared/max.h>
 #include <clc/shared/min.h>
 #include <clc/shared/vload.h>
+#include <clc/shared/vstore.h>
 
 /* 6.11.5 Geometric Functions */
 #include <clc/geometric/cross.h>
diff --git a/generic/include/clc/shared/vstore.h 
b/generic/include/clc/shared/vstore.h
new file mode 100644
index 000..1f784f8
--- /dev/null
+++ b/generic/include/clc/shared/vstore.h
@@ -0,0 +1,36 @@
+#define _CLC_VSTORE_DECL(PRIM_TYPE, VEC_TYPE, WIDTH, ADDR_SPACE) \
+  _CLC_OVERLOAD _CLC_DECL void vstore##WIDTH(VEC_TYPE vec, size_t offset, ADDR_SPACE PRIM_TYPE *out);
+
+#define _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##2, 2, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##3, 3, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##4, 4, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##8, 8, ADDR_SPACE) \
+  _CLC_VSTORE_DECL(PRIM_TYPE, PRIM_TYPE##16, 16, ADDR_SPACE)
+
+#define _CLC_VECTOR_VSTORE_PRIM1(PRIM_TYPE) \
+  _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __private) \
+  _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __local) \
+  _CLC_VECTOR_VSTORE_DECL(PRIM_TYPE, __global) \
+
+#define _CLC_VECTOR_VSTORE_PRIM() \
+_CLC_VECTOR_VSTORE_PRIM1(char) \
+_CLC_VECTOR_VSTORE_PRIM1(uchar) \
+_CLC_VECTOR_VSTORE_PRIM1(short) \
+_CLC_VECTOR_VSTORE_PRIM1(ushort) \
+_CLC_VECTOR_VSTORE_PRIM1(int) \
+_CLC_VECTOR_VSTORE_PRIM1(uint) \
+_CLC_VECTOR_VSTORE_PRIM1(long) \
+_CLC_VECTOR_VSTORE_PRIM1(ulong) \
+_CLC_VECTOR_VSTORE_PRIM1(float) \
+
+#ifdef cl_khr_fp64
+#define _CLC_VECTOR_VSTORE() \
+  _CLC_VECTOR_VSTORE_PRIM1(double) \
+  _CLC_VECTOR_VSTORE_PRIM()
+#else
+#define _CLC_VECTOR_VSTORE() \
+  _CLC_VECTOR_VSTORE_PRIM()
+#endif
+
+_CLC_VECTOR_VSTORE()
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 5d9e3fa..50cc9bd 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -24,5 +24,6 @@ shared/clamp.cl
 shared/max.cl
 shared/min.cl
 shared/vload.cl
+shared/vstore.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/shared/vstore.cl b/generic/lib/shared/vstore.cl
new file mode 100644
index 000..e88ccc5
--- /dev/null
+++ b/generic/lib/shared/vstore.cl
@@ -0,0 +1,56 @@
+#include <clc/clc.h>
+
+#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
+
+#define VSTORE_VECTORIZE(PRIM_TYPE, ADDR_SPACE) \
+  _CLC_OVERLOAD _CLC_DEF void vstore2(PRIM_TYPE##2 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    mem[offset] = vec.s0; \
+    mem[offset+1] = vec.s1; \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF void vstore3(PRIM_TYPE##3 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    mem[offset] = vec.s0; \
+    mem[offset+1] = vec.s1; \
+    mem[offset+2] = vec.s2; \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF void vstore4(PRIM_TYPE##4 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    mem[offset] = vec.s0; \
+    mem[offset+1] = vec.s1; \
+    mem[offset+2] = vec.s2; \
+    mem[offset+3] = vec.s3; \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF void vstore8(PRIM_TYPE##8 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    vstore4(vec.lo, offset, mem); \
+    vstore4(vec.hi, offset+4, mem); \
+  } \
+\
+  _CLC_OVERLOAD _CLC_DEF void vstore16(PRIM_TYPE##16 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
+    vstore8(vec.lo, offset, mem); \
+    vstore8(vec.hi, offset+8, mem); \
+  } \
+
+#define VSTORE_ADDR_SPACES(SCALAR_GENTYPE) \
+VSTORE_VECTORIZE(SCALAR_GENTYPE, __private) \
+VSTORE_VECTORIZE(SCALAR_GENTYPE, __local) \
+VSTORE_VECTORIZE(SCALAR_GENTYPE, __global) \
+
+#define VSTORE_TYPES() \
+VSTORE_ADDR_SPACES(char) \
+VSTORE_ADDR_SPACES(uchar) \
+VSTORE_ADDR_SPACES(short) \
+VSTORE_ADDR_SPACES(ushort) \
+VSTORE_ADDR_SPACES(int) \
+VSTORE_ADDR_SPACES(uint) \
+VSTORE_ADDR_SPACES(long) \
+VSTORE_ADDR_SPACES(ulong) \
+VSTORE_ADDR_SPACES(float) \
+
+VSTORE_TYPES()
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+VSTORE_ADDR_SPACES(double)
+#endif
+
-- 
1.8.1.2



[Mesa-dev] [PATCH 3/4] libclc: Add assembly versions of vload for global int4/8/16

2013-05-23 Thread Aaron Watry
The assembly should be generic, but at least currently R600 only supports
32-bit loads of int1/4, and I believe that only global is well-supported.

R600 lowers the 8/16 component vectors to multiple 4-component loads.

The unoptimized C versions of the other stuff are left in place.
---
 generic/lib/SOURCES  |  2 ++
 generic/lib/shared/vload.cl  | 53 +--
 generic/lib/shared/vload_if.ll   | 60 
 generic/lib/shared/vload_impl.ll | 49 
 4 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 generic/lib/shared/vload_if.ll
 create mode 100644 generic/lib/shared/vload_impl.ll

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 50cc9bd..9f6acf3 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -24,6 +24,8 @@ shared/clamp.cl
 shared/max.cl
 shared/min.cl
 shared/vload.cl
+shared/vload_if.ll
+shared/vload_impl.ll
 shared/vstore.cl
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/shared/vload.cl b/generic/lib/shared/vload.cl
index 24d8240..f6ebd37 100644
--- a/generic/lib/shared/vload.cl
+++ b/generic/lib/shared/vload.cl
@@ -27,13 +27,12 @@
 VLOAD_VECTORIZE(SCALAR_GENTYPE, __constant) \
 VLOAD_VECTORIZE(SCALAR_GENTYPE, __global) \
 
+//int/uint are special... see below
 #define VLOAD_TYPES() \
 VLOAD_ADDR_SPACES(char) \
 VLOAD_ADDR_SPACES(uchar) \
 VLOAD_ADDR_SPACES(short) \
 VLOAD_ADDR_SPACES(ushort) \
-VLOAD_ADDR_SPACES(int) \
-VLOAD_ADDR_SPACES(uint) \
 VLOAD_ADDR_SPACES(long) \
 VLOAD_ADDR_SPACES(ulong) \
 VLOAD_ADDR_SPACES(float) \
@@ -45,3 +44,53 @@ VLOAD_TYPES()
 VLOAD_ADDR_SPACES(double)
 #endif
 
+VLOAD_VECTORIZE(int, __private)
+VLOAD_VECTORIZE(int, __local)
+VLOAD_VECTORIZE(int, __constant)
+VLOAD_VECTORIZE(uint, __private)
+VLOAD_VECTORIZE(uint, __local)
+VLOAD_VECTORIZE(uint, __constant)
+
+_CLC_OVERLOAD _CLC_DEF int2 vload2(size_t offset, const global int *x) {
+  return (int2)(x[offset] , x[offset+1]);
+}
+_CLC_OVERLOAD _CLC_DEF int3 vload3(size_t offset, const global int *x) {
+  return (int3)(vload2(offset, x), x[offset+2]);
+}
+_CLC_OVERLOAD _CLC_DEF uint2 vload2(size_t offset, const global uint *x) {
+  return (uint2)(x[offset] , x[offset+1]);
+}
+_CLC_OVERLOAD _CLC_DEF uint3 vload3(size_t offset, const global uint *x) {
+  return (uint3)(vload2(offset, x), x[offset+2]);
+}
+
+/*Note: It is known that R600 doesn't support load 2 x ? and 3 x ?... so
+ * they aren't actually overridden here
+ */
+_CLC_DECL int4 __clc_vload4_int__global(size_t offset, const __global int *);
+_CLC_DECL int8 __clc_vload8_int__global(size_t offset, const __global int *);
+_CLC_DECL int16 __clc_vload16_int__global(size_t offset, const __global int *);
+
+_CLC_OVERLOAD _CLC_DEF int4 vload4(size_t offset, const global int *x) {
+  return __clc_vload4_int__global(offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF int8 vload8(size_t offset, const global int *x) {
+  return __clc_vload8_int__global(offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF int16 vload16(size_t offset, const global int *x) {
+  return __clc_vload16_int__global(offset, x);
+}
+
+_CLC_DECL uint4 __clc_vload4_uint__global(size_t offset, const __global uint *);
+_CLC_DECL uint8 __clc_vload8_uint__global(size_t offset, const __global uint *);
+_CLC_DECL uint16 __clc_vload16_uint__global(size_t offset, const __global uint *);
+
+_CLC_OVERLOAD _CLC_DEF uint4 vload4(size_t offset, const global uint *x) {
+  return __clc_vload4_uint__global(offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF uint8 vload8(size_t offset, const global uint *x) {
+  return __clc_vload8_uint__global(offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF uint16 vload16(size_t offset, const global uint *x) {
+  return __clc_vload16_uint__global(offset, x);
+}
\ No newline at end of file
diff --git a/generic/lib/shared/vload_if.ll b/generic/lib/shared/vload_if.ll
new file mode 100644
index 000..2634d37
--- /dev/null
+++ b/generic/lib/shared/vload_if.ll
@@ -0,0 +1,60 @@
+;Start int global vload
+
+declare <2 x i32> @__clc_vload2_impl_i32__global(i32 %x, i32 %y)
+declare <3 x i32> @__clc_vload3_impl_i32__global(i32 %x, i32 %y)
+declare <4 x i32> @__clc_vload4_impl_i32__global(i32 %x, i32 %y)
+declare <8 x i32> @__clc_vload8_impl_i32__global(i32 %x, i32 %y)
+declare <16 x i32> @__clc_vload16_impl_i32__global(i32 %x, i32 %y)
+
+define <2 x i32> @__clc_vload2_int__global(i32 %x, i32 %y) nounwind readonly alwaysinline {
+  %call = call <2 x i32> @__clc_vload2_impl_i32__global(i32 %x, i32 %y)
+  ret <2 x i32> %call
+}
+
+define <3 x i32> @__clc_vload3_int__global(i32 %x, i32 %y) nounwind readonly alwaysinline {
+  %call = call <3 x i32> @__clc_vload3_impl_i32__global(i32 %x, i32 %y)
+  ret <3 x i32> %call
+}
+
+define <4 x i32> @__clc_vload4_int__global(i32 %x, i32 %y) nounwind readonly alwaysinline {
+  %call = call <4 x i32> @__clc_vload4_impl_i32__global(i32 %x, i32 %y)
+  ret <4 x i32> %call
+}
+
+define <8 x i32> 

[Mesa-dev] [PATCH 4/4] libclc: Add assembly versions of vstore for global [u]int4/8/16

2013-05-23 Thread Aaron Watry
The assembly should be generic, but at least currently R600 only supports
32-bit stores of [u]int1/4, and I believe that only global is well-supported.

R600 lowers the 8/16 component stores to multiple 4-component stores.

The unoptimized C versions of the other stuff are left in place.
---
 generic/lib/SOURCES   |  2 ++
 generic/lib/shared/vstore.cl  | 63 +++
 generic/lib/shared/vstore_if.ll   | 59 
 generic/lib/shared/vstore_impl.ll | 50 +++
 4 files changed, 168 insertions(+), 6 deletions(-)
 create mode 100644 generic/lib/shared/vstore_if.ll
 create mode 100644 generic/lib/shared/vstore_impl.ll

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 9f6acf3..8cda14a 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -27,5 +27,7 @@ shared/vload.cl
 shared/vload_if.ll
 shared/vload_impl.ll
 shared/vstore.cl
+shared/vstore_if.ll
+shared/vstore_impl.ll
 workitem/get_global_id.cl
 workitem/get_global_size.cl
diff --git a/generic/lib/shared/vstore.cl b/generic/lib/shared/vstore.cl
index e88ccc5..5b84f47 100644
--- a/generic/lib/shared/vstore.cl
+++ b/generic/lib/shared/vstore.cl
@@ -15,10 +15,8 @@
   } \
 \
   _CLC_OVERLOAD _CLC_DEF void vstore4(PRIM_TYPE##4 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
-    mem[offset] = vec.s0; \
-    mem[offset+1] = vec.s1; \
-    mem[offset+2] = vec.s2; \
-    mem[offset+3] = vec.s3; \
+    vstore2(vec.lo, offset, mem); \
+    vstore2(vec.hi, offset+2, mem); \
   } \
 \
   _CLC_OVERLOAD _CLC_DEF void vstore8(PRIM_TYPE##8 vec, size_t offset, ADDR_SPACE PRIM_TYPE *mem) { \
@@ -36,13 +34,12 @@
 VSTORE_VECTORIZE(SCALAR_GENTYPE, __local) \
 VSTORE_VECTORIZE(SCALAR_GENTYPE, __global) \
 
+//int/uint are special... see below
 #define VSTORE_TYPES() \
 VSTORE_ADDR_SPACES(char) \
 VSTORE_ADDR_SPACES(uchar) \
 VSTORE_ADDR_SPACES(short) \
 VSTORE_ADDR_SPACES(ushort) \
-VSTORE_ADDR_SPACES(int) \
-VSTORE_ADDR_SPACES(uint) \
 VSTORE_ADDR_SPACES(long) \
 VSTORE_ADDR_SPACES(ulong) \
 VSTORE_ADDR_SPACES(float) \
@@ -54,3 +51,57 @@ VSTORE_TYPES()
 VSTORE_ADDR_SPACES(double)
 #endif
 
+VSTORE_VECTORIZE(int, __private)
+VSTORE_VECTORIZE(int, __local)
+VSTORE_VECTORIZE(uint, __private)
+VSTORE_VECTORIZE(uint, __local)
+
+_CLC_OVERLOAD _CLC_DEF void vstore2(int2 vec, size_t offset, global int *mem) {
+    mem[offset] = vec.s0;
+    mem[offset+1] = vec.s1;
+}
+_CLC_OVERLOAD _CLC_DEF void vstore3(int3 vec, size_t offset, global int *mem) {
+    mem[offset] = vec.s0;
+    mem[offset+1] = vec.s1;
+    mem[offset+2] = vec.s2;
+}
+_CLC_OVERLOAD _CLC_DEF void vstore2(uint2 vec, size_t offset, global uint *mem) {
+    mem[offset] = vec.s0;
+    mem[offset+1] = vec.s1;
+}
+_CLC_OVERLOAD _CLC_DEF void vstore3(uint3 vec, size_t offset, global uint *mem) {
+    mem[offset] = vec.s0;
+    mem[offset+1] = vec.s1;
+    mem[offset+2] = vec.s2;
+}
+
+/*Note: R600 probably doesn't support store 2 x ? and 3 x ?... so
+ * they aren't actually overridden here... lowest-common-denominator
+ */
+_CLC_DECL void __clc_vstore4_int__global(int4 vec, size_t offset, __global int *);
+_CLC_DECL void __clc_vstore8_int__global(int8 vec, size_t offset, __global int *);
+_CLC_DECL void __clc_vstore16_int__global(int16 vec, size_t offset, __global int *);
+
+_CLC_OVERLOAD _CLC_DEF void vstore4(int4 vec, size_t offset, global int *x) {
+    __clc_vstore4_int__global(vec, offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF void vstore8(int8 vec, size_t offset, global int *x) {
+    __clc_vstore8_int__global(vec, offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF void vstore16(int16 vec, size_t offset, global int *x) {
+    __clc_vstore16_int__global(vec, offset, x);
+}
+
+_CLC_DECL void __clc_vstore4_uint__global(uint4 vec, size_t offset, __global uint *);
+_CLC_DECL void __clc_vstore8_uint__global(uint8 vec, size_t offset, __global uint *);
+_CLC_DECL void __clc_vstore16_uint__global(uint16 vec, size_t offset, __global uint *);
+
+_CLC_OVERLOAD _CLC_DEF void vstore4(uint4 vec, size_t offset, global uint *x) {
+    __clc_vstore4_uint__global(vec, offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF void vstore8(uint8 vec, size_t offset, global uint *x) {
+    __clc_vstore8_uint__global(vec, offset, x);
+}
+_CLC_OVERLOAD _CLC_DEF void vstore16(uint16 vec, size_t offset, global uint *x) {
+    __clc_vstore16_uint__global(vec, offset, x);
+}
diff --git a/generic/lib/shared/vstore_if.ll b/generic/lib/shared/vstore_if.ll
new file mode 100644
index 000..30eb552
--- /dev/null
+++ b/generic/lib/shared/vstore_if.ll
@@ -0,0 +1,59 @@
+;Start int global vstore
+
+declare void @__clc_vstore2_impl_i32__global(<2 x i32> %vec, i32 %x, i32 %y)
+declare void @__clc_vstore3_impl_i32__global(<3 x i32> %vec, i32 %x, i32 %y)
+declare void @__clc_vstore4_impl_i32__global(<4 x i32> %vec, i32 %x, i32 %y)
+declare void @__clc_vstore8_impl_i32__global(<8 x i32> %vec, i32 %x, i32 %y)
+declare void 

Re: [Mesa-dev] [PATCH] clover: Don't segfault when compiling a program with no kernel

2013-06-06 Thread Aaron Watry
Looks good to me.  Is there a piglit test for this?

--Aaron

On Wed, Jun 5, 2013 at 7:12 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  src/gallium/state_trackers/clover/llvm/invocation.cpp | 7 +++
  1 file changed, 7 insertions(+)

 diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
 b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 index 2d115ed..8ec089d 100644
 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
 +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 @@ -209,6 +209,13 @@ namespace {
 find_kernels(llvm::Module *mod, std::vector<llvm::Function *> &kernels) {
    const llvm::NamedMDNode *kernel_node =
       mod->getNamedMetadata("opencl.kernels");
 +  // This means there are no kernels in the program.  The spec does not
 +  // require that we return an error here, but there will be an error if
 +  // the user tries to pass this program to a clCreateKernel() call.
 +  if (!kernel_node) {
 +     return;
 +  }
 +
    for (unsigned i = 0; i < kernel_node->getNumOperands(); ++i) {
       kernels.push_back(llvm::dyn_cast<llvm::Function>(
          kernel_node->getOperand(i)->getOperand(0)));
 --
 1.7.11.4



Re: [Mesa-dev] [PATCH libclc] Implement barrier() builtin

2013-06-13 Thread Aaron Watry
FYI: I've applied your related piglit test and R600 back-end patches
and tested this on a CEDAR (HD5400).

Note: I had some trouble applying patches 4 and 5 of the R600 patches
but after chopping out the unit tests and creating those files by hand
(and using --ignore-whitespace), everything is there and functioning.

For the libclc change:
Reviewed-by: Aaron Watry awa...@gmail.com
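
For anyone skimming the patch, the dispatch in barrier() boils down to testing each fence bit separately, so passing both flags triggers both barriers. A plain-C sketch of that logic follows; the SKETCH_* values and sketch_* functions are made up for illustration (the real flag constants come from the CL headers and the real barriers are the AMDGPU intrinsics).

```c
#include <assert.h>

/* Hypothetical flag values, chosen only so the two bits are distinct. */
enum { SKETCH_LOCAL_FENCE = 1u << 0, SKETCH_GLOBAL_FENCE = 1u << 1 };

/* Call counters standing in for the hardware barrier intrinsics. */
static int local_calls;
static int global_calls;

static void sketch_barrier_local(void)  { local_calls++; }
static void sketch_barrier_global(void) { global_calls++; }

/* Each flag bit is tested on its own with a bitwise AND. */
static void sketch_barrier(unsigned flags)
{
    if (flags & SKETCH_LOCAL_FENCE)
        sketch_barrier_local();
    if (flags & SKETCH_GLOBAL_FENCE)
        sketch_barrier_global();
}
```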

On Wed, Jun 12, 2013 at 7:31 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  r600/lib/SOURCES |  2 ++
  r600/lib/synchronization/barrier.cl  | 15 +++
  r600/lib/synchronization/barrier_impl.ll | 12 
  3 files changed, 29 insertions(+)
  create mode 100644 r600/lib/synchronization/barrier.cl
  create mode 100644 r600/lib/synchronization/barrier_impl.ll

 diff --git a/r600/lib/SOURCES b/r600/lib/SOURCES
 index af8c8c8..16ef3ac 100644
 --- a/r600/lib/SOURCES
 +++ b/r600/lib/SOURCES
 @@ -2,3 +2,5 @@ workitem/get_group_id.ll
  workitem/get_local_size.ll
  workitem/get_local_id.ll
  workitem/get_global_size.ll
 +synchronization/barrier.cl
 +synchronization/barrier_impl.ll
 diff --git a/r600/lib/synchronization/barrier.cl 
 b/r600/lib/synchronization/barrier.cl
 new file mode 100644
 index 000..ac0b4b3
 --- /dev/null
 +++ b/r600/lib/synchronization/barrier.cl
 @@ -0,0 +1,15 @@
 +
 +#include <clc/clc.h>
 +
 +void barrier_local(void);
 +void barrier_global(void);
 +
 +void barrier(cl_mem_fence_flags flags) {
 +  if (flags & CLK_LOCAL_MEM_FENCE) {
 +    barrier_local();
 +  }
 +
 +  if (flags & CLK_GLOBAL_MEM_FENCE) {
 +    barrier_global();
 +  }
 +}
 diff --git a/r600/lib/synchronization/barrier_impl.ll 
 b/r600/lib/synchronization/barrier_impl.ll
 new file mode 100644
 index 000..99ac018
 --- /dev/null
 +++ b/r600/lib/synchronization/barrier_impl.ll
 @@ -0,0 +1,12 @@
 +declare void @llvm.AMDGPU.barrier.local() nounwind
 +declare void @llvm.AMDGPU.barrier.global() nounwind
 +
 +define void @barrier_local() nounwind alwaysinline {
 +  call void @llvm.AMDGPU.barrier.local()
 +  ret void
 +}
 +
 +define void @barrier_global() nounwind alwaysinline {
 +  call void @llvm.AMDGPU.barrier.global()
 +  ret void
 +}
 --
 1.7.11.4



Re: [Mesa-dev] [PATCH 1/2] r600g/compute: Move compute_shader_create() function into evergreen_compute.c

2013-06-13 Thread Aaron Watry
On Wed, Jun 12, 2013 at 7:34 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  src/gallium/drivers/r600/evergreen_compute.c | 23 +++-
  src/gallium/drivers/r600/r600_shader.c   | 32 
 
  2 files changed, 22 insertions(+), 33 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index c993c09..b16c9d9 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -46,6 +46,7 @@
  #include "evergreen_compute.h"
  #include "evergreen_compute_internal.h"
  #include "compute_memory_pool.h"
 +#include "sb/sb_public.h"
  #ifdef HAVE_OPENCL
  #include "radeon_llvm_util.h"
  #endif
 @@ -522,7 +523,27 @@ static void evergreen_launch_grid(
 if (!shader->kernels[pc].code_bo) {
 void *p;
 struct r600_kernel *kernel = &shader->kernels[pc];
 -   r600_compute_shader_create(ctx_, kernel->llvm_module, &kernel->bc);
 +   struct r600_bytecode *bc = &kernel->bc;
 +   LLVMModuleRef mod = kernel->llvm_module;
 +   boolean use_kill = false;
 +   bool dump = (ctx->screen->debug_flags & DBG_CS) != 0;
 +   unsigned use_sb = ctx->screen->debug_flags & DBG_SB_CS;
 +   unsigned sb_disasm = use_sb ||
 +   (ctx->screen->debug_flags & DBG_SB_DISASM);
 +
 +   r600_bytecode_init(bc, ctx->chip_class, ctx->family,
 +  ctx->screen->has_compressed_msaa_texturing);
 +   bc->type = TGSI_PROCESSOR_COMPUTE;
 +   bc->isa = ctx->isa;
 +   r600_llvm_compile(mod, ctx->family, bc, &use_kill, dump);
 +
 +   if (dump && !sb_disasm) {
 +   r600_bytecode_disasm(bc);
 +   } else if ((dump && sb_disasm) || use_sb) {
 +   if (r600_sb_bytecode_process(ctx, bc, NULL, dump, use_sb))
 +   R600_ERR("r600_sb_bytecode_process failed!\n");
 +   }
 +
 kernel->code_bo = r600_compute_buffer_alloc_vram(ctx->screen,
 kernel->bc.ndw * 4);
 p = r600_buffer_mmap_sync_with_rings(ctx, kernel->code_bo, PIPE_TRANSFER_WRITE);
 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index 81ed3ce..97c625c 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -291,38 +291,6 @@ static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
  static int tgsi_endloop(struct r600_shader_ctx *ctx);
  static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);

 -#ifdef HAVE_OPENCL
 -int r600_compute_shader_create(struct pipe_context * ctx,
 -   LLVMModuleRef mod,  struct r600_bytecode * bytecode)
 -{

There's an associated declaration of this function in r600_pipe.h that
is now unused... should this be removed? Otherwise, this looks good to
me.

FYI: Tested on CEDAR (HD5400).

--Aaron


 -   struct r600_context *r600_ctx = (struct r600_context *)ctx;
 -   struct r600_shader_ctx shader_ctx;
 -   boolean use_kill = false;
 -   bool dump = (r600_ctx->screen->debug_flags & DBG_CS) != 0;
 -   unsigned use_sb = r600_ctx->screen->debug_flags & DBG_SB_CS;
 -   unsigned sb_disasm = use_sb ||
 -   (r600_ctx->screen->debug_flags & DBG_SB_DISASM);
 -
 -   shader_ctx.bc = bytecode;
 -   r600_bytecode_init(shader_ctx.bc, r600_ctx->chip_class, r600_ctx->family,
 -  r600_ctx->screen->has_compressed_msaa_texturing);
 -   shader_ctx.bc->type = TGSI_PROCESSOR_COMPUTE;
 -   shader_ctx.bc->isa = r600_ctx->isa;
 -   r600_llvm_compile(mod, r600_ctx->family,
 -   shader_ctx.bc, &use_kill, dump);
 -
 -   if (dump && !sb_disasm) {
 -   r600_bytecode_disasm(shader_ctx.bc);
 -   } else if ((dump && sb_disasm) || use_sb) {
 -   if (r600_sb_bytecode_process(r600_ctx, shader_ctx.bc, NULL, dump, use_sb))
 -   R600_ERR("r600_sb_bytecode_process failed!\n");
 -   }
 -
 -   return 1;
 -}
 -
 -#endif /* HAVE_OPENCL */
 -
  static int tgsi_is_supported(struct r600_shader_ctx *ctx)
  {
 struct tgsi_full_instruction *i = &ctx->parse.FullToken.FullInstruction;
 --
 1.7.11.4



Re: [Mesa-dev] [PATCH 2/2] r600g/compute: Accept LDS size from the LLVM backend

2013-06-13 Thread Aaron Watry
For both patches in this series, the original files use tabs for
indentation, not the spaces that the patches introduce. Might want to
fix that for consistency.

I'm not familiar enough with the register poking to give a qualified
review, but everything else looks reasonable to me.

Tested-by: Aaron Watry awa...@gmail.com
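
For what it's worth, this is how I read the dispatch arithmetic in the quoted patch: the wavefront count is a ceiling division of the block volume by 16 * num_pipes, and the value written to the LDS alloc register packs the LDS dword count below the wave count shifted left by 14. A plain-C sketch of that reading is below; the sketch_* names are made up here and the code does not touch any real register.

```c
#include <assert.h>

/* Ceiling division of the thread-block volume by (16 * num_pipes). */
static unsigned sketch_num_waves(const unsigned block[3], unsigned num_pipes)
{
    unsigned wave_divisor = 16 * num_pipes;
    return (block[0] * block[1] * block[2] + wave_divisor - 1) / wave_divisor;
}

/* Pack the LDS size (in dwords) with the wave count in bits 14 and up. */
static unsigned sketch_lds_alloc(unsigned lds_dwords, unsigned num_waves)
{
    return lds_dwords | (num_waves << 14);
}
```

With 2 pipes, a 64x1x1 block gives 2 waves and a 65x1x1 block gives 3; 256 dwords of LDS with 2 waves packs to 0x8100.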

On Wed, Jun 12, 2013 at 7:34 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 And allocate the correct amount before dispatching the kernel.
 ---
  src/gallium/drivers/r600/evergreen_compute.c   | 53 
 +++---
  .../drivers/r600/evergreen_compute_internal.h  |  1 +
  src/gallium/drivers/r600/evergreen_state.c |  6 +--
  src/gallium/drivers/r600/r600_asm.h|  1 +
  src/gallium/drivers/r600/r600_llvm.c   |  3 ++
  5 files changed, 44 insertions(+), 20 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index b16c9d9..226933b 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -211,7 +211,8 @@ void *evergreen_create_compute_state(
  #endif

 shader->ctx = (struct r600_context*)ctx;
 -   shader->local_size = cso->req_local_mem; ///TODO: assert it
 +   /* XXX: We ignore cso->req_local_mem, because we compute this value
 +    * ourselves on a per-kernel basis. */
 shader->private_size = cso->req_private_mem;
 shader->input_size = cso->req_input_mem;

 @@ -327,13 +328,13 @@ static void evergreen_emit_direct_dispatch(
  {
 int i;
 struct radeon_winsys_cs *cs = rctx->rings.gfx.cs;
 +   struct r600_pipe_compute *shader = rctx->cs_shader_state.shader;
 unsigned num_waves;
 unsigned num_pipes = rctx->screen->info.r600_max_pipes;
 unsigned wave_divisor = (16 * num_pipes);
 int group_size = 1;
 int grid_size = 1;
 -   /* XXX: Enable lds and get size from cs_shader_state */
 -   unsigned lds_size = 0;
 +   unsigned lds_size = shader->active_kernel->bc.nlds_dw;

 /* Calculate group_size/grid_size */
 for (i = 0; i < 3; i++) {
 @@ -348,16 +349,10 @@ static void evergreen_emit_direct_dispatch(
 num_waves = (block_layout[0] * block_layout[1] * block_layout[2] +
 wave_divisor - 1) / wave_divisor;

 -   COMPUTE_DBG(rctx->screen, "Using %u pipes, there are %u wavefronts per thread block\n",
 -   num_pipes, num_waves);
 -
 -   /* XXX: Partition the LDS between PS/CS.  By default half (4096 dwords
 -* on Evergreen) oes to Pixel Shaders and half goes to Compute 
 Shaders.
 -* We may need to allocat the entire LDS space for Compute Shaders.
 -*
 -* EG: R_008E2C_SQ_LDS_RESOURCE_MGMT := 
 S_008E2C_NUM_LS_LDS(lds_dwords)
 -* CM: CM_R_0286FC_SPI_LDS_MGMT :=  S_0286FC_NUM_LS_LDS(lds_dwords)
 -*/
 +   COMPUTE_DBG(rctx->screen, "Using %u pipes, "
 +   "%u wavefronts per thread block, "
 +   "allocating %u dwords lds.\n",
 +   num_pipes, num_waves, lds_size);

 r600_write_config_reg(cs, R_008970_VGT_NUM_INDICES, group_size);

 @@ -374,6 +369,14 @@ static void evergreen_emit_direct_dispatch(
 r600_write_value(cs, block_layout[1]); /* R_0286F0_SPI_COMPUTE_NUM_THREAD_Y */
 r600_write_value(cs, block_layout[2]); /* R_0286F4_SPI_COMPUTE_NUM_THREAD_Z */

 +   if (rctx->chip_class < CAYMAN) {
 +   assert(lds_size <= 8192);
 +   } else {
 +   /* Cayman appears to have a slightly smaller limit, see the
 +    * value of CM_R_0286FC_SPI_LDS_MGMT.NUM_LS_LDS */
 +   assert(lds_size <= 8160);
 +   }
 +
 r600_write_compute_context_reg(cs, CM_R_0288E8_SQ_LDS_ALLOC,
 lds_size | (num_waves << 14));

 @@ -517,12 +520,14 @@ static void evergreen_launch_grid(
 struct r600_context *ctx = (struct r600_context *)ctx_;

  #ifdef HAVE_OPENCL
 -   COMPUTE_DBG(ctx-screen, *** evergreen_launch_grid: pc = %u\n, pc);

 struct r600_pipe_compute *shader = ctx-cs_shader_state.shader;
 -   if (!shader-kernels[pc].code_bo) {
 +   struct r600_kernel *kernel = shader-kernels[pc];
 +
 +   COMPUTE_DBG(ctx-screen, *** evergreen_launch_grid: pc = %u\n, pc);
 +
 +   if (!kernel-code_bo) {
 void *p;
 -   struct r600_kernel *kernel = shader-kernels[pc];
 struct r600_bytecode *bc = kernel-bc;
 LLVMModuleRef mod = kernel-llvm_module;
 boolean use_kill = false;
 @@ -551,7 +556,7 @@ static void evergreen_launch_grid(
 ctx->ws->buffer_unmap(kernel->code_bo->cs_buf);
 }
  #endif
 -
 +   shader->active_kernel = kernel;
 ctx->cs_shader_state.kernel_index
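
For readers following the LDS hunk quoted above, the limit check and the value written to SQ_LDS_ALLOC can be looked at in isolation. This is a minimal sketch, not the driver code: the `ChipClass` enum and both helper names are hypothetical, and only the two limits (8192 and 8160 dwords) and the shift by 14 come from the patch, reading the stripped operators as `<`, `<=` and `<<`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the driver's chip_class values; only the
// distinction between Evergreen and Cayman matters for this sketch.
enum class ChipClass { Evergreen, Cayman };

// Largest LDS allocation, in dwords, that the patch asserts for each class.
constexpr uint32_t max_lds_dwords(ChipClass chip) {
    return chip == ChipClass::Cayman ? 8160u : 8192u;
}

// Mirrors the expression `lds_size | (num_waves << 14)` from the patch:
// the LDS size sits in the low bits, the wavefront count starts at bit 14.
constexpr uint32_t pack_lds_alloc(uint32_t lds_size, uint32_t num_waves) {
    return lds_size | (num_waves << 14);
}
```

With these helpers, `pack_lds_alloc(4096, 2)` evaluates to 36864 (0x9000), and a request for 8192 dwords passes the Evergreen limit but not the Cayman one.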

[Mesa-dev] [PATCH] R600: Add SI load support for v[24]i32 and store for v2i32

2013-06-14 Thread Aaron Watry
Also add a separate vector lit test file: r600 doesn't seem to handle
v2i32 load/store yet, but we can test both for SI.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIInstructions.td |  5 +
 test/CodeGen/R600/load.vec.ll | 19 +++
 2 files changed, 24 insertions(+)
 create mode 100644 test/CodeGen/R600/load.vec.ll

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index e8ed2dd..9c96c08 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1638,6 +1638,10 @@ defm : MUBUFLoad_Pattern <BUFFER_LOAD_DWORD_ADDR64, i32,
   global_load, constant_load>;
 defm : MUBUFLoad_Pattern <BUFFER_LOAD_UBYTE_ADDR64, i32,
   zextloadi8_global, zextloadi8_constant>;
+defm : MUBUFLoad_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, v2i32,
+  global_load, constant_load>;
+defm : MUBUFLoad_Pattern <BUFFER_LOAD_DWORDX4_ADDR64, v4i32,
+  global_load, constant_load>;
 
 multiclass MUBUFStore_Pattern <MUBUF Instr, ValueType vt> {
 
@@ -1654,6 +1658,7 @@ multiclass MUBUFStore_Pattern <MUBUF Instr, ValueType vt> {
 
 defm : MUBUFStore_Pattern <BUFFER_STORE_DWORD, i32>;
 defm : MUBUFStore_Pattern <BUFFER_STORE_DWORDX2, i64>;
+defm : MUBUFStore_Pattern <BUFFER_STORE_DWORDX2, v2i32>;
 defm : MUBUFStore_Pattern <BUFFER_STORE_DWORDX4, v4i32>;
 
 /** == **/
diff --git a/test/CodeGen/R600/load.vec.ll b/test/CodeGen/R600/load.vec.ll
new file mode 100644
index 000..08e034e
--- /dev/null
+++ b/test/CodeGen/R600/load.vec.ll
@@ -0,0 +1,19 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck --check-prefix=SI-CHECK  %s
+
+; load a v2i32 value from the global address space.
+; SI-CHECK: @load_v2i32
+; SI-CHECK: BUFFER_LOAD_DWORDX2 VGPR{{[0-9]+}}
+define void @load_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %a = load <2 x i32> addrspace(1) * %in
+  store <2 x i32> %a, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+; load a v4i32 value from the global address space.
+; SI-CHECK: @load_v4i32
+; SI-CHECK: BUFFER_LOAD_DWORDX4 VGPR{{[0-9]+}}
+define void @load_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %a = load <4 x i32> addrspace(1) * %in
+  store <4 x i32> %a, <4 x i32> addrspace(1)* %out
+  ret void
+}
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] R600: Various fixes for R600 and SI

2013-06-17 Thread Aaron Watry
First patch fixes load/store for v2i32 on R600. Without this, the
other two will cause make check failures.  I've verified the changes
using a Radeon 5400 (Cedar).  Note that the previous custom
lowering of v2i32 store was causing silent data corruption.

The other two patches expand add/sub on SI for both v2i32 and v4i32
types. Lit tests for v2i32 have been added.
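
To make "expand" concrete: when a vector operation is marked Expand, the legalizer rewrites it using operations the target does support, which for these integer vector ops typically means one scalar operation per element. The sketch below models that idea in plain C++. It uses no LLVM API, and `V2I32` and `expand_add` are names made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for a two-element i32 vector value. Unsigned lanes
// are used so that wraparound on overflow is well defined in C++.
using V2I32 = std::array<uint32_t, 2>;

// Element-wise add: the single vector op becomes one scalar add per lane.
V2I32 expand_add(const V2I32 &a, const V2I32 &b) {
    V2I32 result{};
    for (std::size_t lane = 0; lane < result.size(); ++lane)
        result[lane] = a[lane] + b[lane];
    return result;
}
```

This matches what the lit tests look for: two scalar add instructions for a v2i32 add, four for v4i32.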



[Mesa-dev] [PATCH 1/3] R600: Expand v2i32 load/store instead of custom lowering

2013-06-17 Thread Aaron Watry
The custom lowering causes llc to crash with a segfault.

Ideally the custom lowering would be fixed, but until then this lets
programs that load/store v2i32 work without crashing.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600ISelLowering.cpp | 4 ++--
 test/CodeGen/R600/load.vec.ll| 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index 9cedadb..812df83 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -86,7 +86,7 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
 
   // Legalize loads and stores to the private address space.
   setOperationAction(ISD::LOAD, MVT::i32, Custom);
-  setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
+  setOperationAction(ISD::LOAD, MVT::v2i32, Expand);
   setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
   setLoadExtAction(ISD::EXTLOAD, MVT::v4i8, Custom);
   setLoadExtAction(ISD::EXTLOAD, MVT::i8, Custom);
@@ -94,7 +94,7 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Custom);
   setOperationAction(ISD::STORE, MVT::i8, Custom);
   setOperationAction(ISD::STORE, MVT::i32, Custom);
-  setOperationAction(ISD::STORE, MVT::v2i32, Custom);
+  setOperationAction(ISD::STORE, MVT::v2i32, Expand);
   setOperationAction(ISD::STORE, MVT::v4i32, Custom);
 
   setOperationAction(ISD::LOAD, MVT::i32, Custom);
diff --git a/test/CodeGen/R600/load.vec.ll b/test/CodeGen/R600/load.vec.ll
index 08e034e..da1149a 100644
--- a/test/CodeGen/R600/load.vec.ll
+++ b/test/CodeGen/R600/load.vec.ll
@@ -1,6 +1,10 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK  %s
 ; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck --check-prefix=SI-CHECK  %s
 
 ; load a v2i32 value from the global address space.
+; EG-CHECK: @load_v2i32
+; EG-CHECK: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 4
+; EG-CHECK: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0
 ; SI-CHECK: @load_v2i32
 ; SI-CHECK: BUFFER_LOAD_DWORDX2 VGPR{{[0-9]+}}
 define void @load_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
@@ -10,6 +14,8 @@ define void @load_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %i
 }
 
 ; load a v4i32 value from the global address space.
+; EG-CHECK: @load_v4i32
+; EG-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 0
 ; SI-CHECK: @load_v4i32
 ; SI-CHECK: BUFFER_LOAD_DWORDX4 VGPR{{[0-9]+}}
 define void @load_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
-- 
1.8.1.2



[Mesa-dev] [PATCH 2/3] R600/SI: Expand add for v2i32 and v4i32

2013-06-17 Thread Aaron Watry
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  2 ++
 test/CodeGen/R600/add.ll   | 37 +++--
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index d74f401..bf4918a 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -65,6 +65,8 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
   setOperationAction(ISD::ADD, MVT::i64, Legal);
   setOperationAction(ISD::ADD, MVT::i32, Legal);
+  setOperationAction(ISD::ADD, MVT::v4i32, Expand);
+  setOperationAction(ISD::ADD, MVT::v2i32, Expand);
 
   setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
   setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);
diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll
index 185998b..dd590e5 100644
--- a/test/CodeGen/R600/add.ll
+++ b/test/CodeGen/R600/add.ll
@@ -1,11 +1,36 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+; RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-;CHECK: ADD_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: @test2
+;EG-CHECK: ADD_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal\.[xyzw]}}
 
-define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+;SI-CHECK: @test2
+;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = add <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test4
+;EG-CHECK: ADD_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @test4
+;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
   %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
   %a = load <4 x i32> addrspace(1) * %in
   %b = load <4 x i32> addrspace(1) * %b_ptr
-- 
1.8.1.2
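
The two RUN lines push the same file through FileCheck with different prefixes, so each run only honours the directives that carry its own prefix. As a rough illustration (this is not FileCheck's implementation, and `directives_for_prefix` is a hypothetical helper), selecting the patterns for one prefix could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the pattern text of every line that contains "<prefix>:".
// Simplified on purpose: real FileCheck also understands suffixes such as
// -NOT and -NEXT and is stricter about what may precede the prefix.
std::vector<std::string> directives_for_prefix(
    const std::vector<std::string> &lines, const std::string &prefix) {
    std::vector<std::string> patterns;
    const std::string tag = prefix + ":";
    for (const std::string &line : lines) {
        const std::size_t pos = line.find(tag);
        if (pos == std::string::npos)
            continue;
        std::size_t start = pos + tag.size();
        // Skip the spaces that separate the tag from its pattern.
        while (start < line.size() && line[start] == ' ')
            ++start;
        patterns.push_back(line.substr(start));
    }
    return patterns;
}
```

Feeding it the check lines of add.ll with the prefix `SI-CHECK` would return only the `V_ADD_I32_e32` patterns and the function labels, which is why one test file can cover both evergreen and SI.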



Re: [Mesa-dev] [PATCH] R600/SI: Add support for v4i32 and v4f32 kernel args

2013-06-18 Thread Aaron Watry
Tested on Pitcairn by: Aaron Watry awa...@gmail.com

Follow-up question: Would it be as easy as it looks to add v2i32 right away?
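
For context on that question, it may help to spell out what each `CCAssignToStack<size, align>` entry does to an argument's offset. A minimal sketch, assuming the usual behaviour of rounding the running offset up to the alignment and then reserving `size` bytes. `ArgSlot` and `assign_to_stack` are hypothetical names, and nothing here says which size a v2i32 entry should use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ArgSlot {
    uint32_t size;   // bytes reserved for the argument
    uint32_t align;  // required alignment in bytes (a power of two)
};

// Assigns each argument an offset: align the running offset, then advance it.
std::vector<uint32_t> assign_to_stack(const std::vector<ArgSlot> &args) {
    std::vector<uint32_t> offsets;
    uint32_t next = 0;
    for (const ArgSlot &arg : args) {
        next = (next + arg.align - 1) & ~(arg.align - 1);  // round up
        offsets.push_back(next);
        next += arg.size;
    }
    return offsets;
}
```

Under that assumption, an i8 (1 byte), a v4i32 (16 bytes) and an i32 (4 bytes), all with 4-byte alignment, land at offsets 0, 4 and 20.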

On Tue, Jun 18, 2013 at 6:21 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  lib/Target/R600/AMDGPUCallingConv.td|  9 +
  test/CodeGen/R600/128bit-kernel-args.ll | 16 ++--
  2 files changed, 15 insertions(+), 10 deletions(-)

 diff --git a/lib/Target/R600/AMDGPUCallingConv.td 
 b/lib/Target/R600/AMDGPUCallingConv.td
 index 84e4f3a..826932b 100644
 --- a/lib/Target/R600/AMDGPUCallingConv.td
 +++ b/lib/Target/R600/AMDGPUCallingConv.td
 @@ -38,10 +38,11 @@ def CC_SI : CallingConv<[

  // Calling convention for SI compute kernels
  def CC_SI_Kernel : CallingConv<[
 -  CCIfType<[i64],  CCAssignToStack <8, 4>>,
 -  CCIfType<[i32, f32], CCAssignToStack <4, 4>>,
 -  CCIfType<[i16],  CCAssignToStack <2, 4>>,
 -  CCIfType<[i8],   CCAssignToStack <1, 4>>
 +  CCIfType<[v4i32, v4f32], CCAssignToStack <16, 4>>,
 +  CCIfType<[i64],  CCAssignToStack < 8, 4>>,
 +  CCIfType<[i32, f32], CCAssignToStack < 4, 4>>,
 +  CCIfType<[i16],  CCAssignToStack < 2, 4>>,
 +  CCIfType<[i8],   CCAssignToStack < 1, 4>>
  ]>;

  def CC_AMDGPU : CallingConv<[
 diff --git a/test/CodeGen/R600/128bit-kernel-args.ll 
 b/test/CodeGen/R600/128bit-kernel-args.ll
 index 114f9e7..bd60385 100644
 --- a/test/CodeGen/R600/128bit-kernel-args.ll
 +++ b/test/CodeGen/R600/128bit-kernel-args.ll
 @@ -1,16 +1,20 @@
 -;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 -
 -; CHECK: @v4i32_kernel_arg
 -; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
 +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s --check-prefix=R600-CHECK
 +; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=SI-CHECK

 +; R600-CHECK: @v4i32_kernel_arg
 +; R600-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
 +; SI-CHECK: @v4i32_kernel_arg
 +; SI-CHECK: BUFFER_STORE_DWORDX4
  define void @v4i32_kernel_arg(<4 x i32> addrspace(1)* %out, <4 x i32>  %in) {
  entry:
    store <4 x i32> %in, <4 x i32> addrspace(1)* %out
    ret void
  }

 -; CHECK: @v4f32_kernel_arg
 -; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
 +; R600-CHECK: @v4f32_kernel_arg
 +; R600-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
 +; SI-CHECK: @v4f32_kernel_arg
 +; SI-CHECK: BUFFER_STORE_DWORDX4
  define void @v4f32_kernel_args(<4 x float> addrspace(1)* %out, <4 x float>  %in) {
  entry:
    store <4 x float> %in, <4 x float> addrspace(1)* %out
 --
 1.7.11.4

 ___
 llvm-commits mailing list
 llvm-comm...@cs.uiuc.edu
 http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


[Mesa-dev] R600: Expand integer operations for SI and consolidate code with EG

2013-06-20 Thread Aaron Watry
This series is intended to bring SI closer to evergreen when it comes to
support for v2i32/v4i32 integer operations.

It adds support for expanding the following v2i32/v4i32 operations on SI:
AND, MUL, OR, SHL, SRL, ASHR, UDIV, UREM, XOR

Once that's done, the setOperationAction(op,type,Expand) calls that appear in
both R600ISelLowering.cpp and SIISelLowering.cpp are all moved to
AMDGPUISelLowering.cpp.  If we decide to implement these ops through native
instructions for either target in the future, we can override that in the
individual targets.
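
The consolidation described above amounts to setting a default in the shared lowering class and letting a target override it later. A toy model of that pattern, with no LLVM types involved: the `Op`, `Type` and `Action` enums and the two classes are invented for illustration and only mimic the shape of `setOperationAction(op, type, action)`.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

enum class Op { And, Mul, Or };
enum class Type { v2i32, v4i32 };
enum class Action { Legal, Expand };

// Shared base: records the vector expansions common to both targets.
class CommonLowering {
public:
    CommonLowering() {
        for (Op op : {Op::And, Op::Mul, Op::Or})
            for (Type ty : {Type::v2i32, Type::v4i32})
                setOperationAction(op, ty, Action::Expand);
    }

    // Anything not recorded is treated as Legal in this toy model.
    Action getOperationAction(Op op, Type ty) const {
        auto it = actions_.find(std::make_pair(op, ty));
        return it == actions_.end() ? Action::Legal : it->second;
    }

protected:
    void setOperationAction(Op op, Type ty, Action action) {
        actions_[std::make_pair(op, ty)] = action;
    }

private:
    std::map<std::pair<Op, Type>, Action> actions_;
};

// A target that later gains a native v4i32 AND overrides the shared default.
class TargetWithNativeAnd : public CommonLowering {
public:
    TargetWithNativeAnd() {
        setOperationAction(Op::And, Type::v4i32, Action::Legal);
    }
};
```

Because the derived constructor runs after the base one, the override wins for that single (op, type) pair while every other entry keeps the shared Expand default.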

Signed-off-by: Aaron Watry awa...@gmail.com

R600/SI: Expand and of v2i32/v4i32 for SI
R600/SI: Expand mul of v2i32/v4i32 for SI
R600/SI: Expand or of v2i32/v4i32 for SI
R600/SI: Expand shl of v2i32/v4i32 for SI
R600/SI: Expand srl of v2i32/v4i32 for SI
R600/SI: Expand ashr of v2i32/v4i32 for SI
R600/SI: Expand udiv v[24]i32 for SI and v2i32 for EG
R600/SI: Expand urem of v2i32/v4i32 for SI
R600: Add v2i32 test for setcc on evergreen
R600/SI: Expand xor v2i32/v4i32
R600: Add v2i32 test for vselect
R600: Consolidate expansion of v2i32/v4i32 ops for SI/EG

 lib/Target/R600/AMDGPUISelLowering.cpp | 22 
 lib/Target/R600/R600ISelLowering.cpp   | 18 -
 lib/Target/R600/SIISelLowering.cpp |  5 
 test/CodeGen/R600/and.ll   | 37 +-
 test/CodeGen/R600/mul.ll   | 38 ++-
 test/CodeGen/R600/or.ll| 41 -
 test/CodeGen/R600/setcc.ll | 25 +++---
 test/CodeGen/R600/shl.ll   | 47 ++
 test/CodeGen/R600/sra.ll   | 41 -
 test/CodeGen/R600/srl.ll   | 42 +-
 test/CodeGen/R600/udiv.ll  | 25 +++---
 test/CodeGen/R600/urem.ll  | 27 ---
 test/CodeGen/R600/vselect.ll   | 26 ++-
 test/CodeGen/R600/xor.ll   | 40 -
 14 files changed, 345 insertions(+), 89 deletions(-)



[Mesa-dev] [PATCH 01/12] R600/SI: Expand and of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit tests for both cases on SI, and for v2i32 on evergreen.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  3 +++
 test/CodeGen/R600/and.ll   | 37 +++--
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 776eb86..bf2e7d3 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -68,6 +68,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::ADD, MVT::v4i32, Expand);
   setOperationAction(ISD::ADD, MVT::v2i32, Expand);
 
+  setOperationAction(ISD::AND, MVT::v2i32, Expand);
+  setOperationAction(ISD::AND, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
 
diff --git a/test/CodeGen/R600/and.ll b/test/CodeGen/R600/and.ll
index 166af2d..44c21bd 100644
--- a/test/CodeGen/R600/and.ll
+++ b/test/CodeGen/R600/and.ll
@@ -1,11 +1,36 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-;CHECK: AND_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: AND_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: AND_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: AND_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: @test2
+;EG-CHECK: AND_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: AND_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
-define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+;SI-CHECK: @test2
+;SI-CHECK: V_AND_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_AND_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = and <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test4
+;EG-CHECK: AND_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: AND_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: AND_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: AND_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @test4
+;SI-CHECK: V_AND_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_AND_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_AND_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_AND_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
   %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
   %a = load <4 x i32> addrspace(1) * %in
   %b = load <4 x i32> addrspace(1) * %b_ptr
-- 
1.8.1.2



[Mesa-dev] [PATCH 02/12] R600/SI: Expand mul of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit tests for both cases on SI, and for v2i32 on evergreen.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  3 +++
 test/CodeGen/R600/mul.ll   | 38 --
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index bf2e7d3..cb80e5e 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -71,6 +71,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::AND, MVT::v2i32, Expand);
   setOperationAction(ISD::AND, MVT::v4i32, Expand);
 
+  setOperationAction(ISD::MUL, MVT::v2i32, Expand);
+  setOperationAction(ISD::MUL, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
 
diff --git a/test/CodeGen/R600/mul.ll b/test/CodeGen/R600/mul.ll
index 7278e90..18a17b6 100644
--- a/test/CodeGen/R600/mul.ll
+++ b/test/CodeGen/R600/mul.ll
@@ -1,12 +1,38 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+; RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
 ; mul24 and mad24 are affected
-;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-;CHECK: MULLO_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
-define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+;EG-CHECK: @test2
+;EG-CHECK: MULLO_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: MULLO_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @test2
+;SI-CHECK: V_MUL_LO_I32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_MUL_LO_I32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = mul <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test4
+;EG-CHECK: MULLO_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: MULLO_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: MULLO_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: MULLO_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @test4
+;SI-CHECK: V_MUL_LO_I32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_MUL_LO_I32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_MUL_LO_I32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_MUL_LO_I32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
   %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
   %a = load <4 x i32> addrspace(1) * %in
   %b = load <4 x i32> addrspace(1) * %b_ptr
-- 
1.8.1.2



[Mesa-dev] [PATCH 03/12] R600/SI: Expand or of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit tests for both cases on SI, and for v2i32 on evergreen.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  3 +++
 test/CodeGen/R600/or.ll| 41 +++---
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index cb80e5e..30a7de5 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -74,6 +74,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::MUL, MVT::v2i32, Expand);
   setOperationAction(ISD::MUL, MVT::v4i32, Expand);
 
+  setOperationAction(ISD::OR, MVT::v2i32, Expand);
+  setOperationAction(ISD::OR, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
 
diff --git a/test/CodeGen/R600/or.ll b/test/CodeGen/R600/or.ll
index b0dbb02..4a4e892 100644
--- a/test/CodeGen/R600/or.ll
+++ b/test/CodeGen/R600/or.ll
@@ -1,12 +1,39 @@
-; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-; CHECK: @or_v4i32
-; CHECK: OR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: OR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: OR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: OR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; EG-CHECK: @or_v2i32
+; EG-CHECK: OR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; EG-CHECK: OR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
-define void @or_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
+;SI-CHECK: @or_v2i32
+;SI-CHECK: V_OR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_OR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @or_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = or <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+; EG-CHECK: @or_v4i32
+; EG-CHECK: OR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; EG-CHECK: OR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; EG-CHECK: OR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; EG-CHECK: OR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @or_v4i32
+;SI-CHECK: V_OR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_OR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_OR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_OR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @or_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
+  %a = load <4 x i32> addrspace(1) * %in
+  %b = load <4 x i32> addrspace(1) * %b_ptr
   %result = or <4 x i32> %a, %b
   store <4 x i32> %result, <4 x i32> addrspace(1)* %out
   ret void
-- 
1.8.1.2



[Mesa-dev] [PATCH 04/12] R600/SI: Expand shl of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit tests for both cases on SI, and for v2i32 on evergreen.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  3 +++
 test/CodeGen/R600/shl.ll   | 47 ++
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 30a7de5..515c7a4 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -77,6 +77,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::OR, MVT::v2i32, Expand);
   setOperationAction(ISD::OR, MVT::v4i32, Expand);
 
+  setOperationAction(ISD::SHL, MVT::v2i32, Expand);
+  setOperationAction(ISD::SHL, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
 
diff --git a/test/CodeGen/R600/shl.ll b/test/CodeGen/R600/shl.ll
index db970e9..d68730a 100644
--- a/test/CodeGen/R600/shl.ll
+++ b/test/CodeGen/R600/shl.ll
@@ -1,16 +1,43 @@
-; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-; CHECK: @shl_v4i32
-; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+; XXX: Add SI test for i64 shl once i64 stores and i64 function arguments are
+; supported.
+
+;EG-CHECK: @shl_v2i32
+;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @shl_v2i32
+;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @shl_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = shl <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @shl_v4i32
+;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @shl_v4i32
+;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
 
-define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
+define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
+  %a = load <4 x i32> addrspace(1) * %in
+  %b = load <4 x i32> addrspace(1) * %b_ptr
   %result = shl <4 x i32> %a, %b
   store <4 x i32> %result, <4 x i32> addrspace(1)* %out
   ret void
 }
-
-; XXX: Add SI test for i64 shl once i64 stores and i64 function arguments are
-; supported.
-- 
1.8.1.2



[Mesa-dev] [PATCH 05/12] R600/SI: Expand srl of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit tests for both cases on SI, and for v2i32 on evergreen.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  2 ++
 test/CodeGen/R600/srl.ll   | 42 +++---
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 515c7a4..4219825 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -79,6 +79,8 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
   setOperationAction(ISD::SHL, MVT::v2i32, Expand);
   setOperationAction(ISD::SHL, MVT::v4i32, Expand);
+  setOperationAction(ISD::SRL, MVT::v4i32, Expand);
+  setOperationAction(ISD::SRL, MVT::v2i32, Expand);
 
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
diff --git a/test/CodeGen/R600/srl.ll b/test/CodeGen/R600/srl.ll
index 5f63600..d1dcd7f 100644
--- a/test/CodeGen/R600/srl.ll
+++ b/test/CodeGen/R600/srl.ll
@@ -1,12 +1,40 @@
-; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-; CHECK: @lshr_v4i32
-; CHECK: LSHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: LSHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: LSHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: LSHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: @lshr_v2i32
+;EG-CHECK: LSHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
-define void @lshr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
+;SI-CHECK: @lshr_v2i32
+;SI-CHECK: V_LSHR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @lshr_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = lshr <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+
+;EG-CHECK: @lshr_v4i32
+;EG-CHECK: LSHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: LSHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @lshr_v4i32
+;SI-CHECK: V_LSHR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_LSHR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @lshr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
+  %a = load <4 x i32> addrspace(1) * %in
+  %b = load <4 x i32> addrspace(1) * %b_ptr
   %result = lshr <4 x i32> %a, %b
   store <4 x i32> %result, <4 x i32> addrspace(1)* %out
   ret void
-- 
1.8.1.2



[Mesa-dev] [PATCH 06/12] R600/SI: Expand ashr of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit tests for both cases on SI, and for v2i32 on evergreen.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  2 ++
 test/CodeGen/R600/sra.ll   | 41 +++---
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 4219825..5f44d3a 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -81,6 +81,8 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SHL, MVT::v4i32, Expand);
   setOperationAction(ISD::SRL, MVT::v4i32, Expand);
   setOperationAction(ISD::SRL, MVT::v2i32, Expand);
+  setOperationAction(ISD::SRA, MVT::v4i32, Expand);
+  setOperationAction(ISD::SRA, MVT::v2i32, Expand);
 
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
diff --git a/test/CodeGen/R600/sra.ll b/test/CodeGen/R600/sra.ll
index 972542d..7c5cc75 100644
--- a/test/CodeGen/R600/sra.ll
+++ b/test/CodeGen/R600/sra.ll
@@ -1,12 +1,39 @@
-; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-; CHECK: @ashr_v4i32
-; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: ASHR * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: @ashr_v2i32
+;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
-define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
+;SI-CHECK: @ashr_v2i32
+;SI-CHECK: V_ASHR_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ASHR_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @ashr_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = ashr <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @ashr_v4i32
+;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: ASHR {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @ashr_v4i32
+;SI-CHECK: V_ASHR_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ASHR_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ASHR_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_ASHR_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
+  %a = load <4 x i32> addrspace(1) * %in
+  %b = load <4 x i32> addrspace(1) * %b_ptr
   %result = ashr <4 x i32> %a, %b
   store <4 x i32> %result, <4 x i32> addrspace(1)* %out
   ret void
-- 
1.8.1.2
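
A rough sketch of what the Expand action amounts to for these shifts, assuming
the legalizer splits the vector operation into one scalar shift per lane (which
is what the two and four ASHR / V_ASHR_I32_e32 check lines in the test expect).
The helper names ashrLane and expandAshr are made up for illustration and are
not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: arithmetic shift right of a single 32-bit lane.
// Written without relying on >> of a negative value, which was
// implementation-defined before C++20.
inline int32_t ashrLane(int32_t value, uint32_t amount) {
  assert(amount < 32 && "shift amount must be smaller than the lane width");
  uint32_t bits = static_cast<uint32_t>(value) >> amount;  // logical shift
  if (value < 0 && amount != 0)
    bits |= ~uint32_t{0} << (32 - amount);  // refill the top bits with the sign
  return static_cast<int32_t>(bits);
}

// Hypothetical helper: "expanded" vector ashr, one scalar shift per lane.
template <std::size_t N>
std::array<int32_t, N> expandAshr(const std::array<int32_t, N> &a,
                                  const std::array<uint32_t, N> &b) {
  std::array<int32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = ashrLane(a[i], b[i]);
  return out;
}
```

With N = 4 this performs four independent scalar shifts, matching the four
check lines for the v4i32 case.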



[Mesa-dev] [PATCH 07/12] R600/SI: Expand udiv v[24]i32 for SI and v2i32 for EG

2013-06-20 Thread Aaron Watry
Also add lit test for both cases on SI, and v2i32 for evergreen.

Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600ISelLowering.cpp |  1 +
 lib/Target/R600/SIISelLowering.cpp   |  3 +++
 test/CodeGen/R600/udiv.ll| 25 ++---
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/lib/Target/R600/R600ISelLowering.cpp 
b/lib/Target/R600/R600ISelLowering.cpp
index 812df83..cf349a8 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -56,6 +56,7 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i32, Expand);
+  setOperationAction(ISD::UDIV, MVT::v2i32, Expand);
   setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
   setOperationAction(ISD::UREM, MVT::v4i32, Expand);
   setOperationAction(ISD::SETCC, MVT::v4i32, Expand);
diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 5f44d3a..1fb28fa 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -87,6 +87,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::SUB, MVT::v4i32, Expand);
 
+  setOperationAction(ISD::UDIV, MVT::v2i32, Expand);
+  setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
   setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);
 
diff --git a/test/CodeGen/R600/udiv.ll b/test/CodeGen/R600/udiv.ll
index b81e366..08fe2ef 100644
--- a/test/CodeGen/R600/udiv.ll
+++ b/test/CodeGen/R600/udiv.ll
@@ -1,11 +1,30 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
 ;The code generated by udiv is long and complex and may frequently change.
 ;The goal of this test is to make sure the ISel doesn't fail when it gets
 ;a v4i32 udiv
-;CHECK: CF_END
 
-define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+;EG-CHECK: @test2
+;EG-CHECK: CF_END
+;SI-CHECK: @test2
+;SI-CHECK: S_ENDPGM
+
+define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = udiv <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test4
+;EG-CHECK: CF_END
+;SI-CHECK: @test4
+;SI-CHECK: S_ENDPGM
+
+define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
   %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
   %a = load <4 x i32> addrspace(1) * %in
   %b = load <4 x i32> addrspace(1) * %b_ptr
-- 
1.8.1.2



[Mesa-dev] [PATCH 08/12] R600/SI: Expand urem of v2i32/v4i32 for SI

2013-06-20 Thread Aaron Watry
Also add lit test for both cases on SI, and v2i32 for evergreen.

Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  3 +++
 test/CodeGen/R600/urem.ll  | 27 +++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index 1fb28fa..a784667 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -90,6 +90,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::UDIV, MVT::v2i32, Expand);
   setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
 
+  setOperationAction(ISD::UREM, MVT::v2i32, Expand);
+  setOperationAction(ISD::UREM, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
   setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);
 
diff --git a/test/CodeGen/R600/urem.ll b/test/CodeGen/R600/urem.ll
index a2cc0bd..cf3474c 100644
--- a/test/CodeGen/R600/urem.ll
+++ b/test/CodeGen/R600/urem.ll
@@ -1,11 +1,30 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
 ;The code generated by urem is long and complex and may frequently change.
 ;The goal of this test is to make sure the ISel doesn't fail when it gets
-;a v4i32 urem
-;CHECK: CF_END
+;a v2i32/v4i32 urem
 
-define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+;EG-CHECK: @test2
+;EG-CHECK: CF_END
+;SI-CHECK: @test2
+;SI-CHECK: S_ENDPGM
+
+define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = urem <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test4
+;EG-CHECK: CF_END
+;SI-CHECK: @test4
+;SI-CHECK: S_ENDPGM
+
+define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
   %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
   %a = load <4 x i32> addrspace(1) * %in
   %b = load <4 x i32> addrspace(1) * %b_ptr
-- 
1.8.1.2



[Mesa-dev] [PATCH 09/12] R600: Add v2i32 test for setcc on evergreen

2013-06-20 Thread Aaron Watry
No test/expansion for SI has been added yet. Attempts to expand this
operation for SI resulted in a stacktrace in (IIRC) LegalizeIntegerTypes
which was complaining about vector comparisons being required to return
a vector type.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 test/CodeGen/R600/setcc.ll | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/test/CodeGen/R600/setcc.ll b/test/CodeGen/R600/setcc.ll
index 0752f2e..e3f77b1 100644
--- a/test/CodeGen/R600/setcc.ll
+++ b/test/CodeGen/R600/setcc.ll
@@ -1,7 +1,26 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
-;CHECK: SETE_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
 
-define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
+;EG-CHECK: @test2
+;EG-CHECK: SETE_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: SETE_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
+  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
+  %a = load <2 x i32> addrspace(1) * %in
+  %b = load <2 x i32> addrspace(1) * %b_ptr
+  %result = icmp eq <2 x i32> %a, %b
+  %sext = sext <2 x i1> %result to <2 x i32>
+  store <2 x i32> %sext, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test4
+;EG-CHECK: SETE_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: SETE_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: SETE_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: SETE_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
   %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
   %a = load <4 x i32> addrspace(1) * %in
   %b = load <4 x i32> addrspace(1) * %b_ptr
-- 
1.8.1.2
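
For reference, the IR in these tests compares lane by lane with icmp eq and
then sign-extends each i1 result, so a true lane becomes -1 (all bits set) and
a false lane becomes 0. A minimal sketch of that semantics in plain C++;
icmpEqSext is a hypothetical name used only here, not an LLVM function.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: per-lane "icmp eq" followed by "sext i1 to i32".
// A true comparison yields -1 (0xFFFFFFFF), a false one yields 0.
template <std::size_t N>
std::array<int32_t, N> icmpEqSext(const std::array<int32_t, N> &a,
                                  const std::array<int32_t, N> &b) {
  std::array<int32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = (a[i] == b[i]) ? -1 : 0;
  return out;
}
```

One comparison is performed per lane, which is why the v2i32 test expects two
SETE_INT lines and the v4i32 test expects four.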



[Mesa-dev] [PATCH 10/12] R600/SI: Expand xor v2i32/v4i32

2013-06-20 Thread Aaron Watry
Add test cases for both vector sizes on SI and also add v2i32 test for EG.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/SIISelLowering.cpp |  3 +++
 test/CodeGen/R600/xor.ll   | 40 +++---
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index a784667..e70c7de 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -93,6 +93,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::UREM, MVT::v2i32, Expand);
   setOperationAction(ISD::UREM, MVT::v4i32, Expand);
 
+  setOperationAction(ISD::XOR, MVT::v2i32, Expand);
+  setOperationAction(ISD::XOR, MVT::v4i32, Expand);
+
   setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
   setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);
 
diff --git a/test/CodeGen/R600/xor.ll b/test/CodeGen/R600/xor.ll
index cf612e0..f52729d 100644
--- a/test/CodeGen/R600/xor.ll
+++ b/test/CodeGen/R600/xor.ll
@@ -1,12 +1,38 @@
-; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
 
-; CHECK: @xor_v4i32
-; CHECK: XOR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: XOR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: XOR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: XOR_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: @xor_v2i32
+;EG-CHECK: XOR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: XOR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
-define void @xor_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
+;SI-CHECK: @xor_v2i32
+;SI-CHECK: V_XOR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_XOR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+
+define void @xor_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in0, <2 x i32> addrspace(1)* %in1) {
+  %a = load <2 x i32> addrspace(1) * %in0
+  %b = load <2 x i32> addrspace(1) * %in1
+  %result = xor <2 x i32> %a, %b
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @xor_v4i32
+;EG-CHECK: XOR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: XOR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: XOR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: XOR_INT {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+;SI-CHECK: @xor_v4i32
+;SI-CHECK: V_XOR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_XOR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_XOR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+;SI-CHECK: V_XOR_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @xor_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in0, <4 x i32> addrspace(1)* %in1) {
+  %a = load <4 x i32> addrspace(1) * %in0
+  %b = load <4 x i32> addrspace(1) * %in1
   %result = xor <4 x i32> %a, %b
   store <4 x i32> %result, <4 x i32> addrspace(1)* %out
   ret void
-- 
1.8.1.2



[Mesa-dev] [PATCH 11/12] R600: Add v2i32 test for vselect

2013-06-20 Thread Aaron Watry
Note: Only adding test for evergreen, not SI yet.

When I attempted to expand vselect for SI, I got the following:
llc: /home/awatry/src/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:522:
llvm::SDValue llvm::DAGTypeLegalizer::PromoteIntRes_SETCC(llvm::SDNode*):
Assertion `SVT.isVector() == N->getOperand(0).getValueType().isVector() &&
"Vector compare must return a vector result!"' failed.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 test/CodeGen/R600/vselect.ll | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/test/CodeGen/R600/vselect.ll b/test/CodeGen/R600/vselect.ll
index edd7ba0..3f08cec 100644
--- a/test/CodeGen/R600/vselect.ll
+++ b/test/CodeGen/R600/vselect.ll
@@ -1,10 +1,24 @@
-;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
+;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
 
-; CHECK: @test_select_v4i32
-; CHECK: CNDE_INT T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: CNDE_INT * T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: CNDE_INT T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; CHECK: CNDE_INT * T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: @test_select_v2i32
+;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+
+define void @test_select_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in0, <2 x i32> addrspace(1)* %in1) {
+entry:
+  %0 = load <2 x i32> addrspace(1)* %in0
+  %1 = load <2 x i32> addrspace(1)* %in1
+  %cmp = icmp ne <2 x i32> %0, %1
+  %result = select <2 x i1> %cmp, <2 x i32> %0, <2 x i32> %1
+  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+;EG-CHECK: @test_select_v4i32
+;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test_select_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in0, <4 x i32> addrspace(1)* %in1) {
 entry:
-- 
1.8.1.2



[Mesa-dev] [PATCH 12/12] R600: Consolidate expansion of v2i32/v4i32 ops for EG/SI

2013-06-20 Thread Aaron Watry
By default, we expand these operations for both EG and SI. Move the
duplicated code into a common space for now. If the targets ever actually
implement these operations as instructions, we can override that in the relevant
target.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/AMDGPUISelLowering.cpp | 22 ++
 lib/Target/R600/R600ISelLowering.cpp   | 19 ---
 lib/Target/R600/SIISelLowering.cpp | 30 --
 3 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
b/lib/Target/R600/AMDGPUISelLowering.cpp
index 02d6fab..6d73590 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -70,6 +70,28 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::UDIV, MVT::i32, Expand);
   setOperationAction(ISD::UDIVREM, MVT::i32, Custom);
   setOperationAction(ISD::UREM, MVT::i32, Expand);
+
+  int types[] = {
+(int)MVT::v2i32,
+(int)MVT::v4i32
+  };
+  size_t NumTypes = sizeof(types) / sizeof(*types);
+
+  for (unsigned int x  = 0; x < NumTypes; ++x) {
+MVT::SimpleValueType VT = (MVT::SimpleValueType)types[x];
+//Expand the following operations for the current type by default
+setOperationAction(ISD::ADD,  VT, Expand);
+setOperationAction(ISD::AND,  VT, Expand);
+setOperationAction(ISD::MUL,  VT, Expand);
+setOperationAction(ISD::OR,   VT, Expand);
+setOperationAction(ISD::SHL,  VT, Expand);
+setOperationAction(ISD::SRL,  VT, Expand);
+setOperationAction(ISD::SRA,  VT, Expand);
+setOperationAction(ISD::SUB,  VT, Expand);
+setOperationAction(ISD::UDIV, VT, Expand);
+setOperationAction(ISD::UREM, VT, Expand);
+setOperationAction(ISD::XOR,  VT, Expand);
+  }
 }
 
 //===-===//
diff --git a/lib/Target/R600/R600ISelLowering.cpp 
b/lib/Target/R600/R600ISelLowering.cpp
index cf349a8..18e83e8 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -38,30 +38,11 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::FDIV, MVT::v4f32, Expand);
   setOperationAction(ISD::FSUB, MVT::v4f32, Expand);
 
-  setOperationAction(ISD::ADD,  MVT::v4i32, Expand);
-  setOperationAction(ISD::AND,  MVT::v4i32, Expand);
   setOperationAction(ISD::FP_TO_SINT, MVT::v4i32, Expand);
   setOperationAction(ISD::FP_TO_UINT, MVT::v4i32, Expand);
-  setOperationAction(ISD::MUL,  MVT::v2i32, Expand);
-  setOperationAction(ISD::MUL,  MVT::v4i32, Expand);
-  setOperationAction(ISD::OR, MVT::v4i32, Expand);
-  setOperationAction(ISD::OR, MVT::v2i32, Expand);
   setOperationAction(ISD::SINT_TO_FP, MVT::v4i32, Expand);
-  setOperationAction(ISD::SHL, MVT::v4i32, Expand);
-  setOperationAction(ISD::SHL, MVT::v2i32, Expand);
-  setOperationAction(ISD::SRL, MVT::v4i32, Expand);
-  setOperationAction(ISD::SRL, MVT::v2i32, Expand);
-  setOperationAction(ISD::SRA, MVT::v4i32, Expand);
-  setOperationAction(ISD::SRA, MVT::v2i32, Expand);
-  setOperationAction(ISD::SUB, MVT::v4i32, Expand);
-  setOperationAction(ISD::SUB, MVT::v2i32, Expand);
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i32, Expand);
-  setOperationAction(ISD::UDIV, MVT::v2i32, Expand);
-  setOperationAction(ISD::UDIV, MVT::v4i32, Expand);
-  setOperationAction(ISD::UREM, MVT::v4i32, Expand);
   setOperationAction(ISD::SETCC, MVT::v4i32, Expand);
-  setOperationAction(ISD::XOR, MVT::v4i32, Expand);
-  setOperationAction(ISD::XOR, MVT::v2i32, Expand);
 
   setOperationAction(ISD::BR_CC, MVT::i32, Expand);
   setOperationAction(ISD::BR_CC, MVT::f32, Expand);
diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index e70c7de..9d4cfef 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -65,36 +65,6 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
   setOperationAction(ISD::ADD, MVT::i64, Legal);
   setOperationAction(ISD::ADD, MVT::i32, Legal);
-  setOperationAction(ISD::ADD, MVT::v4i32, Expand);
-  setOperationAction(ISD::ADD, MVT::v2i32, Expand);
-
-  setOperationAction(ISD::AND, MVT::v2i32, Expand);
-  setOperationAction(ISD::AND, MVT::v4i32, Expand);
-
-  setOperationAction(ISD::MUL, MVT::v2i32, Expand);
-  setOperationAction(ISD::MUL, MVT::v4i32, Expand);
-
-  setOperationAction(ISD::OR, MVT::v2i32, Expand);
-  setOperationAction(ISD::OR, MVT::v4i32, Expand);
-
-  setOperationAction(ISD::SHL, MVT::v2i32, Expand);
-  setOperationAction(ISD::SHL, MVT::v4i32, Expand);
-  setOperationAction(ISD::SRL, MVT::v4i32, Expand);
-  setOperationAction(ISD::SRL, MVT::v2i32, Expand);
-  setOperationAction(ISD::SRA, MVT::v4i32, Expand);
-  setOperationAction(ISD::SRA, MVT::v2i32, Expand);
-
-  setOperationAction(ISD::SUB, MVT::v2i32, Expand);
-  setOperationAction(ISD::SUB, MVT::v4i32, Expand);
-
-  setOperationAction(ISD::UDIV
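
The consolidation idea, sketched outside of LLVM: a shared base table marks
every listed operation as Expand for both vector types, and a target can later
override individual entries. The Op, VT, Action and LoweringTable types below
are simplified stand-ins invented for this example; only the
setOperationAction name mirrors the real API, and treating unset entries as
Legal is an assumption of this sketch.

```cpp
#include <map>
#include <utility>

// Stand-ins for the ISD opcodes and MVT types touched by the patch.
enum class Op { ADD, AND, MUL, OR, SHL, SRL, SRA, SUB, UDIV, UREM, XOR };
enum class VT { v2i32, v4i32 };
enum class Action { Legal, Expand };

// Hypothetical, heavily simplified model of a target lowering table.
class LoweringTable {
public:
  void setOperationAction(Op op, VT vt, Action action) {
    actions_[std::make_pair(op, vt)] = action;
  }
  // Entries that were never set are reported as Legal (sketch assumption).
  Action getOperationAction(Op op, VT vt) const {
    auto it = actions_.find(std::make_pair(op, vt));
    return it == actions_.end() ? Action::Legal : it->second;
  }

private:
  std::map<std::pair<Op, VT>, Action> actions_;
};

// Shared defaults: expand every listed op for every listed vector type.
inline LoweringTable makeCommonDefaults() {
  static const Op ops[] = {Op::ADD, Op::AND, Op::MUL,  Op::OR,   Op::SHL, Op::SRL,
                           Op::SRA, Op::SUB, Op::UDIV, Op::UREM, Op::XOR};
  static const VT types[] = {VT::v2i32, VT::v4i32};
  LoweringTable table;
  for (VT vt : types)
    for (Op op : ops)
      table.setOperationAction(op, vt, Action::Expand);
  return table;
}
```

A target that gains a native instruction would call setOperationAction again
for just that (operation, type) pair after the shared defaults have been set.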

[Mesa-dev] [PATCH] R600: Improve vector constant loading for EG/SI

2013-06-21 Thread Aaron Watry
Add some constant load v2i32/v4i32 tests for both EG and SI.

Tested on: Pitcairn (7850) and Cedar (54xx)

Signed-off-by: Aaron Watry awa...@gmail.com
---
 lib/Target/R600/R600Instructions.td |  3 +++
 lib/Target/R600/SIInstructions.td   | 10 ++
 test/CodeGen/R600/load.vec.ll   | 27 +++
 3 files changed, 40 insertions(+)

diff --git a/lib/Target/R600/R600Instructions.td 
b/lib/Target/R600/R600Instructions.td
index 803f597..219358c 100644
--- a/lib/Target/R600/R600Instructions.td
+++ b/lib/Target/R600/R600Instructions.td
@@ -1421,6 +1421,9 @@ def CONSTANT_LOAD_eg : VTX_READ_32_eg <1,
   [(set i32:$dst_gpr, (constant_load ADDRVTX_READ:$src_gpr))]
 >;
 
+def CONSTANT_LOAD_128_eg : VTX_READ_128_eg <1,
+  [(set v4i32:$dst_gpr, (constant_load ADDRVTX_READ:$src_gpr))]
+>;
 
 } // End Predicates = [isEG]
 
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 9c96c08..0058c0d 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1629,6 +1629,16 @@ multiclass MUBUFLoad_Pattern <MUBUF Instr_ADDR64, ValueType vt,
   >;
 
   def : Pat <
+(vt (constant_ld (add i64:$ptr, (i64 IMM12bit:$offset)))),
+(Instr_ADDR64 (SI_ADDR64_RSRC (i64 0)), $ptr, (as_i16imm $offset))
+  >;
+
+  def : Pat <
+(vt (constant_ld i64:$ptr)),
+(Instr_ADDR64 (SI_ADDR64_RSRC (i64 0)), $ptr, 0)
+  >;
+
+  def : Pat <
  (vt (constant_ld (add i64:$ptr, i64:$offset))),
  (Instr_ADDR64 (SI_ADDR64_RSRC $ptr), $offset, 0)
   >;
diff --git a/test/CodeGen/R600/load.vec.ll b/test/CodeGen/R600/load.vec.ll
index da1149a..b450b47 100644
--- a/test/CodeGen/R600/load.vec.ll
+++ b/test/CodeGen/R600/load.vec.ll
@@ -23,3 +23,30 @@ define void @load_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %i
   store <4 x i32> %a, <4 x i32> addrspace(1)* %out
   ret void
 }
+
+; Load a v2i32 value from the constant address space.
+; EG-CHECK: @load_const_addrspace_v2i32
+; EG-CHECK: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 4
+; EG-CHECK: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0
+; SI-CHECK: @load_const_addrspace_v2i32
+; SI-CHECK: BUFFER_LOAD_DWORDX2 VGPR{{[0-9]+}}
+
+define void @load_const_addrspace_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(2)* %in) {
+entry:
+  %0 = load <2 x i32> addrspace(2)* %in
+  store <2 x i32> %0, <2 x i32> addrspace(1)* %out
+  ret void
+}
+
+; Load a v4i32 value from the constant address space.
+; EG-CHECK: @load_const_addrspace_v4i32
+; EG-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 0
+; SI-CHECK: @load_const_addrspace_v4i32
+; SI-CHECK: BUFFER_LOAD_DWORDX4 VGPR{{[0-9]+}}
+
+define void @load_const_addrspace_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(2)* %in) {
+entry:
+  %0 = load <4 x i32> addrspace(2)* %in
+  store <4 x i32> %0, <4 x i32> addrspace(1)* %out
+  ret void
+}
-- 
1.8.1.2



Re: [Mesa-dev] [PATCH 04/12] R600/SI: Expand shl of v2i32/v4i32 for SI

2013-06-21 Thread Aaron Watry
I moved the comment to the top of the file, if that's OK, although I
could leave it at the bottom if you'd prefer.

--Aaron

On Fri, Jun 21, 2013 at 9:05 PM, Tom Stellard t...@stellard.net wrote:
 On Thu, Jun 20, 2013 at 06:43:42PM -0500, Aaron Watry wrote:
 Also add lit test for both cases on SI, and v2i32 for evergreen.

 Signed-off-by: Aaron Watry awa...@gmail.com
 ---
  lib/Target/R600/SIISelLowering.cpp |  3 +++
  test/CodeGen/R600/shl.ll   | 47 
 ++
  2 files changed, 40 insertions(+), 10 deletions(-)

 diff --git a/lib/Target/R600/SIISelLowering.cpp 
 b/lib/Target/R600/SIISelLowering.cpp
 index 30a7de5..515c7a4 100644
 --- a/lib/Target/R600/SIISelLowering.cpp
 +++ b/lib/Target/R600/SIISelLowering.cpp
 @@ -77,6 +77,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
setOperationAction(ISD::OR, MVT::v2i32, Expand);
setOperationAction(ISD::OR, MVT::v4i32, Expand);

 +  setOperationAction(ISD::SHL, MVT::v2i32, Expand);
 +  setOperationAction(ISD::SHL, MVT::v4i32, Expand);
 +
setOperationAction(ISD::SUB, MVT::v2i32, Expand);
setOperationAction(ISD::SUB, MVT::v4i32, Expand);

 diff --git a/test/CodeGen/R600/shl.ll b/test/CodeGen/R600/shl.ll
 index db970e9..d68730a 100644
 --- a/test/CodeGen/R600/shl.ll
 +++ b/test/CodeGen/R600/shl.ll
 @@ -1,16 +1,43 @@
 -; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
 +;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s

 -; CHECK: @shl_v4i32
 -; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 -; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 -; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 -; CHECK: LSHL * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +; XXX: Add SI test for i64 shl once i64 stores and i64 function arguments 
 are
 +; supported.
 +
 +;EG-CHECK: @shl_v2i32
 +;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +
 +;SI-CHECK: @shl_v2i32
 +;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
 +;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
 +
 +define void @shl_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
 +  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
 +  %a = load <2 x i32> addrspace(1) * %in
 +  %b = load <2 x i32> addrspace(1) * %b_ptr
 +  %result = shl <2 x i32> %a, %b
 +  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
 +  ret void
 +}
 +
 +;EG-CHECK: @shl_v4i32
 +;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: LSHL {{\*? *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +
 +;SI-CHECK: @shl_v4i32
 +;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
 +;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
 +;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
 +;SI-CHECK: V_LSHL_B32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}

 -define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> %a, <4 x i32> %b) {
 +define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
 +  %b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
 +  %a = load <4 x i32> addrspace(1) * %in
 +  %b = load <4 x i32> addrspace(1) * %b_ptr
   %result = shl <4 x i32> %a, %b
   store <4 x i32> %result, <4 x i32> addrspace(1)* %out
ret void
  }
 -
 -; XXX: Add SI test for i64 shl once i64 stores and i64 function arguments 
 are
 -; supported.

 We should leave this comment here.

 -Tom


Re: [Mesa-dev] [PATCH 1/2] R600: Add support for i32 loads from the constant address space on Cayman

2013-06-24 Thread Aaron Watry
Tested-By: Aaron Watry awa...@gmail.com

Tested on an A6-3500 (SUMO)

On Tue, Jun 18, 2013 at 11:54 AM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  lib/Target/R600/R600Instructions.td | 9 +
  test/CodeGen/R600/load.ll   | 1 +
  2 files changed, 10 insertions(+)

 diff --git a/lib/Target/R600/R600Instructions.td 
 b/lib/Target/R600/R600Instructions.td
 index 83d735f..803f597 100644
 --- a/lib/Target/R600/R600Instructions.td
 +++ b/lib/Target/R600/R600Instructions.td
 @@ -1755,6 +1755,15 @@ def VTX_READ_GLOBAL_128_cm : VTX_READ_128_cm <1,
   [(set v4i32:$dst_gpr, (global_load ADDRVTX_READ:$src_gpr))]
  >;

 +//===--===//
 +// Constant Loads
 +// XXX: We are currently storing all constants in the global address space.
 +//===--===//
 +
 +def CONSTANT_LOAD_cm : VTX_READ_32_cm <1,
 +  [(set i32:$dst_gpr, (constant_load ADDRVTX_READ:$src_gpr))]
 +>;
 +
  } // End isCayman

  
 //===--===//
 diff --git a/test/CodeGen/R600/load.ll b/test/CodeGen/R600/load.ll
 index ff774ec..d1ebaa3 100644
 --- a/test/CodeGen/R600/load.ll
 +++ b/test/CodeGen/R600/load.ll
 @@ -1,4 +1,5 @@
  ; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=R600-CHECK %s
 +; RUN: llc < %s -march=r600 -mcpu=cayman | FileCheck --check-prefix=R600-CHECK %s
  ; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck --check-prefix=SI-CHECK  %s

  ; Load an i8 value from the global address space.
 --
 1.7.11.4



[Mesa-dev] [PATCH] clover: Fix build with LLVM 3.4

2013-06-28 Thread Aaron Watry
PathV1.h has been removed. In theory this can go back before llvm 3.4, but I
haven't done the research to find out how far back.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 362f02f..ee0249d 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -43,7 +43,12 @@
 #include llvm/PassManager.h
 #include llvm/Support/TargetSelect.h
 #include llvm/Support/MemoryBuffer.h
+#if HAVE_LLVM  0x0304
 #include llvm/Support/PathV1.h
+#else
+#include llvm/ADT/SmallString.h
+#include llvm/Support/Path.h
+#endif
 #include llvm/Transforms/IPO.h
 #include llvm/Transforms/IPO/PassManagerBuilder.h
 
@@ -222,9 +227,16 @@ namespace {
 
   llvm::PassManager PM;
   llvm::PassManagerBuilder Builder;
+#if HAVE_LLVM  0x0304
   llvm::sys::Path libclc_path =
 llvm::sys::Path(LIBCLC_LIBEXECDIR + processor +
- + triple + .bc);
+#else
+  llvm::SmallString1 libclc_path;
+  libclc_path = LIBCLC_LIBEXECDIR;
+  std::string file_name = processor + - + triple + .bc;
+  llvm::sys::path::append(libclc_path, file_name);
+#endif
 
   // Link the kernel with libclc
 #if HAVE_LLVM < 0x0303
-- 
1.8.1.2
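
For readers unfamiliar with the change, here is a minimal sketch of what the new branch is doing: build the file name first, then append it to the libclc directory so that the separator is handled in one place. It uses plain std::string rather than the LLVM path API, and `append_path` and `libclc_bitcode_path` are hypothetical names invented for this illustration; they are not part of clover or LLVM.

```cpp
#include <string>

// Hypothetical helper: joins a directory and a file name, adding a '/'
// separator only when the directory does not already end with one.
std::string append_path(std::string dir, const std::string &file) {
    if (!dir.empty() && dir.back() != '/')
        dir += '/';
    return dir + file;
}

// Builds "<libexecdir>/<processor>-<triple>.bc", mirroring the
// processor + "-" + triple + ".bc" naming used in the patch.
std::string libclc_bitcode_path(const std::string &libexecdir,
                                const std::string &processor,
                                const std::string &triple) {
    return append_path(libexecdir, processor + "-" + triple + ".bc");
}
```

With this shape, the result is the same whether or not the configured directory carries a trailing slash, which plain string concatenation does not guarantee.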

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] clover: Fix build with LLVM 3.4

2013-06-28 Thread Aaron Watry
Disregard this patch... Looks like Tom already pushed a fix last night.

--Aaron

On Fri, Jun 28, 2013 at 9:41 AM, Aaron Watry awa...@gmail.com wrote:
 PathV1.h has been removed. In theory this can go back before llvm 3.4, but I
 haven't done the research to find out how far back.

 Signed-off-by: Aaron Watry awa...@gmail.com
 ---
  src/gallium/state_trackers/clover/llvm/invocation.cpp | 12 
  1 file changed, 12 insertions(+)

 diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
 b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 index 362f02f..ee0249d 100644
 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
 +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 @@ -43,7 +43,12 @@
  #include llvm/PassManager.h
  #include llvm/Support/TargetSelect.h
  #include llvm/Support/MemoryBuffer.h
 +#if HAVE_LLVM  0x0304
  #include llvm/Support/PathV1.h
 +#else
 +#include llvm/ADT/SmallString.h
 +#include llvm/Support/Path.h
 +#endif
  #include llvm/Transforms/IPO.h
  #include llvm/Transforms/IPO/PassManagerBuilder.h

 @@ -222,9 +227,16 @@ namespace {

llvm::PassManager PM;
llvm::PassManagerBuilder Builder;
 +#if HAVE_LLVM  0x0304
llvm::sys::Path libclc_path =
  llvm::sys::Path(LIBCLC_LIBEXECDIR + processor +
 - + triple + .bc);
 +#else
 +  llvm::SmallString1 libclc_path;
 +  libclc_path = LIBCLC_LIBEXECDIR;
 +  std::string file_name = processor + - + triple + .bc;
 +  llvm::sys::path::append(libclc_path, file_name);
 +#endif

// Link the kernel with libclc
  #if HAVE_LLVM  0x0303
 --
 1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] R600: Expand VSELECT for all types

2013-07-16 Thread Aaron Watry
Looks good to me.

I've tested on Cedar (HD5400) with no OpenCL regressions, but cannot
test on SI because SETCC still causes issues (see
https://bugs.freedesktop.org/show_bug.cgi?id=66175).  Once SETCC is
fixed for SI, we should probably add SI-CHECK lines to vselect.ll.

--Aaron

On Tue, Jul 16, 2013 at 2:15 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  lib/Target/R600/AMDGPUISelLowering.cpp |  3 +++
  lib/Target/R600/R600ISelLowering.cpp   |  3 ---
  test/CodeGen/R600/vselect.ll   | 30 ++
  3 files changed, 33 insertions(+), 3 deletions(-)

 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
 b/lib/Target/R600/AMDGPUISelLowering.cpp
 index 9891ad3..e93ddc4 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.cpp
 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
 @@ -77,6 +77,8 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
setOperationAction(ISD::UDIV, MVT::i32, Expand);
setOperationAction(ISD::UDIVREM, MVT::i32, Custom);
setOperationAction(ISD::UREM, MVT::i32, Expand);
 +  setOperationAction(ISD::VSELECT, MVT::v2f32, Expand);
 +  setOperationAction(ISD::VSELECT, MVT::v4f32, Expand);

int types[] = {
  (int)MVT::v2i32,
 @@ -97,6 +99,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
  setOperationAction(ISD::SUB,  VT, Expand);
  setOperationAction(ISD::UDIV, VT, Expand);
  setOperationAction(ISD::UREM, VT, Expand);
 +setOperationAction(ISD::VSELECT, VT, Expand);
  setOperationAction(ISD::XOR,  VT, Expand);
}
  }
 diff --git a/lib/Target/R600/R600ISelLowering.cpp 
 b/lib/Target/R600/R600ISelLowering.cpp
 index 7aef08a..1067b38 100644
 --- a/lib/Target/R600/R600ISelLowering.cpp
 +++ b/lib/Target/R600/R600ISelLowering.cpp
 @@ -67,9 +67,6 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
setOperationAction(ISD::SELECT, MVT::i32, Custom);
setOperationAction(ISD::SELECT, MVT::f32, Custom);

 -  setOperationAction(ISD::VSELECT, MVT::v4i32, Expand);
 -  setOperationAction(ISD::VSELECT, MVT::v2i32, Expand);
 -
// Legalize loads and stores to the private address space.
setOperationAction(ISD::LOAD, MVT::i32, Custom);
setOperationAction(ISD::LOAD, MVT::v2i32, Expand);
 diff --git a/test/CodeGen/R600/vselect.ll b/test/CodeGen/R600/vselect.ll
 index 3f08cec..79d896b 100644
 --- a/test/CodeGen/R600/vselect.ll
 +++ b/test/CodeGen/R600/vselect.ll
 @@ -14,6 +14,20 @@ entry:
ret void
  }

 +;EG-CHECK: @test_select_v2f32
 +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +
 +define void @test_select_v2f32(<2 x float> addrspace(1)* %out, <2 x float> addrspace(1)* %in0, <2 x float> addrspace(1)* %in1) {
 +entry:
 +  %0 = load <2 x float> addrspace(1)* %in0
 +  %1 = load <2 x float> addrspace(1)* %in1
 +  %cmp = fcmp one <2 x float> %0, %1
 +  %result = select <2 x i1> %cmp, <2 x float> %0, <2 x float> %1
 +  store <2 x float> %result, <2 x float> addrspace(1)* %out
 +  ret void
 +}
 +
  ;EG-CHECK: @test_select_v4i32
  ;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  ;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 @@ -29,3 +43,19 @@ entry:
    store <4 x i32> %result, <4 x i32> addrspace(1)* %out
    ret void
  }
 +
 +;EG-CHECK: @test_select_v4f32
 +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 +
 +define void @test_select_v4f32(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in0, <4 x float> addrspace(1)* %in1) {
 +entry:
 +  %0 = load <4 x float> addrspace(1)* %in0
 +  %1 = load <4 x float> addrspace(1)* %in1
 +  %cmp = fcmp one <4 x float> %0, %1
 +  %result = select <4 x i1> %cmp, <4 x float> %0, <4 x float> %1
 +  store <4 x float> %result, <4 x float> addrspace(1)* %out
 +  ret void
 +}
 --
 1.7.11.4
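
As a reference for what the new float tests compute, here is a rough host-side sketch of the per-lane semantics: `fcmp one` is "ordered and not equal", and the vector select then picks each lane from the first or second operand. `fcmp_one` and `select_one` are hypothetical names for this illustration only and say nothing about how the backend lowers the operation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// "fcmp one": ordered and not equal, i.e. neither operand is NaN and a != b.
inline bool fcmp_one(float a, float b) {
    return !std::isnan(a) && !std::isnan(b) && a != b;
}

// Per-lane select: lane i takes a[i] when the comparison holds, else b[i].
template <std::size_t N>
std::array<float, N> select_one(const std::array<float, N> &a,
                                const std::array<float, N> &b) {
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = fcmp_one(a[i], b[i]) ? a[i] : b[i];
    return out;
}
```

Expanding VSELECT amounts to doing this lane by lane, which is why the tests expect one CNDE_INT per vector element.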

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] R600: Expand VSELECT for all types

2013-07-17 Thread Aaron Watry
Hi Tom,

I have verified that these patches, along with the previous one, fix
the errors that I was getting for SI.  The test case from that FD.o
bug still fails, but that's due to attempting to sign extend the
v2i1/v4i1 result to v2i32/v4i32, which isn't necessary when doing
vselect properly.

I've successfully run an int/int2/int4 min/max/clamp CL builtin test
on my Pitcairn, so as far as I can tell, this is working correctly
now.

If we're concerned about the sign extension of boolean to 32-bit ints,
I'd say that we should open a new bug for that.

--Aaron

On Tue, Jul 16, 2013 at 8:39 PM, Tom Stellard t...@stellard.net wrote:
 Hi,

 The attached three patches along with this one should fix VSELECT on SI
 as well.

 -Tom

 On Tue, Jul 16, 2013 at 05:12:40PM -0500, Aaron Watry wrote:
 Looks good to me.

 I've tested on Cedar (HD5400) with no OpenCL regressions, but cannot
 test on SI because SETCC still causes issues (see
 https://bugs.freedesktop.org/show_bug.cgi?id=66175).  Once SETCC is
 fixed for SI, we should probably add SI-CHECK lines to vselect.ll

 --Aaron

 On Tue, Jul 16, 2013 at 2:15 PM, Tom Stellard t...@stellard.net wrote:
  From: Tom Stellard thomas.stell...@amd.com
 
  ---
   lib/Target/R600/AMDGPUISelLowering.cpp |  3 +++
   lib/Target/R600/R600ISelLowering.cpp   |  3 ---
   test/CodeGen/R600/vselect.ll   | 30 ++
   3 files changed, 33 insertions(+), 3 deletions(-)
 
  diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
  b/lib/Target/R600/AMDGPUISelLowering.cpp
  index 9891ad3..e93ddc4 100644
  --- a/lib/Target/R600/AMDGPUISelLowering.cpp
  +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
  @@ -77,6 +77,8 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine 
  TM) :
 setOperationAction(ISD::UDIV, MVT::i32, Expand);
 setOperationAction(ISD::UDIVREM, MVT::i32, Custom);
 setOperationAction(ISD::UREM, MVT::i32, Expand);
  +  setOperationAction(ISD::VSELECT, MVT::v2f32, Expand);
  +  setOperationAction(ISD::VSELECT, MVT::v4f32, Expand);
 
 int types[] = {
   (int)MVT::v2i32,
  @@ -97,6 +99,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine 
  TM) :
   setOperationAction(ISD::SUB,  VT, Expand);
   setOperationAction(ISD::UDIV, VT, Expand);
   setOperationAction(ISD::UREM, VT, Expand);
  +setOperationAction(ISD::VSELECT, VT, Expand);
   setOperationAction(ISD::XOR,  VT, Expand);
 }
   }
  diff --git a/lib/Target/R600/R600ISelLowering.cpp 
  b/lib/Target/R600/R600ISelLowering.cpp
  index 7aef08a..1067b38 100644
  --- a/lib/Target/R600/R600ISelLowering.cpp
  +++ b/lib/Target/R600/R600ISelLowering.cpp
  @@ -67,9 +67,6 @@ R600TargetLowering::R600TargetLowering(TargetMachine 
  TM) :
 setOperationAction(ISD::SELECT, MVT::i32, Custom);
 setOperationAction(ISD::SELECT, MVT::f32, Custom);
 
  -  setOperationAction(ISD::VSELECT, MVT::v4i32, Expand);
  -  setOperationAction(ISD::VSELECT, MVT::v2i32, Expand);
  -
 // Legalize loads and stores to the private address space.
 setOperationAction(ISD::LOAD, MVT::i32, Custom);
 setOperationAction(ISD::LOAD, MVT::v2i32, Expand);
  diff --git a/test/CodeGen/R600/vselect.ll b/test/CodeGen/R600/vselect.ll
  index 3f08cec..79d896b 100644
  --- a/test/CodeGen/R600/vselect.ll
  +++ b/test/CodeGen/R600/vselect.ll
  @@ -14,6 +14,20 @@ entry:
 ret void
   }
 
  +;EG-CHECK: @test_select_v2f32
  +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  +
  +define void @test_select_v2f32(2 x float addrspace(1)* %out, 2 x 
  float addrspace(1)* %in0, 2 x float addrspace(1)* %in1) {
  +entry:
  +  %0 = load 2 x float addrspace(1)* %in0
  +  %1 = load 2 x float addrspace(1)* %in1
  +  %cmp = fcmp one 2 x float %0, %1
  +  %result = select 2 x i1 %cmp, 2 x float %0, 2 x float %1
  +  store 2 x float %result, 2 x float addrspace(1)* %out
  +  ret void
  +}
  +
   ;EG-CHECK: @test_select_v4i32
   ;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
   ;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  @@ -29,3 +43,19 @@ entry:
 store 4 x i32 %result, 4 x i32 addrspace(1)* %out
 ret void
   }
  +
  +;EG-CHECK: @test_select_v4f32
  +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  +;EG-CHECK: CNDE_INT {{\*? *}}T{{[0-9]+\.[XYZW], PV\.[XYZW], 
  T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
  +
  +define void @test_select_v4f32(4 x float addrspace(1)* %out, 4 x 
  float addrspace(1)* %in0, 4 x float addrspace(1)* %in1) {
  +entry:
  +  %0 = load 4 x float addrspace(1)* %in0
  +  %1

Re: [Mesa-dev] [PATCH 7/7] clover: Sign-extend and zero-extend kernel arguments when required v2

2013-07-17 Thread Aaron Watry
On Tue, Jul 9, 2013 at 11:21 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 v2:
   - Extend to target size rather than aligned size
   - Support for big-endian
 ---
  src/gallium/state_trackers/clover/core/kernel.cpp  | 58 
 --
  src/gallium/state_trackers/clover/core/kernel.hpp  | 17 ---
  src/gallium/state_trackers/clover/core/module.hpp  | 27 --
  .../state_trackers/clover/llvm/invocation.cpp  | 20 +++-
  4 files changed, 95 insertions(+), 27 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp 
 b/src/gallium/state_trackers/clover/core/kernel.cpp
 index 2b6fbe5..99723ff 100644
 --- a/src/gallium/state_trackers/clover/core/kernel.cpp
 +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
 @@ -152,10 +152,12 @@ _cl_kernel::exec_context::bind(clover::command_queue *__q) {
     for (const clover::module::argument &arg : kern.module_args(*q)) {
        if (arg.type == clover::module::argument::pad) {
           input.resize(input.size() + arg.size);
 -      } else {
 -         assert(arg.arg_index >= 0);
 -         kern.args[arg.arg_index]->bind(*this);
 +         break;

My C++ is a bit weak, but this feels like it should be a continue
instead of a break.  I'm not positive about the complete context this is
called within, so feel free to tell me to go hide under my rock.
If this is supposed to bind the full list of kernel arguments and the
first argument requires padding/extension, I don't see how the
following arguments would get bound.

--Aaron
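
A minimal sketch of the concern, assuming nothing about clover beyond what the hunk shows: `Arg` is a hypothetical stand-in for clover::module::argument, and the two functions only count how many non-padding arguments the loop would reach.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for clover::module::argument; only the fields
// needed for the illustration are modelled.
struct Arg {
    bool is_pad;       // true for padding entries
    std::size_t size;  // size in bytes
};

// With break, the first padding entry ends the loop, so every argument
// after it is never visited.
std::size_t bound_with_break(const std::vector<Arg> &args) {
    std::size_t bound = 0;
    for (const Arg &arg : args) {
        if (arg.is_pad)
            break;
        ++bound;
    }
    return bound;
}

// With continue, only the padding entry itself is skipped.
std::size_t bound_with_continue(const std::vector<Arg> &args) {
    std::size_t bound = 0;
    for (const Arg &arg : args) {
        if (arg.is_pad)
            continue;
        ++bound;
    }
    return bound;
}
```

If the argument list is {pad, scalar, scalar}, the break version reaches no arguments at all, while the continue version reaches both.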

}
 +
 +  assert(arg.arg_index >= 0);
 +
 +  kern.args[arg.arg_index]->bind(*this, arg);
 }

 // Create a new compute state if anything changed.
 @@ -202,9 +204,7 @@ _cl_kernel::argument::storage() const {
 return 0;
  }

 -_cl_kernel::scalar_argument::scalar_argument(size_t size) :
 -   argument(size) {
 -}
 +_cl_kernel::scalar_argument::scalar_argument(size_t size) : argument(size) { 
 }

  void
  _cl_kernel::scalar_argument::set(size_t size, const void *value) {
 @@ -216,8 +216,32 @@ _cl_kernel::scalar_argument::set(size_t size, const void 
 *value) {
  }

  void
 -_cl_kernel::scalar_argument::bind(exec_context &ctx) {
 -   ctx.input.insert(ctx.input.end(), v.begin(), v.end());
 +_cl_kernel::scalar_argument::bind(exec_context &ctx,
 +                                  const clover::module::argument &arg) {
 +   // Extend the value
 +   bool little_endian = ctx.q->dev.endianness() == PIPE_ENDIAN_LITTLE;
 +   bool has_sign;
 +   if (little_endian) {
 +      has_sign = v[__size - 1] & 0x80;
 +   } else {
 +      has_sign = v[0] & 0x80;
 +   }
 +   uint8_t ext_value;
 +   if (arg.ext_type == module::argument::sext && has_sign) {
 +      ext_value = 0xff;
 +   } else {
 +      ext_value = 0;
 +   }
 +
 +   if (little_endian) {
 +      ctx.input.insert(ctx.input.end(), v.begin(), v.end());
 +   }
 +   for (unsigned i = __size; i < arg.target_size; ++i) {
 +      ctx.input.push_back(ext_value);
 +   }
 +   if (!little_endian) {
 +      ctx.input.insert(ctx.input.end(), v.begin(), v.end());
 +   }
  }

  void
 @@ -241,7 +265,8 @@ _cl_kernel::global_argument::set(size_t size, const void 
 *value) {
  }

  void
 -_cl_kernel::global_argument::bind(exec_context &ctx) {
 +_cl_kernel::global_argument::bind(exec_context &ctx,
 +                                  const clover::module::argument &arg) {
 size_t offset = ctx.input.size();
 size_t idx = ctx.g_buffers.size();

 @@ -277,7 +302,8 @@ _cl_kernel::local_argument::set(size_t size, const void 
 *value) {
  }

  void
 -_cl_kernel::local_argument::bind(exec_context &ctx) {
 +_cl_kernel::local_argument::bind(exec_context &ctx,
 +                                 const clover::module::argument &arg) {
 size_t offset = ctx.input.size();
 size_t ptr = ctx.mem_local;

 @@ -308,7 +334,8 @@ _cl_kernel::constant_argument::set(size_t size, const 
 void *value) {
  }

  void
 -_cl_kernel::constant_argument::bind(exec_context &ctx) {
 +_cl_kernel::constant_argument::bind(exec_context &ctx,
 +                                    const clover::module::argument &arg) {
 size_t offset = ctx.input.size();
 size_t idx = ctx.resources.size();

 @@ -341,7 +368,8 @@ _cl_kernel::image_rd_argument::set(size_t size, const 
 void *value) {
  }

  void
 -_cl_kernel::image_rd_argument::bind(exec_context &ctx) {
 +_cl_kernel::image_rd_argument::bind(exec_context &ctx,
 +                                    const clover::module::argument &arg) {
 size_t offset = ctx.input.size();
 size_t idx = ctx.sviews.size();

 @@ -374,7 +402,8 @@ _cl_kernel::image_wr_argument::set(size_t size, const 
 void *value) {
  }

  void
 -_cl_kernel::image_wr_argument::bind(exec_context &ctx) {
 +_cl_kernel::image_wr_argument::bind(exec_context &ctx,
 +                                    const clover::module::argument &arg) {
 size_t offset = ctx.input.size();
 size_t idx = ctx.resources.size();

 @@ -404,7 +433,8 @@ 
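
The scalar extension logic in the scalar_argument::bind hunk quoted above can be sketched as a standalone function. This is only an illustration under stated assumptions: `extend_scalar` and its parameters are hypothetical names, the byte vector stands in for the argument's stored value, and the real code appends to the kernel input buffer instead of returning a new vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Extends the raw bytes of a scalar to target_size bytes, filling with 0xff
// when sign extension is requested and the value is negative, else with 0x00.
std::vector<std::uint8_t> extend_scalar(const std::vector<std::uint8_t> &v,
                                        std::size_t target_size,
                                        bool sign_extend,
                                        bool little_endian) {
    // The most significant byte is last on little-endian, first on big-endian.
    bool has_sign = false;
    if (!v.empty()) {
        const std::uint8_t msb = little_endian ? v.back() : v.front();
        has_sign = (msb & 0x80) != 0;
    }
    const std::uint8_t fill = (sign_extend && has_sign) ? 0xff : 0x00;

    std::vector<std::uint8_t> out;
    // Little-endian: value bytes first, then the fill bytes.
    if (little_endian)
        out.insert(out.end(), v.begin(), v.end());
    for (std::size_t i = v.size(); i < target_size; ++i)
        out.push_back(fill);
    // Big-endian: fill bytes first, then the value bytes.
    if (!little_endian)
        out.insert(out.end(), v.begin(), v.end());
    return out;
}
```

The only difference between the two byte orders is which byte carries the sign and on which side the fill bytes are placed.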

Re: [Mesa-dev] Patches: R600: Improve load / store support for 8-bit and 16-bit types

2013-08-12 Thread Aaron Watry
It'll take me a while to attempt to parse everything that's going on
in these patches (and your resource descriptor types series that this
depends on), but I have sent it all through a piglit run on Evergreen
(Cedar).  Everything was latest Mesa/LLVM/libclc upstream code as of
today.

Baseline: 567/855 tests passed
Descriptors Series: 575/855 tests passed (main differences here
were with some int3 load/store issues which were just exposed recently
and fixed by this series)
Descriptors + char/short load/store series: 880/1119 tests passed
(most of the additional tests and passes were char/short tests that no
longer crash out).

Specifically, I've double-checked the char/short/uchar/ushort built-in
functions, as well as the char/short arithmetic tests, and things are
looking good so far.  I'll try to test on Cayman/SI later.

--Aaron

On Mon, Aug 12, 2013 at 2:56 PM, Tom Stellard t...@stellard.net wrote:
 Hi,

 The attached patches improve support for i8 and i16 loads and stores for
 Evergreen and newer GPUs.  This means that byte-addressable stores are
 now supported.

 Please review/test.

 -Tom


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Patches: R600: Improve load / store support for 8-bit and 16-bit types

2013-08-13 Thread Aaron Watry
I've finished running comparison tests on Cayman/Pitcairn.  The
descriptors series and 8/16-bit load/store support series both look
like they're in good condition for the cards I was able to test: Cedar
(5400), A6-3500 Llano and Pitcairn (7850).  No regressions spotted,
just improvements.

And as you said, the descriptors series fixed compute hangs for the
7850 on quite a few kernels which did comparison operations (max/clamp
kernels mostly, maybe some min).

You can definitely get a tested-by for both the descriptors series and this:
Tested-by: Aaron Watry awa...@gmail.com

Quite a few of the tablegen changes are still a bit above my head, so
I don't feel qualified to give a comprehensive review on that.

--Aaron

On Mon, Aug 12, 2013 at 6:00 PM, Aaron Watry awa...@gmail.com wrote:
 It'll take me a while to attempt to parse everything that's going on
 in these patches (and your resource descriptor types series that this
 depends on), but I have sent it all through a piglit run on Evergreen
 (Cedar).  Everything was latest Mesa/LLVM/libclc upstream code as of
 today.

 Baseline: 567/855 tests passed
 Descriptors Series: 575/855 tests passed (main differences here
 were with some int3 load/store issues which were just exposed recently
 and fixed by this series)
 Descriptors + char/short load/store series: 880/1119 tests passed
 (most of the additional tests and passes were char/short tests that no
 longer crash out).

 Specifically, I've double-checked the char/short/uchar/ushort built-in
 functions, as well as the char/short arithmetic tests, and things are
 looking good so far.  I'll try to test on Cayman/SI later.

 --Aaron

 On Mon, Aug 12, 2013 at 2:56 PM, Tom Stellard t...@stellard.net wrote:
 Hi,

 The attached patches improve support for i8 and i16 loads and stores for
 Evergreen and newer GPUs.  This means that byte-addressable stores are
 now supported.

 Please review/test.

 -Tom


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] R600: Fix segfault in R600TextureIntrinsicReplacer

2013-08-21 Thread Aaron Watry
I'm not sure about the lit test, but this definitely made it much more
obvious to me what was failing in my VP8 decoder on R600 in OpenCL.

Old:
Stack dump:
0.Running pass 'Function Pass Manager' on module 'radeon'.
1.Running pass 'R600 Texture Intrinsics Replacer' on function
'@vp8_loop_filter_all_edges_kernel'
Segmentation fault (core dumped)

New:
0x3eca3b0: i32 = GlobalAddress<i32 (...)* @write_mem_fence> 0 [ORD=70]
Undefined function
UNREACHABLE executed at
/home/awatry/src/llvm/lib/Target/R600/AMDGPUISelLowering.h:76!
Stack dump:
0.Running pass 'Function Pass Manager' on module 'radeon'.
1.Running pass 'AMDGPU DAG-DAG Pattern Instruction Selection' on
function '@vp8_loop_filter_all_edges_kernel'
Aborted (core dumped)

For that you get a:
Tested-By: Aaron Watry awa...@gmail.com



On Wed, Aug 21, 2013 at 1:33 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 This pass was segfaulting when it ran into a non-intrinsic function
 call.  Function calls are not supported, so now instead of segfaulting,
 we will get an assertion failure with a nice error message.

 I'm not sure how to test this using lit.
 ---
  lib/Target/R600/R600TextureIntrinsicsReplacer.cpp | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/lib/Target/R600/R600TextureIntrinsicsReplacer.cpp 
 b/lib/Target/R600/R600TextureIntrinsicsReplacer.cpp
 index 37d9059..d4b8ec0 100644
 --- a/lib/Target/R600/R600TextureIntrinsicsReplacer.cpp
 +++ b/lib/Target/R600/R600TextureIntrinsicsReplacer.cpp
 @@ -260,6 +260,9 @@ public:
}

    void visitCallInst(CallInst &I) {
 +    if (!I.getCalledFunction()) {
 +      return;
 +    }
      StringRef Name = I.getCalledFunction()->getName();
      if (Name == "llvm.AMDGPU.tex") {
        ReplaceTexIntrinsic(I, false, TexSign, "llvm.R600.tex", "llvm.R600.texc");
 --
 1.7.11.4
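
The guard added by the patch above can be sketched in isolation. The `Function` and `CallInst` types below are hypothetical stand-ins written for this example, not the LLVM classes; the only behaviour borrowed from LLVM is that getCalledFunction() can return a null pointer when a call has no directly known callee.

```cpp
#include <string>

// Hypothetical stand-ins for the LLVM types; only what the example needs.
struct Function {
    std::string name;
};

struct CallInst {
    const Function *callee;  // nullptr models a call with no direct callee
    const Function *getCalledFunction() const { return callee; }
};

// Returns true only for a direct call to the texture intrinsic. The null
// check has to come first: dereferencing a null callee is the segfault
// the patch avoids.
bool is_tex_intrinsic(const CallInst &call) {
    const Function *f = call.getCalledFunction();
    if (!f)
        return false;
    return f->name == "llvm.AMDGPU.tex";
}
```

With the early return, the unsupported call is left untouched by this pass and fails later with a readable message, as the "New" output shows.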

 ___
 llvm-commits mailing list
 llvm-comm...@cs.uiuc.edu
 http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g/compute: Fix bug in compute memory pool

2013-08-28 Thread Aaron Watry
The changes look good to me... That seems to be a much more sane way
to add the item to the beginning of the linked list.

I've tested this on CEDAR (Radeon 5400) without any OpenCL
regressions, and the only piglit change was that the new piglit test
created for this bug now passes.

--Aaron


On Tue, Aug 27, 2013 at 10:17 AM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 When adding a new buffer to the beginning of the memory pool, we were
 accidentally deleting the buffer that was first in the buffer list.
 This was caused by a bug in the memory pool's linked list
 implementation.
 ---
  src/gallium/drivers/r600/compute_memory_pool.c | 9 ++---
  1 file changed, 2 insertions(+), 7 deletions(-)

 diff --git a/src/gallium/drivers/r600/compute_memory_pool.c 
 b/src/gallium/drivers/r600/compute_memory_pool.c
 index 454af90..4846bfe 100644
 --- a/src/gallium/drivers/r600/compute_memory_pool.c
 +++ b/src/gallium/drivers/r600/compute_memory_pool.c
 @@ -337,14 +337,9 @@ void compute_memory_finalize_pending(struct compute_memory_pool* pool,
  }
  } else {
  /* Add item to the front of the list */
 -   item->next = pool->item_list->next;
 -   if (pool->item_list->next) {
 -   pool->item_list->next->prev = item;
 -   }
 +   item->next = pool->item_list;
  item->prev = pool->item_list->prev;
 -   if (pool->item_list->prev) {
 -   pool->item_list->prev->next = item;
 -   }
 +   pool->item_list->prev = item;
  pool->item_list = item;
  }
  }
 --
 1.7.11.4
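
For reference, front insertion into a doubly linked list, as the fix above does it, can be sketched like this. `Item`, `Pool`, `push_front` and `length` are simplified, hypothetical versions written for the example; unlike the driver code, the sketch also handles an empty list and sets the new head's prev pointer to null directly.

```cpp
#include <cstddef>

// Simplified, hypothetical versions of the pool and its items.
struct Item {
    int id;
    Item *prev;
    Item *next;
};

struct Pool {
    Item *item_list;  // head of the list, or nullptr when empty
};

// Inserts item before the current head. The old head stays in the list:
// it becomes item.next, which is exactly what the buggy version lost by
// linking to item_list->next instead.
void push_front(Pool &pool, Item &item) {
    item.prev = nullptr;
    item.next = pool.item_list;
    if (pool.item_list)
        pool.item_list->prev = &item;
    pool.item_list = &item;
}

// Walks the list from the head and counts the items.
std::size_t length(const Pool &pool) {
    std::size_t n = 0;
    for (const Item *it = pool.item_list; it; it = it->next)
        ++n;
    return n;
}
```

After two insertions the list holds both items, with the most recent one at the head.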

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] r600g: use u_upload_mgr for allocating staging transfer buffers

2012-12-11 Thread Aaron Watry
I'm not familiar enough with the existing code to feel comfortable 
reviewing it, but I've run it through a full piglit test run (using 
tests/all.tests w/ GL/GLX enabled) without noticing any issues.


Also, Reaction Quake 3 performance went up by ~25% as a result of this 
series on my Radeon 6850.

http://openbenchmarking.org/result/1212102-SU-SUBALLOCT33

For my part:
Tested-by: Aaron Watry awa...@gmail.com

--Aaron Watry


u_upload_mgr suballocates memory from a large buffer and maps the allocated
range (unsynchronized), which is perfect for short-lived staging buffers.

This reduces the number of relocations sent to the kernel.
---
  src/gallium/drivers/r600/r600_buffer.c |   30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_buffer.c 
b/src/gallium/drivers/r600/r600_buffer.c
index 9e2cf66..e674e13 100644
--- a/src/gallium/drivers/r600/r600_buffer.c
+++ b/src/gallium/drivers/r600/r600_buffer.c
@@ -66,7 +66,8 @@ static void *r600_buffer_get_transfer(struct pipe_context 
*ctx,
unsigned usage,
const struct pipe_box *box,
  struct pipe_transfer **ptransfer,
- void *data, struct r600_resource *staging)
+ void *data, struct r600_resource *staging,
+ unsigned offset)
  {
struct r600_context *rctx = (struct r600_context*)ctx;
struct r600_transfer *transfer = util_slab_alloc(rctx-pool_transfers);
@@ -77,8 +78,7 @@ static void *r600_buffer_get_transfer(struct pipe_context 
*ctx,
transfer-transfer.box = *box;
transfer-transfer.stride = 0;
transfer-transfer.layer_stride = 0;
-   transfer-staging = NULL;
-   transfer-offset = 0;
+   transfer-offset = offset;
transfer-staging = staging;
*ptransfer =transfer-transfer;
return data;
@@ -147,18 +147,17 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
if (rctx-ws-cs_is_buffer_referenced(rctx-cs, 
rbuffer-cs_buf, RADEON_USAGE_READWRITE) ||
rctx-ws-buffer_is_busy(rbuffer-buf, 
RADEON_USAGE_READWRITE)) {
/* Do a wait-free write-only transfer using a temporary 
buffer. */
-   struct r600_resource *staging = (struct r600_resource*)
-   pipe_buffer_create(ctx-screen, 
PIPE_BIND_VERTEX_BUFFER,
-  PIPE_USAGE_STAGING,
-  box-width + (box-x % 
R600_MAP_BUFFER_ALIGNMENT));
-   data = rctx-ws-buffer_map(staging-cs_buf, rctx-cs, 
PIPE_TRANSFER_WRITE);
+   unsigned offset;
+   struct r600_resource *staging = NULL;

-   if (!data)
-   return NULL;
+   u_upload_alloc(rctx-uploader, 0, box-width + (box-x 
% R600_MAP_BUFFER_ALIGNMENT),
+   offset, (struct pipe_resource**)staging, 
(void**)data);

-   data += box-x % R600_MAP_BUFFER_ALIGNMENT;
-   return r600_buffer_get_transfer(ctx, resource, level, 
usage, box,
-   ptransfer, data, 
staging);
+   if (staging) {
+   data += box-x % R600_MAP_BUFFER_ALIGNMENT;
+   return r600_buffer_get_transfer(ctx, resource, 
level, usage, box,
+   ptransfer, 
data, staging, offset);
+   }
}
}

@@ -169,7 +168,7 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
data += box-x;

return r600_buffer_get_transfer(ctx, resource, level, usage, box,
-   ptransfer, data, NULL);
+   ptransfer, data, NULL, 0);
  }

  static void r600_buffer_transfer_unmap(struct pipe_context *pipe,
@@ -180,7 +179,8 @@ static void r600_buffer_transfer_unmap(struct pipe_context 
*pipe,

if (rtransfer-staging) {
struct pipe_box box;
-   u_box_1d(transfer-box.x % R600_MAP_BUFFER_ALIGNMENT, 
transfer-box.width,box);
+   u_box_1d(rtransfer-offset + transfer-box.x % 
R600_MAP_BUFFER_ALIGNMENT,
+transfer-box.width,box);

/* Copy the staging buffer into the original one. */
r600_copy_buffer(pipe, transfer-resource, transfer-box.x,
--
1.7.10.4
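
The suballocation idea described in the commit message can be sketched as a simple bump allocator. This is not u_upload_mgr's implementation: `Suballocator` and `suballoc` are hypothetical names, and the sketch simply fails when the buffer is full, which is a simplification made for this example.

```cpp
#include <cstddef>

// Hypothetical bump suballocator over one large buffer of `capacity` bytes.
struct Suballocator {
    std::size_t capacity;
    std::size_t offset;  // next free byte
};

// Reserves `size` bytes at the next offset that is a multiple of `alignment`.
// On success, writes the start offset to out_offset and returns true.
bool suballoc(Suballocator &s, std::size_t size, std::size_t alignment,
              std::size_t &out_offset) {
    if (alignment == 0)
        return false;
    // Round the current offset up to the requested alignment.
    const std::size_t start = (s.offset + alignment - 1) / alignment * alignment;
    if (start > s.capacity || size > s.capacity - start)
        return false;
    out_offset = start;
    s.offset = start + size;
    return true;
}
```

Each staging transfer then only needs the shared buffer plus an offset into it, which is the role the new `offset` parameter plays in r600_buffer_get_transfer.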
   

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: tgsi to llvm emits stream output intrinsics.

2012-12-14 Thread Aaron Watry

diff --git a/src/gallium/drivers/r600/r600_llvm.c 
b/src/gallium/drivers/r600/r600_llvm.c
index 8f1ed26..14c0205 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -229,11 +229,32 @@ static void llvm_emit_epilogue(struct 
lp_build_tgsi_context * bld_base)
  {
struct radeon_llvm_context * ctx = radeon_llvm_context(bld_base);
struct lp_build_context * base = bld_base-base;
+   struct pipe_stream_output_info * so = ctx-stream_outputs;
unsigned i;

unsigned color_count = 0;
boolean has_color = false;
  
+	if (ctx->type == TGSI_PROCESSOR_VERTEX && so->num_outputs) {

+   printf("I have %d so\n", so->num_outputs);


Did you mean to leave that printf there?


+   for (i = 0; i < so->num_outputs; i++) {
+      unsigned register_index = so->output[i].register_index;
+      unsigned start_component = so->output[i].start_component;
+      unsigned num_component = so->output[i].num_components;
+      unsigned dst_offset = so->output[i].dst_offset;
+      unsigned chan;
+      for (chan = start_component; chan < start_component + num_component; chan++) {
+         LLVMValueRef args[3];
+         args[0] = LLVMBuildLoad(base->gallivm->builder,
+            ctx->soa.outputs[register_index][chan], "");
+         args[1] = lp_build_const_int32(base->gallivm, 4 * (dst_offset - start_component) + chan);
+         args[2] = lp_build_const_int32(base->gallivm, so->output[i].output_buffer);
+         lp_build_intrinsic(base->gallivm->builder, "llvm.R600.store.stream.output",
+            LLVMVoidTypeInContext(base->gallivm->context), args, 3);
+      }
+   }
+   }
+
/* Add the necessary export instructions */
for (i = 0; i  ctx-output_reg_count; i++) {
unsigned chan;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] drivers/radeon: Don't link against libgallium.la

2013-01-11 Thread Aaron Watry

Hi Tom,

This patch doesn't apply cleanly against mesa/master.

For some reason, a bunch of @ symbols got translated to " at ".  Once I change
that back and apply the patch, it fixes the build for me.

my config:
./autogen.sh --enable-debug --with-dri-drivers=i965,radeon 
--with-gallium-drivers=r600 --enable-texture-float --enable-opencl 
--enable-r600-llvm-compiler --with-egl-platforms=x11,drm --enable-glx-tls

--Aaron Watry


From: Tom Stellardthomas.stellard at amd.com  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

This fixes multiple symbol errors in pipe-loader
---
 src/gallium/drivers/radeon/Makefile.am | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/Makefile.am 
b/src/gallium/drivers/radeon/Makefile.am
index 091adc4..e6eb241 100644
--- a/src/gallium/drivers/radeon/Makefile.am
+++ b/src/gallium/drivers/radeon/Makefile.am
@@ -4,7 +4,7 @@ include $(top_srcdir)/src/gallium/Automake.inc
 if HAVE_GALLIUM_R600
 if HAVE_GALLIUM_RADEONSI
 lib_LTLIBRARIES =libllvmradeon at VERSION  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev@.la
-libllvmradeon at VERSION  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev@_la_LDFLAGS = 
-Wl,--no-undefined -shared -avoid-version \
+libllvmradeon at VERSION  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev@_la_LDFLAGS = -Wl, 
-shared -avoid-version \
$(LLVM_LDFLAGS)
 else
 noinst_LTLIBRARIES =libllvmradeon at VERSION  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev@.la
@@ -26,6 +26,5 @@libllvmradeon at VERSION  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev@_la_SOURCES = \
$(C_FILES)

 libllvmradeon at VERSION  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev@_la_LIBADD = \
-   $(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(CLOCK_LIB) \
$(LLVM_LIBS)
--
1.7.11.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Revert targets/opencl: Link against libgallium.la instead of libgallium.a

2013-01-14 Thread Aaron Watry


From: Tom Stellardthomas.stellard at amd.com  
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

This reverts commit 4148a29ed83d1d85bff3d4e40e847128011c3f20.

This fixes bug: https://bugs.freedesktop.org/show_bug.cgi?id=59334

We really should be linking against libgallium.la instead of
libgallium.a, but until we can figure why linking against libgallium.la
causes runtime failures in clover we will continue to link against
libgallium.a
   

Tested-by: Aaron Watry awa...@gmail.com

Piglit runs CL tests again, but I still get a bunch of run-time warnings 
along the lines of:


premain: CommandLine Error: Argument 'info-output-file' defined more 
than once!


--Aaron Watry

---
  src/gallium/auxiliary/Makefile.am  | 6 ++
  src/gallium/targets/opencl/Makefile.am | 4 +++-
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/Makefile.am 
b/src/gallium/auxiliary/Makefile.am
index a4eee47..4979293 100644
--- a/src/gallium/auxiliary/Makefile.am
+++ b/src/gallium/auxiliary/Makefile.am
@@ -45,3 +45,9 @@ util/u_format_srgb.c: $(srcdir)/util/u_format_srgb.py

  util/u_format_table.c: $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format_pack.py $(srcdir)/util/u_format_parse.py $(srcdir)/util/u_format.csv
$(AM_V_GEN) $(PYTHON2) $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format.csv > $@
+
+# XXX: As a work around for https://bugs.freedesktop.org/show_bug.cgi?id=59334
+# clover needs to link against libgallium.a. Delete this once we have a real
+# fix for this bug.
+all-local: libgallium.la
+   ln -f $(builddir)/.libs/libgallium.a $(builddir)/libgallium.a
diff --git a/src/gallium/targets/opencl/Makefile.am 
b/src/gallium/targets/opencl/Makefile.am
index c5c3003..be8ec12 100644
--- a/src/gallium/targets/opencl/Makefile.am
+++ b/src/gallium/targets/opencl/Makefile.am
@@ -6,9 +6,11 @@ libOpenCL_la_LDFLAGS = \
$(LLVM_LDFLAGS) \
-version-number 1:0

+# We are linking against libgallium.a rather than libgallium.la to work around
+# https://bugs.freedesktop.org/show_bug.cgi?id=59334
  libOpenCL_la_LIBADD = \
$(top_builddir)/src/gallium/state_trackers/clover/libclover.la \
-   $(top_builddir)/src/gallium/auxiliary/libgallium.la \
+   $(top_builddir)/src/gallium/auxiliary/libgallium.a \
$(GALLIUM_PIPE_LOADER_LIBS) $(LIBUDEV_LIBS) \
-ldl \
-lclangCodeGen \
--
1.7.11.4

   


Re: [Mesa-dev] AMD RS780 please help

2012-09-25 Thread Aaron Watry
From a daily quantal build on my 780G motherboard's IGP:

ubuntu@ubuntu:~/src$ glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD RS780
OpenGL version string: 2.1 Mesa 9.0-devel
OpenGL shading language version string: 1.30
OpenGL extensions:

ubuntu@ubuntu:~/src$ glxinfo | grep 'GL_EXT_transform_feedback'
GL_ARB_transform_feedback_instanced, GL_EXT_transform_feedback,

ubuntu@ubuntu:~/src$ uname -a
Linux ubuntu 3.5.0-15-generic #23-Ubuntu SMP Mon Sep 24 20:37:06 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@ubuntu:~/src/piglit/bin$ ./ext_transform_feedback-position -auto
Testing BindBufferBase.
Testing readback.
Buffer[0]: 0.765430,  Expected: -0.687500
Buffer[1]: 0.765430,  Expected: -0.375000
Buffer[2]: 0.765430,  Expected: 0.00
Buffer[3]: 0.765430,  Expected: 1.00
Buffer[4]: 0.765430,  Expected: -0.687500
Buffer[5]: 0.765430,  Expected: 0.25
Buffer[6]: 0.765430,  Expected: 0.00
Buffer[7]: 0.765430,  Expected: 1.00
Buffer[8]: 0.765430,  Expected: -0.375000
Buffer[9]: 0.765430,  Expected: -0.375000
Buffer[10]: 0.765430,  Expected: 0.00
Buffer[11]: 0.765430,  Expected: 1.00
Buffer[12]: 0.765430,  Expected: -0.687500
Buffer[13]: 0.765430,  Expected: 0.25
Buffer[14]: 0.765430,  Expected: 0.00
Buffer[15]: 0.765430,  Expected: 1.00
Buffer[16]: 0.765430,  Expected: -0.375000
Buffer[17]: 0.765430,  Expected: 0.25
Buffer[18]: 0.765430,  Expected: 0.00
Buffer[19]: 0.765430,  Expected: 1.00
Buffer[20]: 0.765430,  Expected: -0.375000
Buffer[21]: 0.765430,  Expected: -0.375000
Buffer[22]: 0.765430,  Expected: 0.00
Buffer[23]: 0.765430,  Expected: 1.00
PIGLIT: {'result': 'fail' }


Regards,
Aaron Watry



2012/9/24 Marek Olšák mar...@gmail.com:
 Hi,

 would somebody with an RS780 be so kind as to test and see if
 transform feedback (GL3.0 feature) works for him with r600g?

 If the GL_EXT_transform_feedback extension is not exposed, you need a
 newer version of kernel and/or Mesa.

 To test transform feedback, run this in the piglit/bin directory:

 ./ext_transform_feedback-position -auto

 You can get piglit here: http://cgit.freedesktop.org/piglit/

 I have a fix for broken transform feedback on RS880. I just need to
 know if RS780 is broken in the same way. There's a pretty high
 probability that if it's broken, the cause is the same as on RS880.

 Thanks.

 Marek


Re: [Mesa-dev] GSoC : Video decoding state tracker for Gallium3d

2011-04-04 Thread Aaron Watry
Hi Emeric,

It doesn't affect your proposal too much, but I'd recommend changing
the order of your August tasks a bit. I would suggest trying to work
on the loop filter before the motion compensation.  A few of the 720p
and 1080p videos that I profiled during my thesis work suggested that
the loop filter was responsible for ~50-60% of the decoding time using
the C code paths (using cachegrind). I haven't profiled the assembly
code recently, so you might want to do that before taking my word as
truth.

If you check out the WebM Codec Developer mailing list archive, you
might be able to find a little of the discussion that we had on the
loop filter algorithm and how to make it more appropriate to a GPU.
The loop filter is much less branchy than the motion compensation as
well, so you might have an easier time with the loop filter.

If you need to look at my OpenCL code, you've already got the github
location. I also plan to push this work upstream in the future. I've
created OpenCL kernels for motion compensation, IDCT/Dequant, and loop
filtering. I haven't started intra-prediction, and most of the kernels
are not launching many threads (common case is under a hundred... I
spent too much time in my project adding the OpenCL infrastructure and
wasn't able to optimize the algorithms as well as I had hoped).

Also, the vpxdec utility that is included with the libvpx decoder has
a '--md5' command line option that helped out a lot for checking for
correctness of a decoder implementation. The WebM project has a git
repository that includes a conformance testing script which you can
also use.  If you are continually using this conformance testing suite
as part of your workflow, item K (debugging) will become a lot easier.

If you have any questions, feel free to email me.  I have a month of
class to finish up before graduation, but I plan on continuing to work
on this project in the meantime and after graduation.


[Mesa-dev] [PATCH] RFC clover: calculate maximum workgroup size based on device

2013-10-23 Thread Aaron Watry
The maximum workgroup size for a given kernel is based on the
capabilities of the device that it's being run on. Previously,
we were just returning the maximum value of a size_t which is
obviously wrong.

This patch uses the device's capabilities, but doesn't take into
account any resource usage which would decrease the work group
size further.  Suggestions/comments/fixes welcome.
---
 src/gallium/state_trackers/clover/api/kernel.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp 
b/src/gallium/state_trackers/clover/api/kernel.cpp
index d6129e6..90bb213 100644
--- a/src/gallium/state_trackers/clover/api/kernel.cpp
+++ b/src/gallium/state_trackers/clover/api/kernel.cpp
@@ -156,7 +156,11 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern, cl_device_id 
d_dev,
 
switch (param) {
case CL_KERNEL_WORK_GROUP_SIZE:
-  buf.as_scalar<size_t>() = kern.max_block_size();
+  //FIXME: This should be maximum that the requested device can support for
+  //   this kernel, not the maximum value of a size_t... and just using
+  //   dev->max_threads_per_block doesn't take into account the kernel's
+  //   resource usage...
+  buf.as_scalar<size_t>() = pdev->max_threads_per_block();
   break;
 
case CL_KERNEL_COMPILE_WORK_GROUP_SIZE:
-- 
1.8.3.2
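
The limitation described in the patch message (the device maximum ignores
per-kernel resource usage) can be illustrated with a minimal sketch. This is
not clover code: kernel_work_group_limit and its register-based resource
model are hypothetical, chosen only to show how a per-kernel limit could end
up below the device-wide maximum.

```c
#include <stddef.h>

/* Hypothetical resource model: each work-item needs regs_per_item registers
 * out of regs_available. Neither parameter corresponds to a real clover or
 * Gallium query; they are assumptions made for illustration. */
static size_t kernel_work_group_limit(size_t device_max_threads,
                                      size_t regs_available,
                                      size_t regs_per_item)
{
    size_t limit = device_max_threads;

    if (regs_per_item > 0) {
        /* How many work-items fit into the register budget. */
        size_t by_regs = regs_available / regs_per_item;
        if (by_regs < limit)
            limit = by_regs;
    }
    return limit;
}
```

With a kernel that uses few registers the device maximum wins; a
register-heavy kernel gets a smaller value, which is the case the FIXME in
the patch is pointing at.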



Re: [Mesa-dev] [PATCH] R600: Expand vector FSQRT ops

2013-10-25 Thread Aaron Watry
Reviewed-by: Aaron Watry awa...@gmail.com

I have tested this on a Radeon 5400 (Cedar), and I just sent a few
generated tests to the piglit list.

--Aaron

On Wed, Oct 23, 2013 at 6:28 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  lib/Target/R600/AMDGPUISelLowering.cpp |  1 +
  test/CodeGen/R600/llvm.sqrt.ll | 54 
 ++
  2 files changed, 55 insertions(+)
  create mode 100644 test/CodeGen/R600/llvm.sqrt.ll

 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
 b/lib/Target/R600/AMDGPUISelLowering.cpp
 index 91d85d3..52dd010 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.cpp
 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
 @@ -181,6 +181,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
  setOperationAction(ISD::FFLOOR, VT, Expand);
  setOperationAction(ISD::FMUL, VT, Expand);
  setOperationAction(ISD::FRINT, VT, Expand);
 +setOperationAction(ISD::FSQRT, VT, Expand);
  setOperationAction(ISD::FSUB, VT, Expand);
}
  }
 diff --git a/test/CodeGen/R600/llvm.sqrt.ll b/test/CodeGen/R600/llvm.sqrt.ll
 new file mode 100644
 index 000..0d0d186
 --- /dev/null
 +++ b/test/CodeGen/R600/llvm.sqrt.ll
 @@ -0,0 +1,54 @@
 +; RUN: llc < %s -march=r600 --mcpu=redwood | FileCheck %s --check-prefix=R600-CHECK
 +; RUN: llc < %s -march=r600 --mcpu=SI | FileCheck %s --check-prefix=SI-CHECK
 +
 +; R600-CHECK-LABEL: @sqrt_f32
 +; R600-CHECK: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[2].Z
 +; R600-CHECK: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[2].Z, PS
 +; SI-CHECK-LABEL: @sqrt_f32
 +; SI-CHECK: V_SQRT_F32_e32
 +define void @sqrt_f32(float addrspace(1)* %out, float %in) {
 +entry:
 +  %0 = call float @llvm.sqrt.f32(float %in)
 +  store float %0, float addrspace(1)* %out
 +  ret void
 +}
 +
 +; R600-CHECK-LABEL: @sqrt_v2f32
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[2].W
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[2].W, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].X
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].X, PS
 +; SI-CHECK-LABEL: @sqrt_v2f32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +define void @sqrt_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {
 +entry:
 +  %0 = call <2 x float> @llvm.sqrt.v2f32(<2 x float> %in)
 +  store <2 x float> %0, <2 x float> addrspace(1)* %out
 +  ret void
 +}
 +
 +; R600-CHECK-LABEL: @sqrt_v4f32
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].Y
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].Y, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].Z
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].Z, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].W
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].W, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[4].X
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[4].X, PS
 +; SI-CHECK-LABEL: @sqrt_v4f32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +define void @sqrt_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %in) {
 +entry:
 +  %0 = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %in)
 +  store <4 x float> %0, <4 x float> addrspace(1)* %out
 +  ret void
 +}
 +
 +declare float @llvm.sqrt.f32(float %in)
 +declare <2 x float> @llvm.sqrt.v2f32(<2 x float> %in)
 +declare <4 x float> @llvm.sqrt.v4f32(<4 x float> %in)
 --
 1.7.11.4

 ___
 llvm-commits mailing list
 llvm-comm...@cs.uiuc.edu
 http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


Re: [Mesa-dev] [PATCH] radeon/llvm: Specify the DataLayout when running optimizations

2013-10-28 Thread Aaron Watry
I ran this through a piglit CL test run on my 7850, no test fixes or
regressions.

--Aaron

On Tue, Oct 22, 2013 at 11:28 AM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 Without DataLayout, a lot of optimization passes aren't run and the ones
 that are don't work as well.
 ---
  src/gallium/drivers/radeon/radeon_llvm_util.c | 4 
  1 file changed, 4 insertions(+)

 diff --git a/src/gallium/drivers/radeon/radeon_llvm_util.c 
 b/src/gallium/drivers/radeon/radeon_llvm_util.c
 index 25be245..7192dee 100644
 --- a/src/gallium/drivers/radeon/radeon_llvm_util.c
 +++ b/src/gallium/drivers/radeon/radeon_llvm_util.c
 @@ -29,6 +29,7 @@

  #include <llvm-c/BitReader.h>
  #include <llvm-c/Core.h>
 +#include <llvm-c/Target.h>
  #include <llvm-c/Transforms/PassManagerBuilder.h>

  LLVMModuleRef radeon_llvm_parse_bitcode(const unsigned char * bitcode,
 @@ -53,8 +54,11 @@ unsigned radeon_llvm_get_num_kernels(const unsigned char 
 *bitcode,

  static void radeon_llvm_optimize(LLVMModuleRef mod)
  {
 +   const char *data_layout = LLVMGetDataLayout(mod);
 +   LLVMTargetDataRef TD = LLVMCreateTargetData(data_layout);
 LLVMPassManagerBuilderRef builder = LLVMPassManagerBuilderCreate();
 LLVMPassManagerRef pass_manager = LLVMCreatePassManager();
 +   LLVMAddTargetData(TD, pass_manager);

 LLVMPassManagerBuilderUseInlinerWithThreshold(builder, 10);
 LLVMPassManagerBuilderPopulateModulePassManager(builder, 
 pass_manager);
 --
 1.7.11.4



Re: [Mesa-dev] [PATCH] radeon/llvm: Specify the DataLayout when running optimizations

2013-10-28 Thread Aaron Watry
I just ran a quick.tests run on evergreen without any regressions.

Patch looks good to me, and doesn't seem to cause any regressions on
the hardware I have available to test with.

--Aaron

On Tue, Oct 22, 2013 at 11:28 AM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 Without DataLayout, a lot of optimization passes aren't run and the ones
 that are don't work as well.
 ---
  src/gallium/drivers/radeon/radeon_llvm_util.c | 4 
  1 file changed, 4 insertions(+)

 diff --git a/src/gallium/drivers/radeon/radeon_llvm_util.c 
 b/src/gallium/drivers/radeon/radeon_llvm_util.c
 index 25be245..7192dee 100644
 --- a/src/gallium/drivers/radeon/radeon_llvm_util.c
 +++ b/src/gallium/drivers/radeon/radeon_llvm_util.c
 @@ -29,6 +29,7 @@

  #include <llvm-c/BitReader.h>
  #include <llvm-c/Core.h>
 +#include <llvm-c/Target.h>
  #include <llvm-c/Transforms/PassManagerBuilder.h>

  LLVMModuleRef radeon_llvm_parse_bitcode(const unsigned char * bitcode,
 @@ -53,8 +54,11 @@ unsigned radeon_llvm_get_num_kernels(const unsigned char 
 *bitcode,

  static void radeon_llvm_optimize(LLVMModuleRef mod)
  {
 +   const char *data_layout = LLVMGetDataLayout(mod);
 +   LLVMTargetDataRef TD = LLVMCreateTargetData(data_layout);
 LLVMPassManagerBuilderRef builder = LLVMPassManagerBuilderCreate();
 LLVMPassManagerRef pass_manager = LLVMCreatePassManager();
 +   LLVMAddTargetData(TD, pass_manager);

 LLVMPassManagerBuilderUseInlinerWithThreshold(builder, 10);
 LLVMPassManagerBuilderPopulateModulePassManager(builder, 
 pass_manager);
 --
 1.7.11.4



Re: [Mesa-dev] [PATCH] clover: Calculate the optimal work group size when local_size is NULL

2013-10-29 Thread Aaron Watry
On Tue, Oct 29, 2013 at 7:06 PM, Niels Ole Salscheider
niels_...@salscheider-online.de wrote:
 Hi Tom,

 this has been on my todo list for quite a while.

 Your patch looks good to me, but in my experience a block with approximately
 the same size for each dimension gives slightly better performance in many
 cases when compared to one where one dimension is significantly larger.
 Maybe you could initialise the size for each dimension to 1 and multiply them
 by 2 in a round-robin fashion as long as feasible.

 Regards,

 Ole

Either that, or use a greatest common factor algorithm to determine
the GCF between the maximum workgroup size and the global work size...
 The main thing that stuck out when I was looking at this before was
that in the case that you had a global size that wasn't a power of
two, we might end up with local work group sizes that are smaller than
necessary.

Feel free to borrow from the following if needed or if it's at all
useful (an implementation of Euclid's method):
https://github.com/awatry/libvpx.opencl/blob/master/vp8/common/opencl/vp8_opencl.h#L89
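
A minimal sketch of the greatest-common-factor idea, assuming a
one-dimensional launch. gcd_size and pick_local_size are hypothetical names
for illustration, not the functions from the linked header or from clover.

```c
#include <stddef.h>

/* Euclid's method: greatest common factor of a and b. */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: pick a 1D local size that divides global_size
 * evenly and does not exceed max_group_size. */
static size_t pick_local_size(size_t global_size, size_t max_group_size)
{
    size_t local = gcd_size(global_size, max_group_size);

    /* gcd(0, 0) is 0; fall back to a single work-item in that case. */
    return local ? local : 1;
}
```

The result always divides the global size, but it can be small when the
global size shares few factors with the device maximum (a prime global size
gives a local size of 1).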

That being said, what Tom's got is a definite improvement, and I
believe that it'll still be an improvement over what we have now.   I
haven't experimented much with the round-robin increasing of dimension
sizes given that the algorithm that I've done most of my GPU work in
has been limited in work group size flexibility.
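
For comparison, the round-robin approach Ole describes could look roughly
like the sketch below. It is untested against clover and makes one
assumption that is not in the thread: a dimension only grows while the block
size still divides the grid size in that dimension. The function name is
hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: start with a 1x1x1 block and double one dimension
 * at a time, round-robin, while the block stays within max_threads and
 * still divides the grid evenly in that dimension. */
static void pick_block_round_robin(const size_t grid[3], size_t max_threads,
                                   size_t block[3])
{
    size_t total = 1;   /* product of the three block dimensions */
    int stalled = 0;    /* consecutive dimensions that could not grow */
    int d = 0;

    block[0] = block[1] = block[2] = 1;

    while (stalled < 3) {
        size_t next = block[d] * 2;

        if (total <= max_threads / 2 && grid[d] % next == 0) {
            block[d] = next;
            total *= 2;
            stalled = 0;
        } else {
            stalled++;
        }
        d = (d + 1) % 3;
    }
}
```

The loop stops once a full pass over all three dimensions makes no
progress, so the result stays roughly square when the grid allows it.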

Regardless of what we end up with, this patch looks good to me.  We
can improve upon it later if needed.  Note that it will probably have
to be rebased on top of some of Francisco's recent work.

--Aaron



Re: [Mesa-dev] [PATCH] clover: Don't install headers when using the icd

2013-10-30 Thread Aaron Watry
Reviewed and Tested-by: Aaron Watry awa...@gmail.com

On Tue, Oct 29, 2013 at 11:48 AM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 The ICD loader should be responsible for installing headers.
 ---
  src/gallium/state_trackers/clover/Makefile.am | 21 +++--
  1 file changed, 11 insertions(+), 10 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/Makefile.am 
 b/src/gallium/state_trackers/clover/Makefile.am
 index 89e76a9..c79fd37 100644
 --- a/src/gallium/state_trackers/clover/Makefile.am
 +++ b/src/gallium/state_trackers/clover/Makefile.am
 @@ -14,6 +14,17 @@ AM_CPPFLAGS = \

  if HAVE_CLOVER_ICD
  AM_CPPFLAGS += -DHAVE_CLOVER_ICD
 +else
 +# Only install the headers if we are building a stand-alone implementation
 +cldir = $(includedir)/CL
 +cl_HEADERS = \
 +   $(top_srcdir)/include/CL/cl.h \
 +   $(top_srcdir)/include/CL/cl_ext.h \
 +   $(top_srcdir)/include/CL/cl_gl.h \
 +   $(top_srcdir)/include/CL/cl_gl_ext.h \
 +   $(top_srcdir)/include/CL/cl_platform.h \
 +   $(top_srcdir)/include/CL/opencl.h \
 +   $(top_srcdir)/include/CL/cl.hpp
  endif

  noinst_LTLIBRARIES = libclover.la libcltgsi.la libclllvm.la
 @@ -45,13 +56,3 @@ libclover_la_LIBADD = \
 libcltgsi.la libclllvm.la

  libclover_la_SOURCES = $(CPP_SOURCES)
 -
 -cldir = $(includedir)/CL
 -cl_HEADERS = \
 -   $(top_srcdir)/include/CL/cl.h \
 -   $(top_srcdir)/include/CL/cl_ext.h \
 -   $(top_srcdir)/include/CL/cl_gl.h \
 -   $(top_srcdir)/include/CL/cl_gl_ext.h \
 -   $(top_srcdir)/include/CL/cl_platform.h \
 -   $(top_srcdir)/include/CL/opencl.h \
 -   $(top_srcdir)/include/CL/cl.hpp
 --
 1.8.1.5



[Mesa-dev] [PATCH] clover: fix build with LLVM 3.4

2013-11-01 Thread Aaron Watry
dso_list was added as an argument for createInternalizePass in 3.4, and then
it was removed again in the same llvm version.
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 4ae496f..3f50317 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -267,12 +267,7 @@ namespace {
  llvm::Function *kernel = *I;
  export_list.push_back(kernel->getName().data());
   }
-#if HAVE_LLVM < 0x0304
   PM.add(llvm::createInternalizePass(export_list));
-#else
-  std::vector<const char*> dso_list;
-  PM.add(llvm::createInternalizePass(export_list, dso_list));
-#endif
   PM.run(*mod);
}
 
-- 
1.8.3.2



Re: [Mesa-dev] [PATCH 1/5] mesa: remove Alpha CPU checks

2013-11-05 Thread Aaron Watry
On Mon, Nov 4, 2013 at 7:04 PM, Matt Turner matts...@gmail.com wrote:
 On Mon, Nov 4, 2013 at 4:48 PM, Brian Paul bri...@vmware.com wrote:
 ---
  src/mesa/main/compiler.h |7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

 diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h
 index 61ce5db..2752ca8 100644
 --- a/src/mesa/main/compiler.h
 +++ b/src/mesa/main/compiler.h
 @@ -36,11 +36,7 @@

  #include <assert.h>
  #include <ctype.h>
 -#if defined(__alpha__) && defined(CCPML)
 -#include <cpml.h> /* use Compaq's Fast Math Library on Alpha */
 -#else
  #include <math.h>
 -#endif
  #include <limits.h>
  #include <stdlib.h>
  #include <stdio.h>
 @@ -317,8 +313,7 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
  defined(__mips) || defined(_MIPS_ARCH) || \
  defined(__arm__) || \
  defined(__sh__) || defined(__m32r__) || \
 -(defined(__sun) && defined(_IEEE_754)) || \
 -defined(__alpha__)
 +(defined(__sun) && defined(_IEEE_754))
  #define USE_IEEE
  #define IEEE_ONE 0x3f800000
  #endif
 --
 1.7.10.4

 I actually have an Alpha with an R300. I'd like this hunk to stay.
 CPML support... feel free to drop.

Agreed.  I've got a PWS 500a that I'm in the process of getting
running again, and I'd hate to lose support for it... but it'll be
running Gentoo, so feel free to drop the CPML bits.

--Aaron




Re: [Mesa-dev] [PATCH] vl: use a separate context for shader based decode

2013-11-06 Thread Aaron Watry
On Wed, Nov 6, 2013 at 8:13 AM, Christian König deathsim...@vodafone.de wrote:
 From: Christian König christian.koe...@amd.com

 This makes VDPAU thread safe again.

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
  src/gallium/auxiliary/vl/vl_mpeg12_decoder.c | 180 
 ++-
  src/gallium/auxiliary/vl/vl_mpeg12_decoder.h |   1 +
  2 files changed, 120 insertions(+), 61 deletions(-)

 diff --git a/src/gallium/auxiliary/vl/vl_mpeg12_decoder.c 
 b/src/gallium/auxiliary/vl/vl_mpeg12_decoder.c
 index ca4eb3e..3777174 100644
 --- a/src/gallium/auxiliary/vl/vl_mpeg12_decoder.c
 +++ b/src/gallium/auxiliary/vl/vl_mpeg12_decoder.c
 @@ -82,6 +82,63 @@ static const unsigned const_empty_block_mask_420[3][2][2] 
 = {
 { { 0x01, 0x01 },  { 0x01, 0x01 } }
  };

 +struct video_buffer_private
 +{
 +   struct pipe_sampler_view *sampler_view_planes[VL_NUM_COMPONENTS];
 +   struct pipe_surface  *surfaces[VL_MAX_SURFACES];
 +
 +   struct vl_mpeg12_buffer *buffer;
 +};
 +
 +static void
 +vl_mpeg12_destroy_buffer(struct vl_mpeg12_buffer *buf);
 +
 +static void
 +destroy_video_buffer_private(void *private)
 +{
 +   struct video_buffer_private *priv = private;
 +   unsigned i;
 +
 +   for (i = 0; i < VL_NUM_COMPONENTS; ++i)
 +  pipe_sampler_view_reference(&priv->sampler_view_planes[i], NULL);
 +
 +   for (i = 0; i < VL_MAX_SURFACES; ++i)
 +  pipe_surface_reference(&priv->surfaces[i], NULL);
 +
 +   if (priv->buffer)
 +  vl_mpeg12_destroy_buffer(priv->buffer);

Should we be freeing priv/private at the end here, or is this struct
still in use elsewhere?
I'm assuming that it's unused, given that this is the matching
destructor for get_video_buffer_private
where the struct is CALLOC'd.

I'm not too qualified to give a full review of this patch, but I had
memory leaks on my mind after some stuff I was working on last night.
I didn't see any others besides the one in my first email and this
one. I haven't run this through valgrind, so for the moment it's just
me looking for stuff that's fishy.
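
To make the concern concrete, here is a minimal sketch of a create/destroy
pair where the destructor callback also releases the struct itself. It uses
plain calloc/free as stand-ins for Mesa's CALLOC_STRUCT/FREE, and the struct,
function names, and allocation counter are all hypothetical; whether the
real destroy_video_buffer_private should free priv depends on ownership
rules that this sketch does not know about.

```c
#include <stdlib.h>

/* Stand-in for the private struct in the patch; fields are placeholders. */
struct example_private {
    void *planes[3];
    void *buffer;
};

/* Illustration only: counts structs that were created but not yet freed. */
static int live_allocations;

static struct example_private *example_private_create(void)
{
    struct example_private *priv = calloc(1, sizeof(*priv));

    if (priv)
        live_allocations++;
    return priv;
}

/* Destructor callback: release what the struct references first,
 * then release the struct itself. */
static void example_private_destroy(void *data)
{
    struct example_private *priv = data;

    if (!priv)
        return;

    /* ...drop references held in priv->planes and priv->buffer here... */

    free(priv);
    live_allocations--;
}
```

Without the final free(), every buffer that goes through the create path
would leave one struct behind, which is the kind of leak valgrind reports
as "definitely lost".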

--Aaron

 +}
 +
 +static struct video_buffer_private *
 +get_video_buffer_private(struct vl_mpeg12_decoder *dec, struct pipe_video_buffer *buf)
 +{
 +   struct pipe_context *pipe = dec->context;
 +   struct video_buffer_private *priv;
 +   struct pipe_sampler_view **sv;
 +   struct pipe_surface **surf;
 +   unsigned i;
 +
 +   priv = vl_video_buffer_get_associated_data(buf, &dec->base);
 +   if (priv)
 +  return priv;
 +
 +   priv = CALLOC_STRUCT(video_buffer_private);
 +
 +   sv = buf->get_sampler_view_planes(buf);
 +   for (i = 0; i < VL_NUM_COMPONENTS; ++i)
 +  if (sv[i])
 + priv->sampler_view_planes[i] = pipe->create_sampler_view(pipe, sv[i]->texture, sv[i]);
 +
 +   surf = buf->get_surfaces(buf);
 +   for (i = 0; i < VL_MAX_SURFACES; ++i)
 +  if (surf[i])
 + priv->surfaces[i] = pipe->create_surface(pipe, surf[i]->texture, surf[i]);
 +
 +   vl_video_buffer_set_associated_data(buf, &dec->base, priv, destroy_video_buffer_private);
 +
 +   return priv;
 +}
 +
  static bool
  init_zscan_buffer(struct vl_mpeg12_decoder *dec, struct vl_mpeg12_buffer 
 *buffer)
  {
 @@ -103,7 +160,7 @@ init_zscan_buffer(struct vl_mpeg12_decoder *dec, struct 
 vl_mpeg12_buffer *buffer
 res_tmpl.usage = PIPE_USAGE_STREAM;
 res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW;

 -   res = dec->base.context->screen->resource_create(dec->base.context->screen, &res_tmpl);
 +   res = dec->context->screen->resource_create(dec->context->screen, &res_tmpl);
 if (!res)
goto error_source;

 @@ -111,7 +168,7 @@ init_zscan_buffer(struct vl_mpeg12_decoder *dec, struct 
 vl_mpeg12_buffer *buffer
 memset(&sv_tmpl, 0, sizeof(sv_tmpl));
 u_sampler_view_default_template(&sv_tmpl, res, res->format);
 sv_tmpl.swizzle_r = sv_tmpl.swizzle_g = sv_tmpl.swizzle_b = sv_tmpl.swizzle_a = PIPE_SWIZZLE_RED;
 -   buffer->zscan_source = dec->base.context->create_sampler_view(dec->base.context, res, &sv_tmpl);
 +   buffer->zscan_source = dec->context->create_sampler_view(dec->context, res, &sv_tmpl);
 pipe_resource_reference(&res, NULL);
 if (!buffer->zscan_source)
goto error_sampler;
 @@ -384,9 +441,8 @@ UploadYcbcrBlocks(struct vl_mpeg12_decoder *dec,
  }

  static void
 -vl_mpeg12_destroy_buffer(void *buffer)
 +vl_mpeg12_destroy_buffer(struct vl_mpeg12_buffer *buf)
  {
 -   struct vl_mpeg12_buffer *buf = buffer;

 assert(buf);

 @@ -407,11 +463,11 @@ vl_mpeg12_destroy(struct pipe_video_codec *decoder)
 assert(decoder);

 /* Asserted in softpipe_delete_fs_state() for some reason */
 -   dec->base.context->bind_vs_state(dec->base.context, NULL);
 -   dec->base.context->bind_fs_state(dec->base.context, NULL);
 +   dec->context->bind_vs_state(dec->context, NULL);
 +   dec->context->bind_fs_state(dec->context, NULL);

 -   dec->base.context->delete_depth_stencil_alpha_state(dec->base.context, dec->dsa);
 -   dec->base.context->delete_sampler_state(dec->base.context, dec->sampler_ycbcr);
 +   

[Mesa-dev] [PATCH 6/6] gallium/pipe_loader: un-reference udev resources when we're done with them.

2013-11-06 Thread Aaron Watry
---
 src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c 
b/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
index 339d7bf..927fb24 100644
--- a/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
+++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
@@ -88,6 +88,9 @@ find_drm_pci_id(struct pipe_loader_drm_device *ddev)
   &ddev->base.u.pci.chip_id) != 2)
   goto fail;
 
+   udev_device_unref(device);
+   udev_unref(udev);
+
return TRUE;
 
   fail:
-- 
1.8.3.2



[Mesa-dev] [PATCH 4/6] radeonsi/compute: Free program and program.kernels on shutdown

2013-11-06 Thread Aaron Watry
---
 src/gallium/drivers/radeonsi/radeonsi_compute.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_compute.c 
b/src/gallium/drivers/radeonsi/radeonsi_compute.c
index 265dbd7..28a3f17 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_compute.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_compute.c
@@ -236,7 +236,21 @@ static void radeonsi_launch_grid(
 }
 
 
-static void si_delete_compute_state(struct pipe_context *ctx, void* state){}
+static void si_delete_compute_state(struct pipe_context *ctx, void* state){
+struct si_pipe_compute *program = (struct si_pipe_compute *)state;
+
+if (!state) {
+return;
+}
+
+if (program->kernels) {
+FREE(program->kernels);
+}
+
+//And then free the program itself.
+FREE(program);
+}
+
 static void si_set_compute_resources(struct pipe_context * ctx_,
unsigned start, unsigned count,
struct pipe_surface ** surfaces) { }
-- 
1.8.3.2



Re: [Mesa-dev] [PATCH 0/6] radeon: Plug some memory leaks

2013-11-06 Thread Aaron Watry
On Wed, Nov 6, 2013 at 12:15 PM, Tom Stellard t...@stellard.net wrote:
 These look good, but the indentation seems wrong in patches 2 through 6.

An artifact of expanding tabs to 4 spaces in the IDE...  Although I'd
argue that patch 6 is correct given that the portion of the file
affected already uses tabs for indentation.

Do you want a v2, or are you happy with the patches assuming that I
fix the indentation?

--Aaron


 -Tom

 On Wed, Nov 06, 2013 at 10:36:49AM -0600, Aaron Watry wrote:
 I decided to have some fun and hooked valgrind up to my 7850 while running
 a few OpenCL tests in piglit. This is the first batch of fixes.

 Aaron Watry (6):
   radeon/llvm: fix spelling error
   radeon/llvm: Free libelf resources
   radeon/llvm: Free created llvm memory buffer
   radeonsi/compute: Free program and program.kernels on shutdown
   radeonsi/compute: Dispose of LLVM module after compiling kernels
   gallium/pipe_loader: un-reference udev resources when we're done with
 them.

  src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c |  3 +++
  src/gallium/drivers/radeon/radeon_llvm_emit.c   |  3 +++
  src/gallium/drivers/radeon/radeon_llvm_util.c   |  1 +
  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c |  2 +-
  src/gallium/drivers/radeonsi/radeonsi_compute.c | 17 -
  5 files changed, 24 insertions(+), 2 deletions(-)




[Mesa-dev] [PATCH 0/6 v2] radeon: Plug some memory leaks

2013-11-06 Thread Aaron Watry
Turns out that I don't have commit access to Mesa, just piglit.  Feel
free to push if they look good.

I decided to have some fun and hooked valgrind up to my SI while running
a few OpenCL tests in piglit. This is the first batch of fixes.

Aaron Watry (6):
  radeon/llvm: fix spelling error
  radeon/llvm: Free libelf resources
  radeon/llvm: Free created llvm memory buffer
  radeonsi/compute: Free program and program.kernels on shutdown
  radeonsi/compute: Dispose of LLVM module after compiling kernels
  gallium/pipe_loader: un-reference udev resources when we're done with
them.

v2: Fix indentation in patches 2, 3, 4, and 5.

 src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c |  3 +++
 src/gallium/drivers/radeon/radeon_llvm_emit.c   |  3 +++
 src/gallium/drivers/radeon/radeon_llvm_util.c   |  1 +
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c |  2 +-
 src/gallium/drivers/radeonsi/radeonsi_compute.c | 17 -
 5 files changed, 24 insertions(+), 2 deletions(-)



[Mesa-dev] [PATCH 4/6] radeonsi/compute: Free program and program.kernels on shutdown

2013-11-06 Thread Aaron Watry
v2: Fix indentation

Reviewed-by: Tom Stellard thomas.stell...@amd.com
---
 src/gallium/drivers/radeonsi/radeonsi_compute.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_compute.c b/src/gallium/drivers/radeonsi/radeonsi_compute.c
index 265dbd7..32d2487 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_compute.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_compute.c
@@ -236,7 +236,21 @@ static void radeonsi_launch_grid(
 }
 
 
-static void si_delete_compute_state(struct pipe_context *ctx, void* state){}
+static void si_delete_compute_state(struct pipe_context *ctx, void* state){
+   struct si_pipe_compute *program = (struct si_pipe_compute *)state;
+
+   if (!state) {
+   return;
+   }
+
+   if (program->kernels) {
+   FREE(program->kernels);
+   }
+
+   //And then free the program itself.
+   FREE(program);
+}
+
 static void si_set_compute_resources(struct pipe_context * ctx_,
unsigned start, unsigned count,
struct pipe_surface ** surfaces) { }
-- 
1.8.3.2
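
For readers following the cleanup logic in the hunk above, here is a minimal standalone sketch of the same shape: accept NULL, free the owned array, then free the container. The demo_* names and both structs are hypothetical stand-ins rather than Mesa's definitions, and plain calloc/free are used in place of the driver's FREE macro.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's types; not Mesa's definitions. */
struct demo_kernel {
    unsigned id;
};

struct demo_program {
    struct demo_kernel *kernels; /* array owned by the program, may be NULL */
    unsigned num_kernels;
};

/* Allocate a program with n kernels; returns NULL on allocation failure. */
static struct demo_program *demo_program_create(unsigned n)
{
    struct demo_program *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    if (n) {
        p->kernels = calloc(n, sizeof(*p->kernels));
        if (!p->kernels) {
            free(p);
            return NULL;
        }
    }
    p->num_kernels = n;
    return p;
}

/* Same shape as the patched callback: tolerate NULL, free the owned
 * array first, then the container. Returns the number of allocations
 * released so the behavior can be observed from outside. */
static int demo_program_destroy(void *state)
{
    struct demo_program *p = state;
    int released = 0;

    if (!p)
        return 0;
    if (p->kernels) {
        free(p->kernels);
        released++;
    }
    free(p);
    return released + 1;
}
```

The inner NULL check mirrors the patch. With standard free it is optional, since free(NULL) does nothing.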



[Mesa-dev] [PATCH 6/6] gallium/pipe_loader: un-reference udev resources when we're done with them.

2013-11-06 Thread Aaron Watry
Reviewed-by: Tom Stellard thomas.stell...@amd.com
---
 src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c b/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
index 339d7bf..927fb24 100644
--- a/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
+++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c
@@ -88,6 +88,9 @@ find_drm_pci_id(struct pipe_loader_drm_device *ddev)
   ddev->base.u.pci.chip_id) != 2)
   goto fail;
 
+   udev_device_unref(device);
+   udev_unref(udev);
+
return TRUE;
 
   fail:
-- 
1.8.3.2
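
The hunk above only shows references being dropped on the success path; what the fail: label does is not visible here. As a rough sketch of one way to keep success and failure paths balanced, the example below routes both through a single release block. Everything in it is hypothetical: demo_ref stands in for a refcounted handle such as a udev context or device, and demo_live_handles exists only so the balance can be checked.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted handle; stands in for a udev-style object. */
struct demo_ref {
    int refcount;
};

static int demo_live_handles; /* handles created but not yet freed */

/* Create a handle holding one reference; NULL on allocation failure. */
static struct demo_ref *demo_ref_new(void)
{
    struct demo_ref *r = calloc(1, sizeof(*r));

    if (r) {
        r->refcount = 1;
        demo_live_handles++;
    }
    return r;
}

/* Drop one reference and free on the last one. NULL is accepted. */
static void demo_ref_unref(struct demo_ref *r)
{
    if (r && --r->refcount == 0) {
        demo_live_handles--;
        free(r);
    }
}

/* Look something up through two handles. Success and failure both
 * fall through to the same release code at the out label. */
static bool demo_find_id(bool simulate_parse_failure)
{
    bool ok = false;
    struct demo_ref *ctx = demo_ref_new();
    struct demo_ref *dev = NULL;

    if (!ctx)
        goto out;
    dev = demo_ref_new();
    if (!dev)
        goto out;
    if (simulate_parse_failure)
        goto out;
    ok = true;
out:
    demo_ref_unref(dev);
    demo_ref_unref(ctx);
    return ok;
}
```

Initializing dev to NULL and letting the unref helper accept NULL is what allows every early exit to share the same label.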



[Mesa-dev] [PATCH 2/6] radeon/llvm: Free libelf resources

2013-11-06 Thread Aaron Watry
v2: Fix indentation

Reviewed-by: Tom Stellard thomas.stell...@amd.com
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 8bf278b..d2e5642 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -173,6 +173,9 @@ unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_llvm_binary *binary,
}
}
 
+   if (elf){
+   elf_end(elf);
+   }
LLVMDisposeMemoryBuffer(out_buffer);
LLVMDisposeTargetMachine(tm);
return 0;
-- 
1.8.3.2



[Mesa-dev] [PATCH 1/6] radeon/llvm: fix spelling error

2013-11-06 Thread Aaron Watry
Reviewed-by: Tom Stellard thomas.stell...@amd.com
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 286ccdd..57026bf 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1379,7 +1379,7 @@ void radeon_llvm_finalize_module(struct radeon_llvm_context * ctx)
LLVMAddAggressiveDCEPass(gallivm->passmgr);
LLVMAddCFGSimplificationPass(gallivm->passmgr);
 
-   /* Run the passs */
+   /* Run the pass */
LLVMRunFunctionPassManager(gallivm->passmgr, ctx->main_fn);
 
LLVMDisposeBuilder(gallivm->builder);
-- 
1.8.3.2



[Mesa-dev] [PATCH 2/3] r600/llvm: Free binary.code/binary.config in r600_llvm_compile

2013-11-07 Thread Aaron Watry
radeon_llvm_compile allocates memory for binary.code, binary.config, or
neither, depending on what's being done.

We need to make sure to free that memory after it's no longer needed.
---
 src/gallium/drivers/r600/r600_llvm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index f52ae84..084ba2a 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -745,6 +745,13 @@ unsigned r600_llvm_compile(
}
}
 
+   if (binary.code){
+   FREE(binary.code);
+   }
+   if (binary.config){
+   FREE(binary.config);
+   }
+
return r;
 }
 
-- 
1.8.3.2
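
The commit message above describes an output struct whose buffers may or may not have been allocated by the time the caller is done with it. A small sketch of that situation follows, assuming a zero-initialized struct and standard free. The demo_binary type and both functions are hypothetical; only the field names code and config are taken from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the binary struct; not the Mesa definition. */
struct demo_binary {
    unsigned char *code;
    unsigned code_size;
    unsigned char *config;
    unsigned config_size;
};

/* Pretend compile step: fills code, config, both, or neither.
 * Returns 0 on success and 1 on allocation failure. */
static int demo_compile(struct demo_binary *b, int want_code, int want_config)
{
    if (want_code) {
        b->code = calloc(16, 1);
        if (!b->code)
            return 1;
        b->code_size = 16;
    }
    if (want_config) {
        b->config = calloc(8, 1);
        if (!b->config)
            return 1;
        b->config_size = 8;
    }
    return 0;
}

/* Release whatever the compile step allocated and reset the struct so
 * it can be reused. Returns how many buffers were actually held. */
static int demo_binary_clean(struct demo_binary *b)
{
    int held = (b->code != NULL) + (b->config != NULL);

    free(b->code);   /* free(NULL) is a no-op in standard C */
    free(b->config);
    memset(b, 0, sizeof(*b));
    return held;
}
```

Because the struct starts zeroed, the cleanup can run unconditionally on every exit path, including the case where neither buffer was allocated.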


