Thanks for the patch. It solves a long standing issue. A few comments
as below:

On Fri, Oct 10, 2014 at 03:01:26PM +0800, Ruiling Song wrote:
> This legalize pass will break wider integers like i128/i256/... into shorter 
> ones.
> The problem is how to choose the shorter type? From my observation,
> wide integer type always comes from shorter ones through 'zext' on small type
> or 'bitcast' on vectors, so we simply choose the type where it comes from.
> Then we can split wide integer operations into operations on shorter interger.
> 
> Signed-off-by: Ruiling Song <[email protected]>
> ---
> +
> +  void splitAPInt(APInt &data, SmallVectorImpl<APInt> &result, int 
> totalBits, int subBits) {
> +    APInt lo = data.getLoBits(totalBits/2).trunc(totalBits/2);
> +    APInt hi = data.getHiBits(totalBits/2).trunc(totalBits/2);
> +
> +    if (totalBits/2 <= subBits) {
> +      result.push_back(lo);
> +      result.push_back(hi);
> +      return;
> +    }
> +    splitAPInt(lo, result, totalBits/2, subBits);
> +    splitAPInt(hi, result, totalBits/2, subBits);
> +  }
> +
> +  void Legalize::splitLargeInteger(APInt data, Type *splitTy, 
> SmallVector<APInt, 16> &split) {
> +    unsigned opSz = data.getBitWidth();
> +    unsigned subSz = splitTy->getPrimitiveSizeInBits();
> +    splitAPInt(data, split, opSz, subSz);
> +  }
The above algorithm assumes the opSz is an power of 2 bits, right? maybe put an 
assert here to
make sure this assumption is better.
> +
> +  void Legalize::legalizeICmp(IRBuilder<> &Builder, Instruction *p) {
> +    ICmpInst *IC = dyn_cast<ICmpInst>(p);
> +    ICmpInst::Predicate pred = IC->getPredicate();
> +    // I could not figure out why llvm could generate some
> +    // compare instruction on large integers. so here only support equality 
> check
Agree with your thought here. An assert here is good enough.

> +    GBE_ASSERT(IC->isEquality());
> +    Value *op0 = p->getOperand(0);
> +    Value *op1 = p->getOperand(1);
> +
> +    if (isa<ConstantInt>(op0)) {
> +      op0 = p->getOperand(1);
> +      op1 = p->getOperand(0);
> +    }
Is it possible that both operands are constant? If it is the case, we just need 
to
ignore it here.

> +  void Legalize::legalizeAnd(IRBuilder<> &Builder, Instruction *p) {
> +    Value *op0 = p->getOperand(0);
> +    Value *op1 = p->getOperand(1);
> +
> +    if ((isa<UndefValue>(op0) || isa<UndefValue>(op1))) {
> +      // I meet some special case as below:
> +      //   %82 = zext i32 %81 to i512
> +      //   %mask148 = and i512 undef, -4294967296
> +      //   %ins149 = or i512 %mask148, %82
> +      // I don't how to split this kind of i512 instruction in a good way,
                   miss a know here? 
And yes, I also found some similar cases, and some of the root causes are
a buggy kernel which use uninitialized value. But we still need to compile
those buggy kernel. My question here is we may need to add this logic to most
of the other operands handlers? This special case may not be And only case.

> +
> +  bool Legalize::legalizeFunction(Function &F) {
> +    bool changed = false;
> +    for (Function::iterator bb = F.begin(), bbE = F.end(); bb != bbE; ++bb) {
> +      IRBuilder<> Builder(bb);
> +      for (BasicBlock::iterator it = bb->begin(), itE = bb->end(); it != 
> itE; ++it) {
> +        Instruction *insn = it;
> +        Type *ty = insn->getType();
> +        if(ty->isIntegerTy() && ty->getIntegerBitWidth() > 64) {
> +          // result is large integer, push back itself and its users
> +          changed = true;
> +
> +          processed.insert(insn);
> +
> +          for(Value::use_iterator iter = insn->use_begin(); iter != 
> insn->use_end(); ++iter) {
> +            // After LLVM 3.5, use_iterator points to 'Use' instead of 
> 'User', which is more straightforward.
> +          #if (LLVM_VERSION_MAJOR == 3) && (LLVM_VERSION_MINOR < 5)
> +            User *theUser = *iter;
> +          #else
> +            User *theUser = iter->getUser();
> +          #endif
> +            processed.insert(theUser);
> +          }
> +        }
> +
> +        if(processed.empty() || processed.find(insn) == processed.end())
> +          continue;
> +
The above code has an implict assumption that all the wide values should be 
generated before
all the uses. Right?
This assumption may not be correct. Please check the following example:

    ... (there is no definition of %7)
      br L2

    L1:
      %10 = add i128 %7, %6
      ...
      ret

    L2:
      %8 = load <4 x i32> addrspace(1)* %3, align 4, !tbaa !1
      %7 = bitcast <4 x i32> %8 to i128
      ...
      br L1

Thanks,
Zhigang Gong.
_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet

Reply via email to