BTW: This patch is for vector load/store of byte/short. Can we also use untyped
read/write to optimize scalar char/short load/store?
[ruiling]: as this needs very careful and tedious address alignment handling,
I need to consider it further.
+ // split a DWORD register into unpacked Byte or Short register
+ static INLINE GenRegister splitReg(GenRegister reg, uint32_t count,
uint32_t sub_part) {
+ GenRegister r = reg;
+ GBE_ASSERT(count == 4 || count == 2);
+ if(reg.hstride != GEN_HORIZONTAL_STRIDE_0) {
+ r.hstride = count == 4 ? GEN_HORIZONTAL_STRIDE_4 :
+ GEN_HORIZONTAL_STRIDE_2;
>>>>>>>>>Do you assume reg.hstride is GEN_HORIZONTAL_STRIDE_1 here? What about
>>>>>>>>>the case where reg.hstride is GEN_HORIZONTAL_STRIDE_2 or
>>>>>>>>>GEN_HORIZONTAL_STRIDE_4?
[ruiling]: you are right. As splitReg does not handle every combination of
register settings, I will add some asserts to prevent misuse.
+ }
+ if(count == 4) {
+ r.type = reg.type == GEN_TYPE_UD ? GEN_TYPE_UB : GEN_TYPE_B;
+ r.vstride = GEN_VERTICAL_STRIDE_32;
+ } else {
+ r.type = reg.type == GEN_TYPE_UD ? GEN_TYPE_UW : GEN_TYPE_W;
+ r.vstride = GEN_VERTICAL_STRIDE_16;
+ }
+
+ r.subnr += sub_part*typeSize(r.type);
+ r.nr += r.subnr / 32;
+ r.subnr %= 32;
+
>>>>>>>>>>>>If reg.hstride is GEN_HORIZONTAL_STRIDE_0, r.nr and r.subnr
>>>>>>>>>>>>should not be changed here.
[ruiling]: here I want to get a sub-part of the register. One dword register is
composed of [B0 B1 B2 B3], and sub_part in the range [0-3] selects the
corresponding byte B[0-3]. So subnr needs to change according to sub_part even
if the register is horizontal_stride_0.
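
To make the sub_part arithmetic concrete, here is a minimal standalone sketch.
It is not the patch itself: MiniReg and selectSubPart are hypothetical names
used only for illustration, and only the nr/subnr fields are modelled. The
32-byte register size is taken from the "/ 32" and "% 32" in the patch above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GenRegister; only the fields used below.
struct MiniReg {
  uint32_t nr;     // register number
  uint32_t subnr;  // byte offset inside the 32-byte register
};

// Pick sub-part `sub_part` of a DWORD viewed as `count` elements:
// count == 4 selects a byte, count == 2 selects a short.
// The offset is applied regardless of horizontal stride, because a
// stride-0 (scalar) DWORD still consists of [B0 B1 B2 B3].
inline MiniReg selectSubPart(MiniReg reg, uint32_t count, uint32_t sub_part) {
  assert(count == 4 || count == 2);
  assert(sub_part < count);                // guard against misuse
  const uint32_t elemSize = 4 / count;     // 1 byte or 2 bytes
  MiniReg r = reg;
  r.subnr += sub_part * elemSize;          // move to the chosen sub-part
  r.nr += r.subnr / 32;                    // carry into the next register
  r.subnr %= 32;
  return r;
}
```

For example, selecting B3 of a dword at nr=10, subnr=0 gives nr=10, subnr=3,
and selecting the upper short of a dword at subnr=28 gives subnr=30.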
_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet