BTW: This patch is for byte/short vector load/store. Can we also use untyped 
read/write to optimize scalar char/short load/store?
[ruiling]: As this requires careful and rather annoying address alignment 
handling, I need to consider it further.

+    // split a DWORD register into unpacked Byte or Short register
+    static INLINE GenRegister splitReg(GenRegister reg, uint32_t count, uint32_t sub_part) {
+      GenRegister r = reg;
+      GBE_ASSERT(count == 4 || count == 2);
+      if(reg.hstride != GEN_HORIZONTAL_STRIDE_0) {
+        r.hstride = count == 4 ? GEN_HORIZONTAL_STRIDE_4 : GEN_HORIZONTAL_STRIDE_2;

>>>>>>>>>Do you assume reg.hstride is GEN_HORIZONTAL_STRIDE_1 here? What about 
>>>>>>>>>the case where reg.hstride is GEN_HORIZONTAL_STRIDE_2 or 
>>>>>>>>>GEN_HORIZONTAL_STRIDE_4?
[ruiling]: You are right. As splitReg does not handle all combinations of 
register settings, I will add some asserts to prevent misuse.
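
A minimal standalone sketch of what such asserts might look like. The enum 
HStride and the helpers isSplittableStride and splitStride are hypothetical 
stand-ins, not Beignet code, and the rule that only stride 0 and stride 1 
inputs are accepted is an assumption drawn from the review comment above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the GEN_HORIZONTAL_STRIDE_* encodings.
enum HStride : uint32_t { HSTRIDE_0, HSTRIDE_1, HSTRIDE_2, HSTRIDE_4 };

// Assumption: the split logic only handles scalar (stride 0) and
// packed (stride 1) DWORD registers.
inline bool isSplittableStride(HStride h) {
  return h == HSTRIDE_0 || h == HSTRIDE_1;
}

// Mirrors the hstride selection in the patch, with guards against misuse.
inline HStride splitStride(HStride h, uint32_t count) {
  assert(count == 4 || count == 2);   // 4 bytes or 2 shorts per DWORD
  assert(isSplittableStride(h));      // reject stride 2 / stride 4 inputs
  if (h == HSTRIDE_0) return HSTRIDE_0;          // scalar stays scalar
  return count == 4 ? HSTRIDE_4 : HSTRIDE_2;     // skip over the other parts
}
```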


+      }
+      if(count == 4) {
+        r.type = reg.type == GEN_TYPE_UD ? GEN_TYPE_UB : GEN_TYPE_B;
+        r.vstride = GEN_VERTICAL_STRIDE_32;
+      } else {
+        r.type = reg.type == GEN_TYPE_UD ? GEN_TYPE_UW : GEN_TYPE_W;
+        r.vstride = GEN_VERTICAL_STRIDE_16;
+      }
+

+      r.subnr += sub_part*typeSize(r.type);
+      r.nr += r.subnr / 32;
+      r.subnr %= 32;
+
>>>>>>>>>>>>If reg.hstride is GEN_HORIZONTAL_STRIDE_0, r.nr and r.subnr 
>>>>>>>>>>>>should not be changed here.
[ruiling]: Here I want to get a byte (or short) sub-register. For example, one 
dword register is composed of [B0 B1 B2 B3], and sub_part in [0-3] selects 
B[0-3], so subnr needs to change according to sub_part even if the stride is 
GEN_HORIZONTAL_STRIDE_0.
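
The sub-part addressing described above can be illustrated with a small 
standalone sketch. MiniReg and selectSubPart are hypothetical names used only 
for illustration; the 32-byte register size is taken from the "/ 32" and 
"% 32" in the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the address fields of GenRegister.
struct MiniReg {
  uint32_t nr;     // register number (each register holds 32 bytes)
  uint32_t subnr;  // byte offset inside that register
};

// Advance the byte offset to element `sub_part` of each DWORD.
// elemSize is 1 for bytes (count == 4) and 2 for shorts (count == 2).
inline MiniReg selectSubPart(MiniReg reg, uint32_t elemSize, uint32_t sub_part) {
  assert(elemSize == 1 || elemSize == 2);
  assert(sub_part * elemSize < 4);   // must stay inside one DWORD
  MiniReg r = reg;
  r.subnr += sub_part * elemSize;
  r.nr += r.subnr / 32;              // carry into the next register if needed
  r.subnr %= 32;
  return r;
}
```

For a register starting at subnr 0, sub_part 3 with byte elements yields 
subnr 3 (B3), and this offset applies whether or not the register is a scalar.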


_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet