Hello,

while testing various scenarios for multi-stream support in geometry shaders I came across one that I think might be a hardware bug or, at the very least, a hardware limitation that makes it difficult to implement the behavior required by ARB_transform_feedback3.

The problematic scenario is triggered with this setup:

- Enable transform feedback.
- Do not associate any varyings with one particular stream (let's call this stream X).
- Have the GS emit a vertex to stream X.

ARB_transform_feedback3 clarifies the expected behavior in this case:

"If the set of varyings selected for transform feedback does not include any belonging to the specified stream, nothing will be recorded when primitives are emitted to that stream, and the corresponding vertex count will be zero."

However, we get two possible outcomes with this setup:

1) If the vertex emitted to that stream is not the last vertex emitted by the GS, then the primitive count for that stream is incorrect (returns 0), but everything else works ok.

I think this behavior is expected as per the IvyBridge documentation:

"8.3 Stream Output Function: ... If a stream has no SO_DECL state defined (NumEntries is 0), incoming objects targeting that stream are effectively ignored. As there is no attempt to perform stream output, overflow detection is neither required nor performed."

Which means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the primitive count in this case. We could still use CL_INVOCATION_COUNT for stream 0, but that would not fix the problem for other streams.

2) If the vertex emitted to that stream is the last vertex emitted by the GS, then transform feedback does not work for any stream (no values are recorded in the TF buffers) and primitive queries for all streams return 0. Rendering is okay though: stream 0 outputs are rendered properly and outputs from other streams are discarded.

This, I think, is a hardware problem.

With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for stream X like this:

Buffer Selects (Stream X) = 0
Num Entries (Stream X) = 0

that is, that stream writes to no buffers and has no declarations to write, which is correct.

Now comes the funny part: simply forcing Num Entries (Stream X) = 1, even if there are no declarations, makes TF and primitive queries work again for all streams but X. For stream X, the primitive count is ok but TF is not (which is kind of expected, since we are not configuring it properly). Moreover, if I also force Buffer Selects (Stream X) = N (so that N is the index of a disabled TF buffer), then TF also works as expected for stream X (primitives generated is okay, TF primitives written is 0, and no TF data for that stream is written).

It looks like the hardware does not like setups where there are streams that have 0 varyings to record, even less so if the last vertex we emit is sent to such a stream.

Based on the above, there is a workaround for this, but I think it is pretty ugly, so I would like to know other people's thoughts on whether it is worth implementing. It would involve the following:

In upload_3dstate_streamout() we make sure we disable all transform feedback buffers that are not going to record information (currently a TF buffer is enabled as long as the user has called glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, bufferName)). We can know if a buffer is not going to be written by inspecting its BufferStride: it should be 0 for buffers that won't get written. I think this is probably good to do in any case.

Then the ugly part: in gen7_upload_3dstate_so_decl_list(), if we detect a stream with no varyings bound to it (so num decls is 0) *and* there are disabled TF buffers, we silently set num decls for that stream to 1 and set its buffer_mask to write to one of the disabled buffers (it won't actually write because they are disabled).

I have a patch for this [1] and it seems to fix the problem (although it only works as long as we have disabled TF buffers available).

Opinions? Is there any other alternative to work around this issue?
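
For anyone who wants to play with the fixup logic outside of the driver, it can be modelled in a few lines of standalone C. This is only a sketch under assumptions, not driver code: NUM_STREAMS, NUM_BUFFERS, find_disabled_buffer() and fixup_empty_streams() are hypothetical names made up for illustration, the arrays stand in for the per-stream decl counts, per-stream buffer masks and per-buffer strides discussed above, and both sizes are assumed to be 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_STREAMS 4 /* hypothetical stand-in for MAX_VERTEX_STREAMS */
#define NUM_BUFFERS 4 /* assumed number of TF buffers, as in the i < 4 loops */

/* Return the index of the first buffer with a zero stride (a buffer that
 * will never be written), or -1 if every buffer is in use. */
static int find_disabled_buffer(const unsigned stride[NUM_BUFFERS])
{
   for (int i = 0; i < NUM_BUFFERS; i++) {
      if (stride[i] == 0)
         return i;
   }
   return -1;
}

/* Give every stream that has no decls a single dummy decl that targets a
 * disabled buffer. Returns false (and changes nothing) when there is no
 * disabled buffer to point at, which is the case the workaround cannot
 * handle. */
static bool fixup_empty_streams(unsigned decls[NUM_STREAMS],
                                uint8_t buffer_mask[NUM_STREAMS],
                                const unsigned stride[NUM_BUFFERS])
{
   assert(decls && buffer_mask && stride);

   int disabled = find_disabled_buffer(stride);
   if (disabled < 0)
      return false;

   for (int i = 0; i < NUM_STREAMS; i++) {
      if (decls[i] == 0) {
         decls[i] = 1;
         buffer_mask[i] = (uint8_t)(1u << disabled);
      }
   }
   return true;
}
```

With strides {16, 12, 0, 0} and decl counts {2, 0, 3, 0}, streams 1 and 3 each end up with one decl and a buffer mask selecting buffer 2, while streams 0 and 2 are left untouched. When all four strides are non-zero the function returns false and nothing is modified.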

The problem is particularly annoying because I think it hits a very likely scenario: an application using stream 0 for rendering only (no TF) and using other streams to capture TF.

Iago

[1] Patch:

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index d2c3ae3..1450dde 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -189,6 +189,27 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
          max_decls = decls[stream_id];
    }
+   /* We need to inspect if we have streams for which we don't have any
+    * varyings to record. The hardware does not handle this scenario well
+    * and for TF to work in this case we need to configure such streams to
+    * have at least one decl and write to some disabled buffer.
+    */
+   int disabled_buffer = -1;
+   for (int i = 0; i < 4; i++) {
+      if (linked_xfb_info->BufferStride[i] == 0) {
+         disabled_buffer = i;
+         break;
+      }
+   }
+   if (disabled_buffer >= 0) {
+      for (int i = 0; i < MAX_VERTEX_STREAMS; i++) {
+         if (decls[i] == 0) {
+            decls[i] = 1;
+            buffer_mask[i] = 1 << disabled_buffer;
+         }
+      }
+   }
+
    BEGIN_BATCH(max_decls * 2 + 3);
    OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (max_decls * 2 + 1));
@@ -250,9 +271,10 @@ upload_3dstate_streamout(struct brw_context *brw, bool active,
       dw1 |= SO_REORDER_TRAILING;
       for (i = 0; i < 4; i++) {
-         if (xfb_obj->Buffers[i]) {
-            dw1 |= SO_BUFFER_ENABLE(i);
-         }
+         if (xfb_obj->Buffers[i] &&
+             xfb_obj->shader_program->LinkedTransformFeedback.BufferStride[i] > 0) {
+            dw1 |= SO_BUFFER_ENABLE(i);
+         }
       }