The post-nir_register future

2023-08-04 Thread Alyssa Rosenzweig
FYI for people who don't follow GitLab--

We've now removed the following deprecated functionality from NIR:

* nir_register
* ALU write masks
* abs/neg/sat modifiers
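
(For backend authors catching up: shader-local registers are now expressed
as ordinary intrinsics -- decl_reg, load_reg, store_reg -- producing and
consuming SSA values, instead of the old nir_register lists with write
masks and modifiers. A minimal sketch of the new shape follows, assuming
nir_builder helpers of the same names; the signatures are paraphrased from
memory and may not match the tree exactly, so treat it as illustrative only.)

/* Illustrative only: helper names/signatures paraphrased, not copied
 * from nir_builder.h. */
#include "nir_builder.h"

static void
copy_through_reg(nir_builder *b, nir_def *value)
{
        /* Was: a nir_register with num_components/bit_size fields. */
        nir_def *reg = nir_decl_reg(b, 4, 32, 0 /* not an array */);

        /* Was: an ALU dest pointing at the register, plus a write mask. */
        nir_store_reg(b, value, reg);

        /* Was: a source reading the register. */
        nir_def *loaded = nir_load_reg(b, reg);
        (void)loaded;
}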

For background on how we did it and why, see the description of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089 and the
transition tracker
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9051

Now that this is merged, casual benchmarking shows ~8% speed-up in
shader compile time.

Going forward, there's a lot of clean up potential. The plan is outlined
in https://gitlab.freedesktop.org/mesa/mesa/-/issues/9396 ... I am
holding off on this until NVK is merged for Faith's rebasing benefit.

If you are sitting on big NIR branches, now is the time to land them.

Here's to the new NIR :~)

Cheers,

-A


Re: Retiring the GitHub mirrors

2023-01-20 Thread Alyssa Rosenzweig
> Among the people present in this discussion, the consensus was that we
> should delete them.

I wasn't present but +1 from me.


Re: Zink MR signoff tags

2022-10-05 Thread Alyssa Rosenzweig
> > + for not requiring rb/ab tags ...
> 
> I think it's time to think about making this change all over Mesa as
> well. We're deeply in bed with GitLab by now, so I don't think there's
> a realistic chance that this isn't going to just be duplicate info any
> time soon...

I agree, but I don't want to be the one on the hook for the politics if
people disagree (as I think some do).

> > I kinda like the s-o-b tags but those
> > don't require fiddly rebases, just -s in the right place..
> > 
> > On Tue, Oct 04, 2022 at 10:44:31PM -0500, Mike Blumenkrantz wrote:
> > >  Hi,
> > >  After some vigorous and robust discussion with Erik, we've
> > > decided that
> > >  zink will no longer require any rb/ab/etb tags to be applied to
> > > patches in
> > >  MRs.
> > >  Following in Turnip's footsteps, any MR that receives sufficient
> > > reviewage
> > >  in gitlab comments can be merged directly with no further action
> > >  necessary.
> > >  Mike


Re: Zink MR signoff tags

2022-10-05 Thread Alyssa Rosenzweig
+ for not requiring rb/ab tags ... I kinda like the s-o-b tags but those
don't require fiddly rebases, just -s in the right place..

On Tue, Oct 04, 2022 at 10:44:31PM -0500, Mike Blumenkrantz wrote:
>Hi,
>After some vigorous and robust discussion with Erik, we've decided that
>zink will no longer require any rb/ab/etb tags to be applied to patches in
>MRs.
>Following in Turnip's footsteps, any MR that receives sufficient reviewage
>in gitlab comments can be merged directly with no further action
>necessary.
>Mike


Re: Moving amber into separate repo?

2022-09-24 Thread Alyssa Rosenzweig
>   2. repo is growing large. Amber kinda requires long history, modern
>  mesa does not. This may be a good spot to split if cleanup is required.

Mesa absolutely uses long history. There is nothing to clean up. Those
bytes of disk space are well worth it.

(Neutral on the other points; I don't work on stuff supported in Amber.)


Re: Enable OpenGL software rendering on macOS

2022-05-06 Thread Alyssa Rosenzweig
Having spent enough time wrangling Mesa/macOS on Arm Macs:

the correct solution is to migrate to Linux. Everything else is pain.

On Wed, May 04, 2022 at 12:38:55AM +, Jose Fonseca wrote:
>I'm not sure exactly what Homebrew provides, and I'm not able to
>investigate it now.
>Yes, using Mesa software rendering to fill the void left by OpenGL
>deprecation makes some sense.   But note it is still software rendering, not
>GPU accelerated.  And one needs to consider the x86 -> arm transition.
>You could consider somehow integrating JOGL with osmesa -- offscreen
>rendering Mesa -- which builds and runs pretty much anywhere Mesa builds
>-- thereby bypassing the headaches of integrating with Mac-specific
>technologies such as CGL and Cocoa.
>There are also other alternatives worth considering, such
>as https://moltengl.com/moltengl/
>Jose
> 
>  --
> 
>From: Martin Pernollet 
>Sent: Monday, May 2, 2022 1:31:43 AM
>To: Jose Fonseca 
>Cc: mesa-dev@lists.freedesktop.org 
>Subject: Re: Enable OpenGL software rendering on macOS
> 
> 
> 
>Hi Jose,
>Many thanks for your answer.
>The JOGL classes enabling GL binding on macOS all refer to CGL, so yes, I think I
>want to rely on CGL (there are, however, a couple of classes in JOGL allowing
>work with X11, but they're used on Linux only).
>I can't access the Apple M1 on which I built Mesa right now, but homebrew
>on another macOS provides libGL and not libGLX.
> 
>ls /usr/local/Cellar/mesa/21.1.2/lib
> 
>dri  libGL.dylib  libGLESv1_CM.dylib  libGLESv2.dylib  libglapi.dylib
> 
>libGL.1.dylib  libGLESv1_CM.1.dylib  libGLESv2.2.dylib  libglapi.0.dylib 
>pkgconfig
> 
> 
>One motivation for enabling Mesa on macOS is actually the deprecated
>status of OpenGL. I would expect this software implementation of GL to
>simply provide an image to be copied to a native window/frame, without
>having to rely on CGL. Hence, one could continue working with simple
>OpenGL even if macOS doesn't provide such API.
>Does it sound reasonable? 
>Martin
>--- Original Message ---
>On Friday, April 29, 2022 at 16:08, Jose Fonseca wrote:
> 
>  The difficulty with OpenGL on macOS is that all driver interfaces are
>  both undocumented and deprecated.
>  If you want to override the system OpenGL, you can use apitrace code as
>  reference.  There are two approaches:
>   1. DYLD_FRAMEWORK_PATH  
> https://github.com/apitrace/apitrace/blob/master/cli/cli_trace.cpp
>   2. DYLD_INSERT_LIBRARIES 
> https://github.com/apitrace/apitrace/tree/dyld-interpose (experimental
>  branch)
>   
>  AFAIK, the Mesa build for macOS generates a libGLX which depends on X11,
>  which is probably not what you want.  You want to use Mesa for macOS
>  apps which use CGL as opposed to GLX, right?
>  So, if one wants to have a SW renderer on macOS with llvmpipe without
>  depending on X11, then one would need to implement:
>* a new Gallium frontend that implements CGL API (equivalent to the
>  WGL frontend that exists for Windows)
>* a new SW renderer winsys that draws pixels to Cocoa window somehow
>  (equivalent to the GDI winsys that draws to a Windows GDI surface) 
>  It's not a matter of just integrating existing components together --
>  there's a lot of new code that would be needed here -- I'd reckon 2 months
>  for somebody familiar with Mesa/macOS, 6 - 12 months for somebody more
>  novice.  And let's be frank: given macOS's deprecation of OpenGL and the
>  migration away from Intel to Arm, the usefulness of this in the long
>  term is dubious.
>  Jose
> 
>  --
> 
>  From: mesa-dev  on behalf of
>  Martin Pernollet 
>  Sent: Friday, April 29, 2022 12:32
>  To: mesa-dev@lists.freedesktop.org 
>  Subject: Enable OpenGL software rendering on macOS
>   
> 
> 
>  TLDR: I failed to get Mesa software rendering working on macOS. I am looking
>  for advice on invoking Mesa's libGL.dylib without relying on macOS's system
>  GL.
>  Hi everyone,
>  I am building (Java) software involving Mesa for CPU rendering. I use
>  CPU rendering as a fallback when JOGL (the OpenGL binding for Java) fails to
>  use the GPU natively. This is sometimes the case for old Linux
>  distributions, and it will certainly be frequent on macOS in the future
>  due to Apple's OpenGL deprecation.
>  Mesa CPU rendering is working great on Ubuntu (by enabling
>  LIBGL_ALWAYS_SOFTWARE=true) and Windows (by simply loading Mesa's DLL
>  instead of the system DLL, with no need to ask for software mode).

Re: Mesa 20.0 backlog

2022-04-25 Thread Alyssa Rosenzweig
> 2022-03-14 FIXES  d5870c45ae panfrost: Optimise recalculation of max sampler view

Unless Icecream95 feels otherwise, I'd drop this one for stable. It
*is* a fix but only impacts a bit of CPU overhead.


Re: Replacing NIR with SPIR-V?

2022-01-21 Thread Alyssa Rosenzweig
>In principle, all the properties you highlight in your blog as key points
>of NIR also apply to SPIR-V.

Those key points are relative to GLSL IR, the IR it displaces. The
purpose of SPIR-V is so different from both NIR and GLSL IR that it isn't
interesting to do a comparison. Comparing with LLVM IR would be more
appropriate. (The relationship between SPIR-V and LLVM IR
notwithstanding.)

>NIR starts shining as a more suitable IR than SPIR-V for the
>task of communicating front-end and back-end

That is not the goal of NIR. The frontend can pass SPIR-V if it is so
inclined; that's a single spirv_to_nir pass away, and that's how the
OpenGL SPIR-V extension and Vulkan work.
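
For concreteness, a hedged sketch of that "one pass away" point: roughly
how a Vulkan driver (or the GL SPIR-V path) hands the SPIR-V words to
spirv_to_nir(). The parameter list is paraphrased from
src/compiler/spirv/nir_spirv.h, and the stage, entry point, and options
below are illustrative assumptions rather than a recipe.

/* Sketch only: options left at defaults, stage/entry point illustrative. */
#include "compiler/spirv/nir_spirv.h"

static nir_shader *
consume_spirv(const uint32_t *words, size_t word_count,
              const nir_shader_compiler_options *backend_opts)
{
        const struct spirv_to_nir_options spirv_opts = {0};

        return spirv_to_nir(words, word_count,
                            NULL, 0,              /* no specialization constants */
                            MESA_SHADER_FRAGMENT, /* stage, illustrative */
                            "main",               /* entry point name */
                            &spirv_opts, backend_opts);
}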

As for backends, it makes 0 sense whatsoever for any backend in all of
Mesa to consume SPIR-V. That's not what SPIR-V is for.

>- Arrays and structs: SPIR-V's OpAccessChain would need to be processed by
>a backend and translated to pointer arithmetic plus dereferencing (kind of
>the same thing as having to process a nir_deref). This translation can be
>done in RISC-V with no issue, whether it is OpAccessChain or nir_deref.

NIR does that translation for you if you tell it to.
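
As a concrete example of "if you tell it to", here is a hedged sketch of
how a driver might ask NIR to rewrite deref chains for SSBO/global memory
into explicit pointer arithmetic before handing the shader to its backend;
the mode set and address format are illustrative assumptions, not a recipe.

/* Sketch: after this pass the backend only sees loads/stores with
 * computed addresses for these modes, no nir_deref chains. */
#include "nir.h"

static void
lower_derefs_for_backend(nir_shader *s)
{
        NIR_PASS_V(s, nir_lower_explicit_io,
                   nir_var_mem_ssbo | nir_var_mem_global,
                   nir_address_format_64bit_global);
}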


Re: revenge of CALLOC_STRUCT

2021-12-24 Thread Alyssa Rosenzweig
Remind me the difference between MALLOC and malloc (etc)? Like, why
don't we s/MALLOC/malloc/g the whole codebase and drop u_memory.h? I
guess portability (?) even though this is presumably stdlib...
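
(For anyone else wondering: my rough understanding is that the wrappers
boil down to something like the sketch below. The real u_memory.h routes
through the os_* allocation layer so embedders can swap the allocator;
exact definitions may differ.)

/* Approximation of src/util/u_memory.h, not a verbatim copy. */
#include <stdlib.h>

#define MALLOC(size)      malloc(size)
#define CALLOC(count, sz) calloc(count, sz)
#define FREE(ptr)         free(ptr)

/* The genuinely handy bit: allocate a zeroed struct by tag name. */
#define CALLOC_STRUCT(T)  ((struct T *) CALLOC(1, sizeof(struct T)))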

On Thu, Dec 23, 2021 at 08:35:38AM +1000, Dave Airlie wrote:
> Hey,
> 
> Happy holidays, and as though to consider over the break,
> 
> We have the VMware MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers used
> throughout gallium.
> 
> We have ST_CALLOC_STRUCT in use in the mesa state tracker, not used in 
> gallium.
> 
> With the state tracker merging into mesa proper, and even prior to this, a
> few CALLOC_STRUCTs have started to leak into src/mesa/*.
> 
> Now I don't think we want that, but CALLOC_STRUCT is a genuinely
> useful macro on its own, so
> I'm considering just defining a calloc_struct instead for the
> non-gallium use that goes with malloc/calloc/free.
> 
> Any opinions, or should mesa just get u_memory.h support for Xmas?
> 
> Dave.


Re: Moving code around, post classic

2021-12-07 Thread Alyssa Rosenzweig
> 1. Move src/mesa into src/gallium/frontends/mesa (I have patches for
>this)
> 
>Seems like a pretty obvoius thing to do, given that all of the other
>gallium state trackers live there (OpenCL, video, d3d9, etc)

Ack from me.

> 2. Move src/compiler/glsl into src/gallium/frontends/mesa as well
> 
> Given that there are now no? drivers that use GLSL-IR directly, it
> might make sense to move the glsl compiler into the mesa
> state_tracker, and just have that lower to TGSI or NIR, and treat
> GLSL-IR as an implementation detail of the OpenGL frontend.

It would be an ack from me, but...

> Unfortunately, there is a lot of code outside of glsl that uses the
> linked list implementation in the glsl compiler, and not the one in
> util.

If it were just linked lists, I'd say someone should write the
Coccinelle to transform the tree to use the one in util and call it a
day. It's a bit more complicated, though: NIR depends on GLSL types.
Those could probably continue to live in their current location even
if glsl moves? Might breed confusion.

> 3. Move src/gallium* to src/
> 
> This was suggested, though given the existance of Vulkan, it wasn't
> clear that this was a good idea or not

If src/gallium/drivers/* is distributed to src/*/* this becomes a lot
less interesting I think?


Re: Moving code around, post classic

2021-12-07 Thread Alyssa Rosenzweig
>If we're going to do this, I wonder if we don't want to go even further
>and get rid of src/gallium/drivers and move the respective folders to
>src/vendor.** So, instead of src/gallium/drivers/(iris|crocus), we'd have
>src/intel/gallium/iris and src/intel/gallium/crocus or maybe
>src/intel/iris and src/intel/crocus.
>--Jason

If we do that, only one level of indirection -- i.e. src/intel/(iris|crocus) --
is my preference, because src/panfrost/gallium/panfrost is a silly path.
src/panfrost/gallium or (ick) src/panfrost/panfrost saves the extra nesting.
The same goes for any vendor with only a single Gallium driver (most of them...?)


Re: [Mesa-dev] GitLab Tag Proposal

2021-11-01 Thread Alyssa Rosenzweig
> I'd like to propose adding/changing two tags:
> 
> * change `feature_request` -> `Feature` (this is barely used at present, so 
> renaming shouldn't affect anyone negatively)
> * add `Bug`
> 
> I would use these primarily for tagging open MRs so that reviewers can more 
> appropriately prioritize bug fixes at a glance over (less urgent) feature 
> implementations. They would also serve the same purpose for issues, though 
> those are usually more obvious from the title anyway.

Sounds reasonable to me.


Re: [Mesa-dev] Workflow Proposal

2021-10-19 Thread Alyssa Rosenzweig
> > >>  Upstream should do what's best for upstream, not for Intel's "unique"
> > >>  management.
> > >>
> > >>Not sure how, from Emma explaining how Rb tags were used by Intel
> > >>management, it follows that they were used in that way only by
> > >>Intel management. Spoiler: they are not.
> > >
> > > Sorry, I'll make that point more emphatic.
> > >
> > > Upstream must do what's best for upstream with zero regard for the
> > > whims of management. Doubly so for bad management.
> >
> > If the r-b process ever had any notice from any company's management, I
> > haven't seen it. (Actually, I think most management would rather have
> > the short sighted view of skipping code review to more quickly merge
> > patches.) In terms of who to "track down", that is also a tenuous
> > connection.
> 
> All of the above is true but also totally irrelevant to the actual discussion.
> 
> When R-b as a metric came up at the time of the first switch, I wrote
> a really trivial Python script which used the GitLab API to scrape MR
> discussions and pull 'Reviewed-by: ...' comments out and print a
> leaderboard for number of reviewed MRs over the past calendar month.
> Adapting that to look at approvals rather than comments would cut it
> down to about 10 LoC.
> 
> Whether it's Reviewed-by in the trailer or an approval, both are
> explicitly designed to be machine readable, which means it's trivial
> to turn it into a metric if you want to. Whether or not that's a good
> idea is the problem of whoever wields it.

Fair enough. I don't have strong views on the tags themselves. If
upstream has good reasons to use them, let's use them; if upstream has
good reasons to omit them, let's omit them.

What I oppose is management metrics playing a role in upstream's
discussion. That goes for any company's management; I do not mean to
single out Intel here.

But given the aforementioned script, even the managers will be happy
either way. My apologies for prolonging what in retrospect is a silly
discussion.


Re: [Mesa-dev] Workflow Proposal

2021-10-13 Thread Alyssa Rosenzweig
> > >> I would love to see this be the process across Mesa.  We already don't
> > >> rewrite commit messages for freedreno and i915g, and I only have to do
> > >> the rebase (busy-)work for my projects in other areas of the tree.
> > > Likewise for Panfrost. At least, I don't do the rewriting. Some Panfrost
> > > devs do, which I'm fine with. But it's not a requirement for merging.
> > >
> > > The arguments about "who can help support this years from now?" are moot
> > > at our scale... the team is small enough that the reviewer is likely
> > > the code owner / maintainer, and patches regularly go in
> > > unreviewed for lack of review bandwidth.
> >
> > There is another reason for the Rb tag, that is to measure the quantity
> > of patch review people do.
> >
> > This was well summarized some years ago by Matt Turner, as it was
> > minimized (even suggested to be removed) on a different thread:
> >
> > https://lists.freedesktop.org/archives/mesa-dev/2019-January/213586.html
> 
> I was part of the Intel team when people started doing this r-b
> counting.  I believe that it was being done due to Intel management's
> failure to understand who was doing the work on the team and credit
> them appropriately, and also to encourage those doing less to step up.
> Unfortunately, the problem with Intel management wasn't a lack of
> available information, and I didn't see publishing the counts change
> reviews either.



Upstream should do what's best for upstream, not for Intel's "unique"
management.


Re: [Mesa-dev] Workflow Proposal

2021-10-13 Thread Alyssa Rosenzweig
>  Upstream should do what's best for upstream, not for Intel's "unique"
>  management.
> 
>Not sure how, from Emma explaining how Rb tags were used by Intel
>management, it follows that they were used in that way only by
>Intel management. Spoiler: they are not.

Sorry, I'll make that point more emphatic.

Upstream must do what's best for upstream with zero regard for the
whims of management. Doubly so for bad management.


Re: [Mesa-dev] Workflow Proposal

2021-10-08 Thread Alyssa Rosenzweig
> I would love to see this be the process across Mesa.  We already don't
> rewrite commit messages for freedreno and i915g, and I only have to do
> the rebase (busy-)work for my projects in other areas of the tree.

Likewise for Panfrost. At least, I don't do the rewriting. Some Panfrost
devs do, which I'm fine with. But it's not a requirement for merging.

The arguments about "who can help support this years from now?" are moot
at our scale... the team is small enough that the reviewer is likely
the code owner / maintainer, and patches regularly go in
unreviewed for lack of review bandwidth.

I think it's reasonable that small driver teams and large common
components have different review needs, and it's ok for the process to
reflect that.


Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-06-17 Thread Alyssa Rosenzweig
> By far the limiting factor for i915g progress now that I've got some
> CI rigged up is review.  My preference would be that we all agree that
> nobody wants to look at i915, and some responsible folks (ajax and a
> couple Intel volunteers, perhaps?) bless me to merge without review
> once an i915g-only MR has been up for a week.

Congratulations, you're the new i915g maintainer ;-)


Re: [Mesa-dev] Outstanding Mesa 21.0 patches

2021-04-12 Thread Alyssa Rosenzweig
Hi Dylan,

Commentary on the two Panfrost patches:

> cff5c40fc3 pan/bi: Fix blend shaders using LD_TILE with MRT

This needs to be backported; it fixes a conformance failure on ES 3.0 on
Bifrost. Cc'ing Boris -- maybe you could take care of this backport?
Otherwise I can give it a shot. Thank you.

> 3436e5295b pan/bi: Treat +DISCARD.f32 as message-passing

We're pretty sure this works around a hardware bug exposed by
out-of-order scheduling, but that only landed after the 21.0 branch
point, so this doesn't need to be backported. Feel free to de-nominate it.

Regards,

Alyssa


Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-30 Thread Alyssa Rosenzweig
> Probably nv30 would do well to "move on" as well. But it also presents
> an interesting question -- the nv30 driver has lots of problems. I
> have no plans to fix them, nor am I aware of anyone else with such
> plans. However if such a developer were to turn up, would it be
> reasonable to assume that their work would ultimately land in this
> "lts" branch/tree/whatever? Some of the "fixes" will require large-ish
> changes to the driver...

AFAIU, the issue isn't large changes to the driver, but large changes to
common code. So unless you're switching it to NIR or something, it
probably doesn't matter.


Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-24 Thread Alyssa Rosenzweig
> And, yeah, I'd love to drop vec4 but yeah...  One advantage to keeping
> vec4 in the tree for some stuff is that it means we have full-featured
> hardware able to test vec4 NIR.  That seems like a feature.

I have to keep caring about Midgard for the next indefinitely so please
don't break vec4 NIR ;-)


Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-23 Thread Alyssa Rosenzweig
I'd like to see it happen, though I don't understand how to make these
coexist for distros. Would like to hear from the Debian/etc maintainers
of mesa.

Then again I *think* classic-lts doesn't need to be built for
armhf/arm64 at all, so I suppose I'm personally unaffected :-P

On Mon, Mar 22, 2021 at 03:15:30PM -0700, Dylan Baker wrote:
> Hi list,
> 
> We've talked about it a number of times, but I think it's time time to
> discuss splitting the classic drivers off of the main development branch
> again, although this time I have a concrete plan for how this would
> work.
> 
> First, why? Basically, all of the classic drivers are in maintenance
> mode (even i965). Second, many of them rely on code that no one works
> on, and very few people still understand. There is no CI for most of
> them, and the Intel CI is not integrated with gitlab, so it's easy to
> unintentionally break them, and this breakage usually isn't noticed
> until just before or just after a release. 21.0 was held up (in small
> part, also me just getting behind) because of such breakages.
> 
> I know there is some interest in getting i915g in good enough shape that
> it could replace i915c, at least for the common case. I also am aware
> that Dave, Ilia, and Eric (with some pointers from Ken) have been
> working on a gallium driver to replace i965. Neither of those things are
> ready yet, but I've taken them into account.
> 
> Here's the plan:
> 
> 1) 21.1 release happens
> 2) we remove classic from master
> 3) 21.1 reaches EOL because of 21.2
> 4) we fork the 21.1 branch into a "classic-lts"[1] branch
> 5) we disable all vulkan and gallium drivers in said branch, at least at
>the Meson level
> 6) We change the name and precedence of the glvnd loader file
> 7) apply any build fixups (turn off Intel generators for versions >= 7.5,
>for example)
> 8) maintain that branch with build and critical bug fixes only
> 
> This gives distros and end users two options:
> 1) They can build *only* the legacy branch, in the normal "Mesa provides
>libGL interfaces" fashion
> 2) They can use glvnd and install current mesa and the legacy branch in
>parallel
> 
> Because of glvnd, we can control which driver will get loaded first, and
> thus if we decide i915g or the i965 replacement is ready and turn it on
> by default it will be loaded by default. An end user who doesn't like
> this can add a new glvnd loader file that gives the classic drivers
> higher precedence and continue to use them.
> 
> Why fork from 21.1 instead of master?
> 
> First, it allows us to delete classic immediately, which will allow
> refactoring to happen earlier in the cycle, and for any fallout to be
> caught and hopefully fixed before the release. Second, it means that
> when a user is switched from 21.1 to the new classic-lts branch, there
> will be no regressions, and no one has to spend time figuring out what
> broke and fixing the lts branch.
> 
> When you say "build and critical bug fixes", what do you mean?
> 
> I mean updating Meson if we rely on something that is later
> deprecated and removed and would prevent building the branch, or
> relying on some compiler behavior that changes, gaping exploitable
> security holes, that kind of thing.
> 
> footnotes
> [1] Or whatever color you like your bikeshed






Re: [Mesa-dev] docs: consistent language

2021-03-17 Thread Alyssa Rosenzweig
>Absolutely. Apparently I forgot to write the final line which is that I
>think standardising on US English across the board is a good thing

Standardise on any flavour of English you'd like, one of them is still
wrong ;-)


Re: [Mesa-dev] Mesa 21.0 release

2021-03-12 Thread Alyssa Rosenzweig
> 3436e5295b pan/bi: Treat +DISCARD.f32 as message-passing

Looking at actual dates, the bug this fixes didn't materialize until
after the 21.0 branch point; I just goofed with the tag. All good on my
end.


Re: [Mesa-dev] NIR Debugging Tips

2021-02-26 Thread Alyssa Rosenzweig
Hi Tommy,

Unfortunately NIR is a bit lacking in documentation, save for comments
in src/compiler/nir/nir_intrinsics.py. For Vulkan ones, the best bet is
likely looking at spirv_to_nir (src/compiler/spirv/) and seeing what
generates it, and working up from there.

Debug strategy varies per driver but lots and lots of conformance
testing (VK-GL-CTS) :)
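
One concrete, hedged tip: the workhorse is printing the IR and diffing it
across passes. nir_print_shader() is from src/compiler/nir/nir.h; wiring
it into a driver's compile path like this is just an illustration.

/* Illustration: dump the IR before/after a suspect pass and diff. */
#include <stdio.h>
#include "nir.h"

static void
debug_dump(nir_shader *s, const char *when)
{
        fprintf(stderr, "=== NIR %s ===\n", when);
        nir_print_shader(s, stderr);
}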

Alyssa

On Tue, Feb 23, 2021 at 10:35:08AM -0800, Tommy Chou wrote:
>Hi,
>Could I get some tips on figuring out what the NIR intrinsics
>do, specifically the Vulkan-related ones? Also, what is the debug
>strategy that is used to debug NIR shaders to check if the implementation
>is correct?
>Thanks,
>Tommy



Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-18 Thread Alyssa Rosenzweig
> But on many other embedded OSes  - at least Google ones like  CrOS and
> Android - the security model is way stricter.  We could argue that is
> bad / undesirable / too draconian but that is something that any of us
> has the power to change. At some point each platform decides where it
> wants to be in the spectrum of "easy to hack" and "secure for the
> user". CrOS model is: you can hack as much as you want, but you need
> first to re-flash it in dev-mode.

Off-topic, but speaking as someone who grew up in libre software
"purist" circles, I'm a big fan of the CrOS model here.
Draconian is when the user _can't_ put it in dev mode. If you can,
there's nothing wrong with sane defaults.


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-16 Thread Alyssa Rosenzweig
> That said, I'm ok with making perfetto support a build-time option
> that is default disabled.  And I think it would be even ok to limit
> use of perfetto to individual drivers (ie. no core mesa/gallium
> perfetto dependency) to start.  And, well, CrOS has plenty of mesa
> contributors so I think you can consider us signed up to maintain
> this.

As a general rule, I have no objection to build-time, default-disabled,
driver-specific stuff. It's not my job to make technical decisions for other
driver teams.


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Alyssa Rosenzweig
Sure, I definitely see the use case for virgl :)

On Fri, Feb 12, 2021 at 02:43:25PM -0800, Chia-I Wu wrote:
> For virgl, where the biggest perf gaps often come from unnecessary CPU
> waits or high latencies of fence signaling, being able to insert
> userspace driver trace events and combine them with kernel ftrace
> events are a big plus.  Admittedly, there is no HW counters and my
> needs are simpler (inserting function begin/end and wait begin/end and
> combining them with virtio-gpu and dma-fence ftrace events).
> 
> On Fri, Feb 12, 2021 at 2:13 PM Alyssa Rosenzweig
>  wrote:
> >
> > My 2c for Mali/Panfrost --
> >
> > For us, capturing GPU perf counters is orthogonal to rendering. It's
> > expected (e.g. with Arm's tools) to do this from a separate process.
> > Neither Mesa nor the DDK should require custom instrumentation for the
> > low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> > Perfetto as it is. So for us I don't see the value in modifying Mesa for
> > tracing.
> >
> > On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> > > (responding from correct address this time)
> > >
> > > On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  
> > > wrote:
> > >
> > > > I've recently been using GPUVis to look at trace events.  On Intel
> > > > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > > > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > > > that I locally hack up in Mesa.
> > > >
> > >
> > > GPUVis is great. I would love to see that data combined with
> > > userspace events without any need for local hacks. Perfetto provides
> > > on-demand trace events with lower overhead compared to ftrace, so for
> > > example it is acceptable to have production trace instrumentation that can
> > > be captured without dev builds. To do that with ftrace it may require a 
> > > way
> > > to enable and disable the ftrace file writes to avoid the overhead when
> > > tracing is not in use. This is what Android does with systrace/atrace, for
> > > example, it uses Binder to notify processes about trace sessions. Perfetto
> > > does that in a more portable way.
> > >
> > >
> > > >
> > > > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > > > requires a single C/C++ header.  You don't have to access an external
> > > > web service to analyze trace data (a big no-no for devs working on
> > > > preproduction hardware).
> > > >
> > > > Is it possible to build and run the Perfetto UI locally?
> > >
> > >
> > > Yes, local UI builds are possible
> > > <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
> > > Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
> > > trace data is not uploaded unless you use the 'share' feature.
> > >
> > >
> > > >   Can it display
> > > > arbitrary trace events that are written to
> > > > /sys/kernel/tracing/trace_marker ?
> > >
> > >
> > > Yes, I believe it does support that via linux.ftrace data source
> > > <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
> > > example to overlay CPU sched data to show what process is on each core
> > > throughout the timeline. There are many ftrace event types
> > > <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
> > > in
> > > the perfetto protos.
> > >
> > >
> > > > Can it be extended to show i915 and
> > > > i915-perf-recorder events?
> > > >
> > >
> > > It can be extended to consume custom data sources. One way this is done is
> > > via a bridge daemon, such as traced_probes which is responsible for
> > > capturing data from ftrace and /proc during a trace session and sending it
> > > to traced. traced is the main perfetto tracing daemon that notifies all
> > > trace data sources to start/stop tracing and communicates with user 
> > > tracing
> > > requests via the 'perfetto' command.
> > >
> > >
> > >
> > > >
> > > > John Bates  writes:
> > > >
> > > > > I recently opened issue 4262
> > > > > <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> > > >

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Alyssa Rosenzweig
My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> (responding from correct address this time)
> 
> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:
> 
> > I've recently been using GPUVis to look at trace events.  On Intel
> > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > that I locally hack up in Mesa.
> >
> 
> GPUVis is great. I would love to see that data combined with
> userspace events without any need for local hacks. Perfetto provides
> on-demand trace events with lower overhead compared to ftrace, so for
> example it is acceptable to have production trace instrumentation that can
> be captured without dev builds. To do that with ftrace it may require a way
> to enable and disable the ftrace file writes to avoid the overhead when
> tracing is not in use. This is what Android does with systrace/atrace, for
> example, it uses Binder to notify processes about trace sessions. Perfetto
> does that in a more portable way.
> 
> 
> >
> > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > requires a single C/C++ header.  You don't have to access an external
> > web service to analyze trace data (a big no-no for devs working on
> > preproduction hardware).
> >
> > Is it possible to build and run the Perfetto UI locally?
> 
> 
> Yes, local UI builds are possible
> .
> Also confirmed with the perfetto team  that
> trace data is not uploaded unless you use the 'share' feature.
> 
> 
> >   Can it display
> > arbitrary trace events that are written to
> > /sys/kernel/tracing/trace_marker ?
> 
> 
> Yes, I believe it does support that via linux.ftrace data source
> . We use that for
> example to overlay CPU sched data to show what process is on each core
> throughout the timeline. There are many ftrace event types
> 
> in
> the perfetto protos.
> 
> 
> > Can it be extended to show i915 and
> > i915-perf-recorder events?
> >
> 
> It can be extended to consume custom data sources. One way this is done is
> via a bridge daemon, such as traced_probes which is responsible for
> capturing data from ftrace and /proc during a trace session and sending it
> to traced. traced is the main perfetto tracing daemon that notifies all
> trace data sources to start/stop tracing and communicates with user tracing
> requests via the 'perfetto' command.
> 
> 
> 
> >
> > John Bates  writes:
> >
> > > I recently opened issue 4262
> > >  to begin the
> > > discussion on integrating perfetto into mesa.
> > >
> > > *Background*
> > >
> > > System-wide tracing is an invaluable tool for developers to find and fix
> > > performance problems. The perfetto project enables a combined view of
> > trace
> > > data from kernel ftrace, GPU driver and various manually-instrumented
> > > tracepoints throughout the application and system. This helps developers
> > > quickly answer questions like:
> > >
> > >- How long are frames taking?
> > >- What caused a particular frame drop?
> > >- Is it CPU bound or GPU bound?
> > >- Did a CPU core frequency drop cause something to go slower than
> > usual?
> > >- Is something else running that is stealing CPU or GPU time? Could I
> > >fix that with better thread/context priorities?
> > >- Are all CPU cores being used effectively? Do I need
> > sched_setaffinity
> > >to keep my thread on a big or little core?
> > >- What’s the latency between CPU frame submit and GPU start?
> > >
> > > *What Does Mesa + Perfetto Provide?*
> > >
> > > Mesa is in a unique position to produce GPU trace data for several GPU
> > > vendors without requiring the developer to build and install additional
> > > tools like gfx-pps .
> > >
> > > The key is making it easy for developers to use. Ideally, perfetto is
> > > eventually available by default in mesa so that if your system has
> > perfetto
> > > traced running, you just need to run perfetto (perhaps along with setting
> > > an environment variable) with the mesa categories to see:
> > >
> > >- GPU processing timeline events.
> > >- GPU counters.
> > >- CPU events for potentially slow functions in mesa like shader
> > compiles.
> > >
> > > 

[Mesa-dev] Developer access

2021-01-21 Thread Alyssa Rosenzweig
Hi all,

Icecream95[1], a long-time Mesa/Panfrost contributor, has requested
developer access to mesa on the GitLab issue tracker [2]. Quoting here
for convenience of those who monitor the mailing list but not the
tracker:

> @alyssa keeps complaining about getting poked to assign stuff to Marge
> for me, so I think it's finally time to ask for Mesa developer access,
> so I can do the endless retries until CI stops flaking myself.
>
> Assuming a platform with char as unsigned, I think my 210 patches in
> Mesa is more than the 32 commits some have suggested to be required
> for the lofty privilege of labelling MRs and bothering the merge bot.

All else being equal, with my Panfrost maintainer hat, I am in favour.
Marge Bot access would reduce merge friction for their frequent
contributions, given the substantial timezone difference with our team
at Collabora. Additionally, labeling access would ease triage for us;
although I subscribe specifically to Panfrost MRs, I have to monitor all
MRs to ensure I haven't missed any from them, a frequent occurrence.

[ The cynic reads: This is more for my benefit than theirs ;-) ]

Thanks,

Alyssa

[1] https://gitlab.freedesktop.org/icecream95
[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/4135




Re: [Mesa-dev] Rust drivers in Mesa

2020-10-14 Thread Alyssa Rosenzweig
> Since the majority opinion seemed to be "if someone wanted to use it
> in a leaf node without making everyone use it, that's fine", I've
> started trying to put together the CI bits necessary to enable it.
> Currently fighting with meson cross files a bit, but the linting works
> and the amd64 build works.
> 
> https://gitlab.freedesktop.org/anholt/mesa/-/commits/ci-rust

Good luck :)

I didn't think to enforce a lint during CI. Part of me wonders if we
should be doing that for C too, but we can shave that yak some other time.





Re: [Mesa-dev] Rust drivers in Mesa

2020-10-14 Thread Alyssa Rosenzweig
> > I think it's just going to get more messy and complicated for people who 
> > don't want to learn or use another language. Mesa already requires people 
> > to know C, Python, and now newly Gitlab CI scripts just to get stuff done 
> > and merged. Another language would only exacerbate the issue and steepen 
> > the learning curve.
> 
> To some extent I agree, and I think that's a good reason to start with
> Rust in a leaf-node of the project (a driver or compiler backend)
> rather than in a common piece of infrastructure.

Indeed, at least to probe the waters I'm interested in a backend
compiler, which necessarily only touches a few people and a few
architectures (fewer cross-platform issues. Mali is only* ever
paired with ARM CPUs, which have a good Rust story).

* We don't talk about SoFIA




Re: [Mesa-dev] Rust drivers in Mesa

2020-10-14 Thread Alyssa Rosenzweig
> I have found that other tools like RAII/drop, the closely related smart
> pointer types, and safe containers (vectors, strings etc.) even without
> the borrow checker niceties, to be relatively more useful in preventing
> memory errors. However, these are features that modern C++ also offers,
> along with a seamless integration story with existing C and C++ code. I
> find that Rust has an edge in thread-safety, but I am not sure if this
> is a strong selling point in the context of Mesa, where the current
> design seems to be well served (for now) by the traditional thread
> safety patterns.

That's fair... aesthetically I've found Rust's presentation of these
patterns to be "nicer" than C++'s (although I recognize modern C++ is a
different beast than what I once learned). Technically you're right that
both languages have evolved quite some overlap.

> As an aside, as much as I like the practicality and richness of the
> crate ecosystem, which is one of the strong points of Rust, I do have
> concerns about its current security model. For example, the crates are
> not signed and thus vulnerable to several plausible attacks, like
> compromised github accounts or, even worse due to scale, compromised
> crate repositories. This is further aggravated by the sometimes large
> indirect dependency trees. Such concerns are especially relevant to Mesa
> (and other high-profile projects) since it could be an attractive target
> for malicious entities.  As enticing as it could be sometimes, I would
> be very hesitant to introduce external crate dependencies at this point
> in Mesa.

As mentioned, Rust with nothing more than the standard library is still
richer than what C gives us out-of-the-box. I don't expect to need
external crates flying around, at least for backend compiler purposes.




Re: [Mesa-dev] Rust drivers in Mesa

2020-10-14 Thread Alyssa Rosenzweig
> Yep. Before we can land a single bit of code, we need to do a bunch of
> annoying things like build-system integration, FFI bridging, agreeing
> on conventions and style, etc etc. Trying to do that whilst also
> replacing the GLSL compiler or vtn is way too much; it's 100% doomed
> to failure, even if they're the highest-value targets on some axes.

For some data points, Eric started working on build system integration
(see the email below). FFI will depend greatly on the component in question.
For a greenfield backend compiler that does NIR -> backend IR in C and
exposes the backend IR builders from Rust over a C ABI, there's little
in the way of FFI work needed. Conventions and style are easy --
go with rustfmt; the language is quite opinionated, so we can quit
quibbling and get coding :)




Re: [Mesa-dev] Rust drivers in Mesa

2020-10-14 Thread Alyssa Rosenzweig
> drive-by comment: for something like a gl driver that a lot of other
> things depend on, making it harder for us to depend on other external
> things is actually a good thing

I agree with this as well. The Rust standard library is richer than C's;
if we can get by fine with C + util/, that should be good enough for
Rust as well. We might end up growing a Rust-y version of util/ when we
have multiple Rust pieces in-tree, but Meson should be ok with that.




Re: [Mesa-dev] Rust drivers in Mesa

2020-10-04 Thread Alyssa Rosenzweig
Cc'd.

On Sun, Oct 04, 2020 at 03:17:28PM +0300, Michael Shigorin wrote:
>   Hello,
> regarding this proposal:
> http://lists.freedesktop.org/archives/mesa-dev/2020-October/224639.html
> 
> Alyssa, Rust is not "naturally fit for graphics driver
> development" since it's not as universally available as
> C/C++ in terms of mature compilers; you're basically
> trying to put a cart before a horse, which will just
> put more pressure on both Rust developers team *and*
> on those of us working on non-x86 arches capable of
> driving modern GPUs with Mesa.
> 
> For one, I'm porting ALT Linux onto e2k platform
> (there's only an early non-optimizing version of
> Rust port there by now), and we're maintaining repos
> for aarch64, armv7hf, ppc64el, mipsel, and riscv64 either
> -- none of which seem to be described as better than
> "experimental" at http://doc.rust-lang.org/core/arch page
> in terms of Rust support.
> 
> And there's another problem: no kind of tooling helps
> careless developers avoid mistakes as many previous
> silver bullet candidates have shown already; some of
> those only made things actually worse.
> 
> I am no Mesa developer, my closest contribution to it
> is probably an LLVM patch proposal.  But I've seen
> enough deterioration brought with the best intents
> of course -- both in free software during last ten
> years or so and in real life (I used to live in Kiev
> but not anymore, Denis can witness that if needed).
> 
> PS: feel free to CC: mesa-dev@, of course.
> 
> -- 
>   WBR, Michael Shigorin / http://altlinux.org
>   -- http://opennet.ru / http://anna-news.info


[Mesa-dev] Rust drivers in Mesa

2020-10-01 Thread Alyssa Rosenzweig
Hi all,

Recently I've been thinking about the potential for the Rust programming
language in Mesa. Rust bills itself as a safe systems programming language
with comparable performance to C [0], which is a natural fit for
graphics driver development.

Mesa today is written primarily in C, a notoriously low-level language,
with some components in C++. To handle the impedance mismatch, we've
built up a number of abstractions in-tree, including multiple ad hoc
code generators (GenXML, NIR algebraic passes, Bifrost disassembler). A
higher level language can help avoid the web of metaprogramming and
effect code that is simpler and easier to reason about. Similarly, a
better type system can aid static analysis.

Beyond abstraction, Rust's differentiating feature is the borrow checker
to guarantee memory safety. Historically, safety has not been a primary
concern of graphics drivers, since drivers are implemented as regular
userspace code running in the process of the application calling them.
Unfortunately, now that OpenGL is being exposed to untrusted code via
WebGL, the driver does become an attack vector.

For the time being, Mesa attempts to minimize memory bugs with defensive
programming, safe in-tree abstractions (including ralloc), and static
analysis via Coverity. Nevertheless, these are all heuristic solutions.
Static analysis is imperfect and in our case, proprietary software.
Ideally, the bugs we've been fixing via Coverity could be caught at
compile-time with a free and open source toolchain.
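
For readers unfamiliar with ralloc, the in-tree pattern looks roughly like
the sketch below (src/util/ralloc.h). The struct is hypothetical; the point
is that parenting allocations to a context removes a whole class of
leak and double-free bugs without any borrow checker.

/* Hypothetical example of the ralloc ownership pattern. */
#include "util/ralloc.h"

struct hypothetical_ir {
        unsigned num_blocks;
};

static void
build_and_discard(void)
{
        void *mem_ctx = ralloc_context(NULL);

        /* Zeroed allocation, owned by mem_ctx. */
        struct hypothetical_ir *ir = rzalloc(mem_ctx, struct hypothetical_ir);
        ir->num_blocks = 1;

        /* Frees ir and everything else parented to mem_ctx in one call. */
        ralloc_free(mem_ctx);
}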

As Rust would allow exactly this, I see the primary benefit of Rust in
verifying correctness and robustness, rather than security concerns per
se.  Indeed, safety guarantees do translate well beyond WebGL.

Practically, how would Rust fit in with our existing C codebase?
Obviously I'm not suggesting a rewrite of Mesa's more than 15 million
lines of C. Instead, I see value in introducing Rust in targeted parts
of the tree. In particular, I envision backend compilers written in part
in Rust. While creating an idiomatic Rust wrapper for NIR or Gallium
would be prohibitively costly for now, a backend compiler could be
written in Rust with IR builders exported for use of the NIR -> backend
IR translator written in C.
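
To make the shape of that split concrete, here is a hedged sketch of the C
side: the NIR -> backend IR translator stays in C and calls into builder
entry points implemented in Rust behind a plain C ABI. Every name below is
hypothetical; nothing like this exists in-tree.

/* Hypothetical header; the functions would be #[no_mangle] extern "C"
 * items on the Rust side. */
#include <stdint.h>

struct backend_ir; /* opaque to C; the Rust side owns the layout */

struct backend_ir *backend_ir_create(void);
void backend_ir_emit_fadd(struct backend_ir *ir,
                          uint32_t dest, uint32_t src0, uint32_t src1);
void backend_ir_finish(struct backend_ir *ir);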

This would have minimal impact on the tree. Users that are not building
such a driver would be unaffected. For those who _are_ building Rust
code, the Rust compiler would be added as a build-time dependency and
the (statically linked) Rust standard library would be added as a
runtime dependency. There is concern about the Rust compiler requiring
LLVM as a dependency, but again this is build-time, and no worse than
Mesa already requiring LLVM as a runtime dependency for llvmpipe and
clover. As for the standard library, it is possible to eliminate the
dependency as embedded Rust does, perhaps calling out to the C standard
library via the FFI, but this is likely quixotic. I do regret the binary
size increase, however.

Implications for the build system vary. Rust prefers to be built by its
own package manager, Cargo, which is tricky to integrate with other
build systems. Actually, Meson has native support for Rust, invoking the
compiler directly and skipping Cargo, as if it were C code. This support
is not widely adopted as it prevents linking with external libraries
("crates", in Rust parlance), with discussions between Rust and Meson
developers ending in a stand-still [1]. For Mesa, this might be just
fine. Our out-of-tree run-time dependencies are minimal for the C code,
and Rust's standard library largely avoids the need for us to maintain a
Rust version of util/ in-tree. If this proves impractical in the
long-term, it is possible to integrate Cargo with Meson on our end [2].

One outstanding concern is build-time, which has been a notorious
growing pain for Rust due to both language design and LLVM itself [3],
although there is active work to improve both fronts [4][5]. I build
Mesa on my Arm laptop, so I suppose I'd be hurt more than many of us.
There are also awkward bootstrapping questions, but there is work here too
[6].

If this is of interest, please discuss. It's clear to me Rust is not
going away any time soon, and I see value in Mesa embracing the new
technology. I'd like to hear other Mesa developers' thoughts.

Thanks,

Alyssa

[0] https://www.rust-lang.org/
[1] https://github.com/mesonbuild/meson/issues/2173
[2] https://gitlab.gnome.org/GNOME/fractal/-/blob/master/meson.build
[3] https://pingcap.com/blog/rust-compilation-model-calamity/
[4] 
https://blog.mozilla.org/nnethercote/2020/04/24/how-to-speed-up-the-rust-compiler-in-2020/
[5] https://github.com/bjorn3/rustc_codegen_cranelift
[6] https://github.com/thepowersgang/mrustc




[Mesa-dev] [PATCH 30/30] pan/midgard: Allow scheduling conditions with constants

2019-09-28 Thread Alyssa Rosenzweig
Now that we have constant adjustment logic abstracted, we can do this
safely. Along with the csel inversion patch, this allows many more
common csel ops to inline their condition in the bundle.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/panfrost/midgard/midgard_schedule.c b/src/panfrost/midgard/midgard_schedule.c
index 2d20b617d1a..a26ea05ee50 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -654,6 +654,7 @@ static unsigned
 mir_comparison_mobile(
 compiler_context *ctx,
 midgard_instruction **instructions,
+struct midgard_predicate *predicate,
 unsigned count,
 unsigned cond)
 {
@@ -676,9 +677,9 @@ mir_comparison_mobile(
 if (GET_CHANNEL_COUNT(alu_opcode_props[instructions[i]->alu.op].props))
 return ~0;
 
-/* TODO: moving conditionals with constants */
+/* Ensure it will fit with constants */
 
-if (instructions[i]->has_constants)
+if (!mir_adjust_constants(instructions[i], predicate, false))
 return ~0;
 
 /* Ensure it is written only once */
@@ -689,6 +690,10 @@ mir_comparison_mobile(
 ret = i;
 }
 
+/* Inject constants now that we are sure we want to */
+if (ret != ~0)
+mir_adjust_constants(instructions[ret], predicate, true);
+
 return ret;
 }
 
@@ -700,6 +705,7 @@ static midgard_instruction *
 mir_schedule_comparison(
 compiler_context *ctx,
 midgard_instruction **instructions,
+struct midgard_predicate *predicate,
 BITSET_WORD *worklist, unsigned count,
 unsigned cond, bool vector, unsigned swizzle,
 midgard_instruction *user)
@@ -707,7 +713,7 @@ mir_schedule_comparison(
 /* TODO: swizzle when scheduling */
 unsigned comp_i =
 (!vector && (swizzle == 0)) ?
-mir_comparison_mobile(ctx, instructions, count, cond) : ~0;
+mir_comparison_mobile(ctx, instructions, predicate, count, cond) : ~0;
 
 /* If we can, schedule the condition immediately */
 if ((comp_i != ~0) && BITSET_TEST(worklist, comp_i)) {
@@ -747,7 +753,7 @@ mir_schedule_condition(compiler_context *ctx,
 /* Grab the conditional instruction */
 
 midgard_instruction *cond = mir_schedule_comparison(
-ctx, instructions, worklist, count, last->src[condition_index],
+ctx, instructions, predicate, worklist, count, last->src[condition_index],
 vector, last->cond_swizzle, last);
 
 /* We have exclusive reign over this (possibly move) conditional
-- 
2.23.0


[Mesa-dev] [PATCH 27/30] pan/midgard: Tightly pack 32-bit constants

2019-09-28 Thread Alyssa Rosenzweig
If we can reuse constant slots from other instructions, we would like to
do so to include more instructions per bundle.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 129 +---
 1 file changed, 113 insertions(+), 16 deletions(-)

diff --git a/src/panfrost/midgard/midgard_schedule.c b/src/panfrost/midgard/midgard_schedule.c
index a35699893bd..2d20b617d1a 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -370,26 +370,123 @@ mir_adjust_constants(midgard_instruction *ins,
 if (!ins->has_constants)
 return true;
 
-/* TODO: Deduplicate; permit multiple constants within a bundle */
-
-if (destructive && !pred->constant_count) {
-if (ins->alu.reg_mode == midgard_reg_mode_16) {
-  /* TODO: Fix packing XXX */
-uint16_t *bundles = (uint16_t *) pred->constants;
-uint32_t *constants = (uint32_t *) ins->constants;
-
-/* Copy them wholesale */
-for (unsigned i = 0; i < 4; ++i)
-bundles[i] = constants[i];
-} else {
-memcpy(pred->constants, ins->constants, 16);
-}
+if (ins->alu.reg_mode == midgard_reg_mode_16) {
+/* TODO: 16-bit constant combining */
+if (pred->constant_count)
+return false;
+
+uint16_t *bundles = (uint16_t *) pred->constants;
+uint32_t *constants = (uint32_t *) ins->constants;
+
+/* Copy them wholesale */
+for (unsigned i = 0; i < 4; ++i)
+bundles[i] = constants[i];
 
 pred->constant_count = 16;
-return true;
+} else {
+/* Pack 32-bit constants */
+uint32_t *bundles = (uint32_t *) pred->constants;
+uint32_t *constants = (uint32_t *) ins->constants;
+unsigned r_constant = SSA_FIXED_REGISTER(REGISTER_CONSTANT);
+unsigned mask = mir_mask_of_read_components(ins, r_constant);
+
+/* First, check if it fits */
+unsigned count = DIV_ROUND_UP(pred->constant_count, sizeof(uint32_t));
+unsigned existing_count = count;
+
+for (unsigned i = 0; i < 4; ++i) {
+if (!(mask & (1 << i)))
+continue;
+
+bool ok = false;
+
+/* Look for existing constant */
+for (unsigned j = 0; j < existing_count; ++j) {
+if (bundles[j] == constants[i]) {
+ok = true;
+break;
+}
+}
+
+if (ok)
+continue;
+
+/* If the constant is new, check ourselves */
+for (unsigned j = 0; j < i; ++j) {
+if (constants[j] == constants[i]) {
+ok = true;
+break;
+}
+}
+
+if (ok)
+continue;
+
+/* Otherwise, this is a new constant */
+count++;
+}
+
+/* Check if we have space */
+if (count > 4)
+return false;
+
+/* If non-destructive, we're done */
+if (!destructive)
+return true;
+
+/* If destructive, let's copy in the new constants and adjust
+ * swizzles to pack it in. */
+
+uint32_t indices[4] = { 0 };
+
+/* Reset count */
+count = existing_count;
+
+for (unsigned i = 0; i < 4; ++i) {
+if (!(mask & (1 << i)))
+continue;
+
+uint32_t cons = constants[i];
+bool constant_found = false;
+
+/* Search for the constant */
+for (unsigned j = 0; j < count; ++j) {
+if (bundles[j] != cons)
+continue;
+
+/* We found it, reuse */
+indices[i] = j;
+constant_found = true;
+break;
+}
+
+if (constant_found)
+ 

[Mesa-dev] [PATCH 28/30] pan/midgard: Add mir_flip helper

2019-09-28 Thread Alyssa Rosenzweig
Useful for various operations on both commutative and anticommutative
ops.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/compiler.h   |  1 +
 src/panfrost/midgard/midgard_opt_invert.c | 13 +++--
 src/panfrost/midgard/mir.c| 17 +
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/src/panfrost/midgard/compiler.h b/src/panfrost/midgard/compiler.h
index cb7d53dd653..8bbf67aa4e7 100644
--- a/src/panfrost/midgard/compiler.h
+++ b/src/panfrost/midgard/compiler.h
@@ -524,6 +524,7 @@ bool mir_nontrivial_outmod(midgard_instruction *ins);
 
 void mir_insert_instruction_before_scheduled(compiler_context *ctx, midgard_block *block, midgard_instruction *tag, midgard_instruction ins);
 void mir_insert_instruction_after_scheduled(compiler_context *ctx, midgard_block *block, midgard_instruction *tag, midgard_instruction ins);
+void mir_flip(midgard_instruction *ins);
 
 /* MIR goodies */
 
diff --git a/src/panfrost/midgard/midgard_opt_invert.c 
b/src/panfrost/midgard/midgard_opt_invert.c
index 729169f9a4a..f846bca5bf9 100644
--- a/src/panfrost/midgard/midgard_opt_invert.c
+++ b/src/panfrost/midgard/midgard_opt_invert.c
@@ -256,16 +256,9 @@ midgard_opt_fuse_src_invert(compiler_context *ctx, 
midgard_block *block)
 if (both) {
 ins->alu.op = mir_demorgan_op(ins->alu.op);
 } else if (right || (left && !ins->has_inline_constant)) {
-if (left) {
-/* Commute */
-unsigned temp = ins->src[0];
-ins->src[0] = ins->src[1];
-ins->src[1] = temp;
-
-temp = ins->alu.src1;
-ins->alu.src1 = ins->alu.src2;
-ins->alu.src2 = temp;
-}
+/* Commute arguments */
+if (left)
+mir_flip(ins);
 
 ins->alu.op = mir_notright_op(ins->alu.op);
 } else if (left && ins->has_inline_constant) {
diff --git a/src/panfrost/midgard/mir.c b/src/panfrost/midgard/mir.c
index faeac16d18d..f02527ff219 100644
--- a/src/panfrost/midgard/mir.c
+++ b/src/panfrost/midgard/mir.c
@@ -527,3 +527,20 @@ mir_insert_instruction_after_scheduled(
 memcpy(bundles + after + 1, &new, sizeof(new));
 list_addtail([0]->link, 
_bundle_1->instructions[0]->link);
 }
+
+/* Flip the first two arguments of a (binary) op. Currently ALU
+ * only, no known uses for ldst/tex */
+
+void
+mir_flip(midgard_instruction *ins)
+{
+unsigned temp = ins->src[0];
+ins->src[0] = ins->src[1];
+ins->src[1] = temp;
+
+assert(ins->type == TAG_ALU_4);
+
+temp = ins->alu.src1;
+ins->alu.src1 = ins->alu.src2;
+ins->alu.src2 = temp;
+}
-- 
2.23.0


[Mesa-dev] [PATCH 25/30] pan/midgard: Allow 6 instructions per bundle

2019-09-28 Thread Alyssa Rosenzweig
We never had a scheduler good enough to hit this case before! :)

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/compiler.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/panfrost/midgard/compiler.h b/src/panfrost/midgard/compiler.h
index cf943ea6995..cb7d53dd653 100644
--- a/src/panfrost/midgard/compiler.h
+++ b/src/panfrost/midgard/compiler.h
@@ -198,9 +198,10 @@ typedef struct midgard_bundle {
 /* Tag for the overall bundle */
 int tag;
 
-/* Instructions contained by the bundle */
+/* Instructions contained by the bundle. instruction_count <= 6 (vmul,
+ * sadd, vadd, smul, vlut, branch) */
 int instruction_count;
-midgard_instruction *instructions[5];
+midgard_instruction *instructions[6];
 
 /* Bundle-wide ALU configuration */
 int padding;
-- 
2.23.0


[Mesa-dev] [PATCH 29/30] pan/midgard: Add csel invert optimization

2019-09-28 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/compiler.h   |  1 +
 src/panfrost/midgard/midgard_compile.c|  1 +
 src/panfrost/midgard/midgard_opt_invert.c | 25 +++
 3 files changed, 27 insertions(+)

diff --git a/src/panfrost/midgard/compiler.h b/src/panfrost/midgard/compiler.h
index 8bbf67aa4e7..b0b5ba07143 100644
--- a/src/panfrost/midgard/compiler.h
+++ b/src/panfrost/midgard/compiler.h
@@ -654,6 +654,7 @@ void midgard_lower_invert(compiler_context *ctx, 
midgard_block *block);
 bool midgard_opt_not_propagate(compiler_context *ctx, midgard_block *block);
 bool midgard_opt_fuse_src_invert(compiler_context *ctx, midgard_block *block);
 bool midgard_opt_fuse_dest_invert(compiler_context *ctx, midgard_block *block);
+bool midgard_opt_csel_invert(compiler_context *ctx, midgard_block *block);
 bool midgard_opt_promote_fmov(compiler_context *ctx, midgard_block *block);
 
 #endif
diff --git a/src/panfrost/midgard/midgard_compile.c 
b/src/panfrost/midgard/midgard_compile.c
index f1f8b7ecb9a..857e6c70112 100644
--- a/src/panfrost/midgard/midgard_compile.c
+++ b/src/panfrost/midgard/midgard_compile.c
@@ -2512,6 +2512,7 @@ midgard_compile_shader_nir(struct midgard_screen *screen, 
nir_shader *nir, midga
 progress |= midgard_opt_not_propagate(ctx, block);
 progress |= midgard_opt_fuse_src_invert(ctx, block);
 progress |= midgard_opt_fuse_dest_invert(ctx, block);
+progress |= midgard_opt_csel_invert(ctx, block);
 }
 } while (progress);
 
diff --git a/src/panfrost/midgard/midgard_opt_invert.c 
b/src/panfrost/midgard/midgard_opt_invert.c
index f846bca5bf9..3a4c455877c 100644
--- a/src/panfrost/midgard/midgard_opt_invert.c
+++ b/src/panfrost/midgard/midgard_opt_invert.c
@@ -275,3 +275,28 @@ midgard_opt_fuse_src_invert(compiler_context *ctx, 
midgard_block *block)
 
 return progress;
 }
+
+/* Optimizes a .not away when used as the source of a conditional select:
+ *
+ * csel(a, b, c)  = { b if a, c if !a }
+ * csel(!a, b, c) = { b if !a, c if !(!a) } = { c if a, b if !a } = csel(a, c, 
b)
+ * csel(!a, b, c) = csel(a, c, b)
+ */
+
+bool
+midgard_opt_csel_invert(compiler_context *ctx, midgard_block *block)
+{
+bool progress = false;
+
+mir_foreach_instr_in_block_safe(block, ins) {
+if (ins->type != TAG_ALU_4) continue;
+if (!OP_IS_CSEL(ins->alu.op)) continue;
+if (!mir_single_use(ctx, ins->src[2])) continue;
+if (!mir_strip_inverted(ctx, ins->src[2])) continue;
+
+mir_flip(ins);
+progress |= true;
+}
+
+return progress;
+}
-- 
2.23.0


[Mesa-dev] [PATCH 24/30] pan/midgard: Only one conditional per bundle allowed

2019-09-28 Thread Alyssa Rosenzweig
There's no r32 to save ya after you use up r31 :)

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 5f271608a30..6dfe8d5a72e 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -335,6 +335,10 @@ struct midgard_predicate {
 
 /* Exclude this destination (if not ~0) */
 unsigned exclude;
+
+/* Don't schedule instructions consuming conditionals (since we already
+ * scheduled one). Excludes conditional branches and csel */
+bool no_cond;
 };
 
 /* For an instruction that can fit, adjust it to fit and update the constants
@@ -394,12 +398,14 @@ mir_choose_instruction(
 unsigned unit = predicate->unit;
 bool branch = alu && (unit == ALU_ENAB_BR_COMPACT);
 bool scalar = (unit != ~0) && (unit & UNITS_SCALAR);
+bool no_cond = predicate->no_cond;
 
 /* Iterate to find the best instruction satisfying the predicate */
 unsigned i;
 BITSET_WORD tmp;
 
 signed best_index = -1;
+bool best_conditional = false;
 
 /* Enforce a simple metric limiting distance to keep down register
  * pressure. TODO: replace with liveness tracking for much better
@@ -434,11 +440,18 @@ mir_choose_instruction(
 if (alu && !mir_adjust_constants(instructions[i], predicate, 
false))
 continue;
 
+bool conditional = alu && !branch && 
OP_IS_CSEL(instructions[i]->alu.op);
+conditional |= (branch && !instructions[i]->prepacked_branch 
&& instructions[i]->branch.conditional);
+
+if (conditional && no_cond)
+continue;
+
 /* Simulate in-order scheduling */
 if ((signed) i < best_index)
 continue;
 
 best_index = i;
+best_conditional = conditional;
 }
 
 
@@ -455,6 +468,9 @@ mir_choose_instruction(
 
 if (alu)
 mir_adjust_constants(instructions[best_index], 
predicate, true);
+
+/* Once we schedule a conditional, we can't again */
+predicate->no_cond |= best_conditional;
 }
 
 return instructions[best_index];
-- 
2.23.0


[Mesa-dev] [PATCH 23/30] pan/midgard: Schedule to smul/sadd

2019-09-28 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 094451ceb9d..5f271608a30 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -753,6 +753,8 @@ mir_schedule_alu(
 unreachable("Bad condition");
 }
 
+mir_choose_alu(&smul, instructions, worklist, len, &predicate, 
UNIT_SMUL);
+
 if (!writeout)
 mir_choose_alu(&vlut, instructions, worklist, len, &predicate, 
UNIT_VLUT);
 
@@ -777,6 +779,9 @@ mir_schedule_alu(
 unreachable("Bad condition");
 }
 
+/* Stage 2, let's schedule sadd before vmul for writeout */
+mir_choose_alu(&sadd, instructions, worklist, len, &predicate, 
UNIT_SADD);
+
 /* Check if writeout reads its own register */
 bool bad_writeout = false;
 
-- 
2.23.0


[Mesa-dev] [PATCH 20/30] pan/midgard: Use new scheduler

2019-09-28 Thread Alyssa Rosenzweig
We still emit in-order but we switch to using the bundles created from
the new scheduler, which will allow greater flexibility and room for
out-of-order optimization.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/compiler.h |   6 -
 src/panfrost/midgard/midgard_compile.c  | 103 +---
 src/panfrost/midgard/midgard_schedule.c | 699 +---
 3 files changed, 130 insertions(+), 678 deletions(-)

diff --git a/src/panfrost/midgard/compiler.h b/src/panfrost/midgard/compiler.h
index 60d5b9d0e20..cf943ea6995 100644
--- a/src/panfrost/midgard/compiler.h
+++ b/src/panfrost/midgard/compiler.h
@@ -97,12 +97,6 @@ typedef struct midgard_instruction {
 /* I.e. (1 << alu_bit) */
 int unit;
 
-/* When emitting bundle, should this instruction have a break forced
- * before it? Used for r31 writes which are valid only within a single
- * bundle and *need* to happen as early as possible... this is a hack,
- * TODO remove when we have a scheduler */
-bool precede_break;
-
 bool has_constants;
 uint32_t constants[4];
 uint16_t inline_constant;
diff --git a/src/panfrost/midgard/midgard_compile.c 
b/src/panfrost/midgard/midgard_compile.c
index 337f46f2e0d..f1f8b7ecb9a 100644
--- a/src/panfrost/midgard/midgard_compile.c
+++ b/src/panfrost/midgard/midgard_compile.c
@@ -617,82 +617,6 @@ nir_is_non_scalar_swizzle(nir_alu_src *src, unsigned 
nr_components)
 return false;
 }
 
-/* Midgard puts scalar conditionals in r31.w; move an arbitrary source (the
- * output of a conditional test) into that register */
-
-static void
-emit_condition(compiler_context *ctx, nir_src *src, bool for_branch, unsigned 
component)
-{
-int condition = nir_src_index(ctx, src);
-
-/* Source to swizzle the desired component into w */
-
-const midgard_vector_alu_src alu_src = {
-.swizzle = SWIZZLE(component, component, component, component),
-};
-
-/* There is no boolean move instruction. Instead, we simulate a move by
- * ANDing the condition with itself to get it into r31.w */
-
-midgard_instruction ins = {
-.type = TAG_ALU_4,
-
-/* We need to set the conditional as close as possible */
-.precede_break = true,
-.unit = for_branch ? UNIT_SMUL : UNIT_SADD,
-.mask = 1 << COMPONENT_W,
-.src = { condition, condition, ~0 },
-.dest = SSA_FIXED_REGISTER(31),
-
-.alu = {
-.op = midgard_alu_op_iand,
-.outmod = midgard_outmod_int_wrap,
-.reg_mode = midgard_reg_mode_32,
-.dest_override = midgard_dest_override_none,
-.src1 = vector_alu_srco_unsigned(alu_src),
-.src2 = vector_alu_srco_unsigned(alu_src)
-},
-};
-
-emit_mir_instruction(ctx, ins);
-}
-
-/* Or, for mixed conditions (with csel_v), here's a vector version using all of
- * r31 instead */
-
-static void
-emit_condition_mixed(compiler_context *ctx, nir_alu_src *src, unsigned nr_comp)
-{
-int condition = nir_src_index(ctx, &src->src);
-
-/* Source to swizzle the desired component into w */
-
-const midgard_vector_alu_src alu_src = {
-.swizzle = SWIZZLE_FROM_ARRAY(src->swizzle),
-};
-
-/* There is no boolean move instruction. Instead, we simulate a move by
- * ANDing the condition with itself to get it into r31.w */
-
-midgard_instruction ins = {
-.type = TAG_ALU_4,
-.precede_break = true,
-.mask = mask_of(nr_comp),
-.src = { condition, condition, ~0 },
-.dest = SSA_FIXED_REGISTER(31),
-.alu = {
-.op = midgard_alu_op_iand,
-.outmod = midgard_outmod_int_wrap,
-.reg_mode = midgard_reg_mode_32,
-.dest_override = midgard_dest_override_none,
-.src1 = vector_alu_srco_unsigned(alu_src),
-.src2 = vector_alu_srco_unsigned(alu_src)
-},
-};
-
-emit_mir_instruction(ctx, ins);
-}
-
 #define ALU_CASE(nir, _op) \
case nir_op_##nir: \
op = midgard_alu_op_##_op; \
@@ -980,21 +904,16 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr)
 bool mixed = nir_is_non_scalar_swizzle(>src[0], 
nr_components);
 op = mixed ? midgard_alu_op_icsel_v : midgard_alu_op_icsel;
 
-/* csel works as a two-arg in Midgard, since the condition is 
hardcoded in r31.w */
-nr_inputs = 2;
-
-/* Emit the condition into r31 */
-
-if (mixed)
-emit_condition_mixed(ctx, &g

[Mesa-dev] [PATCH 26/30] pan/midgard: Allow writeout to see into the future

2019-09-28 Thread Alyssa Rosenzweig
If an instruction could be scheduled to vmul to satisfy the writeout
conditions, let's do that and save an instruction+cycle per fragment
shader.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 6dfe8d5a72e..a35699893bd 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -339,6 +339,12 @@ struct midgard_predicate {
 /* Don't schedule instructions consuming conditionals (since we already
  * scheduled one). Excludes conditional branches and csel */
 bool no_cond;
+
+/* Require a minimal mask and (if nonzero) given destination. Used for
+ * writeout optimizations */
+
+unsigned mask;
+unsigned dest;
 };
 
 /* For an instruction that can fit, adjust it to fit and update the constants
@@ -400,6 +406,10 @@ mir_choose_instruction(
 bool scalar = (unit != ~0) && (unit & UNITS_SCALAR);
 bool no_cond = predicate->no_cond;
 
+unsigned mask = predicate->mask;
+unsigned dest = predicate->dest;
+bool needs_dest = mask & 0xF;
+
 /* Iterate to find the best instruction satisfying the predicate */
 unsigned i;
 BITSET_WORD tmp;
@@ -440,6 +450,12 @@ mir_choose_instruction(
 if (alu && !mir_adjust_constants(instructions[i], predicate, 
false))
 continue;
 
+if (needs_dest && instructions[i]->dest != dest)
+continue;
+
+if (mask && ((~instructions[i]->mask) & mask))
+continue;
+
 bool conditional = alu && !branch && 
OP_IS_CSEL(instructions[i]->alu.op);
 conditional |= (branch && !instructions[i]->prepacked_branch 
&& instructions[i]->branch.conditional);
 
@@ -817,7 +833,30 @@ mir_schedule_alu(
 bad_writeout |= mir_has_arg(stages[i], branch->src[0]);
 }
 
-/* Add a move if necessary */
+/* It's possible we'll be able to schedule something into vmul
+ * to fill r0. Let's peek into the future, trying to schedule
+ * vmul specially that way. */
+
+if (!bad_writeout && writeout_mask != 0xF) {
+predicate.unit = UNIT_VMUL;
+predicate.dest = src;
+predicate.mask = writeout_mask ^ 0xF;
+
+struct midgard_instruction *peaked =
+mir_choose_instruction(instructions, worklist, 
len, &predicate);
+
+if (peaked) {
+vmul = peaked;
+vmul->unit = UNIT_VMUL;
+writeout_mask |= predicate.mask;
+assert(writeout_mask == 0xF);
+}
+
+/* Cleanup */
+predicate.dest = predicate.mask = 0;
+}
+
+/* Finally, add a move if necessary */
 if (bad_writeout || writeout_mask != 0xF) {
 unsigned temp = (branch->src[0] == ~0) ? 
SSA_FIXED_REGISTER(0) : make_compiler_temp(ctx);
 midgard_instruction mov = v_mov(src, blank_alu_src, 
temp);
-- 
2.23.0


[Mesa-dev] [PATCH 17/30] pan/midgard: Implement load/store pairing

2019-09-28 Thread Alyssa Rosenzweig
We can bundle two load/store ops together. This also eliminates the need
for explicit load/store pairing in a prepass.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 67 +
 1 file changed, 12 insertions(+), 55 deletions(-)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 92756e15189..e2641ea0180 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -1141,17 +1141,26 @@ mir_schedule_ldst(
 .exclude = ~0
 };
 
+/* Try to pick two load/store ops. Second not guaranteed to exist */
+
 midgard_instruction *ins =
 mir_choose_instruction(instructions, worklist, len, 
&predicate);
 
-mir_update_worklist(worklist, len, instructions, ins);
+midgard_instruction *pair =
+mir_choose_instruction(instructions, worklist, len, 
&predicate);
 
 struct midgard_bundle out = {
 .tag = TAG_LOAD_STORE_4,
-.instruction_count = 1,
-.instructions = { ins }
+.instruction_count = pair ? 2 : 1,
+.instructions = { ins, pair }
 };
 
+/* We have to update the worklist atomically, since the two
+ * instructions run concurrently (TODO: verify it's not pipelined) */
+
+mir_update_worklist(worklist, len, instructions, ins);
+mir_update_worklist(worklist, len, instructions, pair);
+
 return out;
 }
 
@@ -1341,54 +1350,6 @@ schedule_block(compiler_context *ctx, midgard_block 
*block)
 ctx->quadword_count += block->quadword_count;
 }
 
-/* The following passes reorder MIR instructions to enable better scheduling */
-
-static void
-midgard_pair_load_store(compiler_context *ctx, midgard_block *block)
-{
-mir_foreach_instr_in_block_safe(block, ins) {
-if (ins->type != TAG_LOAD_STORE_4) continue;
-
-/* We've found a load/store op. Check if next is also 
load/store. */
-midgard_instruction *next_op = mir_next_op(ins);
-if (&next_op->link != &block->instructions) {
-if (next_op->type == TAG_LOAD_STORE_4) {
-/* If so, we're done since we're a pair */
-ins = mir_next_op(ins);
-continue;
-}
-
-/* Maximum search distance to pair, to avoid register 
pressure disasters */
-int search_distance = 8;
-
-/* Otherwise, we have an orphaned load/store -- search 
for another load */
-mir_foreach_instr_in_block_from(block, c, 
mir_next_op(ins)) {
-/* Terminate search if necessary */
-if (!(search_distance--)) break;
-
-if (c->type != TAG_LOAD_STORE_4) continue;
-
-/* We can only reorder if there are no sources 
*/
-
-bool deps = false;
-
-for (unsigned s = 0; s < ARRAY_SIZE(ins->src); 
++s)
-deps |= (c->src[s] != ~0);
-
-if (deps)
-continue;
-
-/* We found one! Move it up to pair and remove 
it from the old location */
-
-mir_insert_instruction_before(ctx, ins, *c);
-mir_remove_instruction(c);
-
-break;
-}
-}
-}
-}
-
 /* When we're 'squeezing down' the values in the IR, we maintain a hash
  * as such */
 
@@ -1665,10 +1626,6 @@ schedule_program(compiler_context *ctx)
 
 midgard_promote_uniforms(ctx, 16);
 
-mir_foreach_block(ctx, block) {
-midgard_pair_load_store(ctx, block);
-}
-
 /* Must be lowered right before RA */
 mir_squeeze_index(ctx);
 mir_lower_special_reads(ctx);
-- 
2.23.0


[Mesa-dev] [PATCH 21/30] pan/midgard: Don't double check SCALAR units

2019-09-28 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index f883830cf86..e71f65f6004 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -188,10 +188,6 @@ is_single_component_mask(unsigned mask)
 static bool
 mir_is_scalar(midgard_instruction *ains)
 {
-/* Does the op support scalar units? */
-if (!(alu_opcode_props[ains->alu.op].props & UNITS_SCALAR))
-return false;
-
 /* Do we try to use it as a vector op? */
 if (!is_single_component_mask(ains->mask))
 return false;
-- 
2.23.0


[Mesa-dev] [PATCH 16/30] pan/midgard: Extend csel_swizzle to branches

2019-09-28 Thread Alyssa Rosenzweig
Conditions for branches don't have a swizzle explicitly in the emitted
binary, but they do implicitly get swizzled in whatever instruction
wrote r31, so we need to handle that.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/compiler.h| 4 ++--
 src/panfrost/midgard/midgard_compile.c | 2 +-
 src/panfrost/midgard/mir.c | 9 +++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/panfrost/midgard/compiler.h b/src/panfrost/midgard/compiler.h
index 780e4323554..60d5b9d0e20 100644
--- a/src/panfrost/midgard/compiler.h
+++ b/src/panfrost/midgard/compiler.h
@@ -88,8 +88,8 @@ typedef struct midgard_instruction {
 unsigned src[3];
 unsigned dest;
 
-/* Swizzle for the conditional for a csel */
-unsigned csel_swizzle;
+/* Swizzle for the conditional for a csel/branch */
+unsigned cond_swizzle;
 
 /* Special fields for an ALU instruction */
 midgard_reg_info registers;
diff --git a/src/panfrost/midgard/midgard_compile.c 
b/src/panfrost/midgard/midgard_compile.c
index 95ec48e9563..337f46f2e0d 100644
--- a/src/panfrost/midgard/midgard_compile.c
+++ b/src/panfrost/midgard/midgard_compile.c
@@ -1095,7 +1095,7 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr)
 };
 
 if (nr_inputs == 3) {
-ins.csel_swizzle = SWIZZLE_FROM_ARRAY(nirmods[2]->swizzle);
+ins.cond_swizzle = SWIZZLE_FROM_ARRAY(nirmods[2]->swizzle);
 assert(!nirmods[2]->abs);
 assert(!nirmods[2]->negate);
 }
diff --git a/src/panfrost/midgard/mir.c b/src/panfrost/midgard/mir.c
index 9e5ba7abcb0..faeac16d18d 100644
--- a/src/panfrost/midgard/mir.c
+++ b/src/panfrost/midgard/mir.c
@@ -42,8 +42,8 @@ unsigned
 mir_get_swizzle(midgard_instruction *ins, unsigned idx)
 {
 if (ins->type == TAG_ALU_4) {
-if (idx == 2)
-return ins->csel_swizzle;
+if (idx == 2 || ins->compact_branch)
+return ins->cond_swizzle;
 
 unsigned b = (idx == 0) ? ins->alu.src1 : ins->alu.src2;
 
@@ -105,6 +105,11 @@ void
 mir_set_swizzle(midgard_instruction *ins, unsigned idx, unsigned new)
 {
 if (ins->type == TAG_ALU_4) {
+if (idx == 2 || ins->compact_branch) {
+ins->cond_swizzle = new;
+return;
+}
+
 unsigned b = (idx == 0) ? ins->alu.src1 : ins->alu.src2;
 
 midgard_vector_alu_src s =
-- 
2.23.0


[Mesa-dev] [PATCH 18/30] pan/midgard: Add mir_choose_alu helper

2019-09-28 Thread Alyssa Rosenzweig
Based on a given unit.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index e2641ea0180..dea8b023e9d 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -818,6 +818,9 @@ struct midgard_predicate {
 /* True if we want to pop off the chosen instruction */
 bool destructive;
 
+/* For ALU, choose only this unit */
+unsigned unit;
+
 /* State for bundle constants. constants is the actual constants
  * for the bundle. constant_count is the number of bytes (up to
  * 16) currently in use for constants. When picking in destructive
@@ -957,6 +960,27 @@ mir_choose_bundle(
 return ~0;
 }
 
+/* We want to choose an ALU instruction filling a given unit */
+static void
+mir_choose_alu(midgard_instruction **slot,
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned len,
+struct midgard_predicate *predicate,
+unsigned unit)
+{
+/* Did we already schedule to this slot? */
+if ((*slot) != NULL)
+return;
+
+/* Try to schedule something, if not */
+predicate->unit = unit;
+*slot = mir_choose_instruction(instructions, worklist, len, predicate);
+
+/* Store unit upon scheduling */
+if (*slot && !((*slot)->compact_branch))
+(*slot)->unit = unit;
+}
+
 /* When we are scheduling a branch/csel, we need the consumed condition in the
  * same block as a pipeline register. There are two options to enable this:
  *
-- 
2.23.0


[Mesa-dev] [PATCH 05/30] pan/midgard: Calculate dependency graph

2019-09-28 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/compiler.h |  10 ++
 src/panfrost/midgard/midgard_schedule.c | 121 
 2 files changed, 131 insertions(+)

diff --git a/src/panfrost/midgard/compiler.h b/src/panfrost/midgard/compiler.h
index 8612bab7686..780e4323554 100644
--- a/src/panfrost/midgard/compiler.h
+++ b/src/panfrost/midgard/compiler.h
@@ -136,6 +136,16 @@ typedef struct midgard_instruction {
 /* Generic hint for intra-pass use */
 bool hint;
 
+/* During scheduling, the backwards dependency graph
+ * (DAG). nr_dependencies is the number of unscheduled
+ * instructions that must still be scheduled after
+ * (before) this instruction. dependents are which
+ * instructions need to be scheduled before (after) this
+ * instruction. */
+
+unsigned nr_dependencies;
+BITSET_WORD *dependents;
+
 union {
 midgard_load_store_word load_store;
 midgard_vector_alu alu;
diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 75295b5d123..70fa030390c 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -55,6 +55,119 @@
  *
  */
 
+/* We create the dependency graph with per-component granularity */
+
+#define COMPONENT_COUNT 8
+
+static void
+add_dependency(struct util_dynarray *table, unsigned index, unsigned mask, 
midgard_instruction **instructions, unsigned child)
+{
+for (unsigned i = 0; i < COMPONENT_COUNT; ++i) {
+if (!(mask & (1 << i)))
+continue;
+
+struct util_dynarray *parents = [(COMPONENT_COUNT * 
index) + i];
+
+util_dynarray_foreach(parents, unsigned, parent) {
+BITSET_WORD *dependents = 
instructions[*parent]->dependents;
+
+/* Already have the dependency */
+if (BITSET_TEST(dependents, child))
+continue;
+
+BITSET_SET(dependents, child);
+instructions[child]->nr_dependencies++;
+}
+}
+}
+
+static void
+mark_access(struct util_dynarray *table, unsigned index, unsigned mask, 
unsigned parent)
+{
+for (unsigned i = 0; i < COMPONENT_COUNT; ++i) {
+if (!(mask & (1 << i)))
+continue;
+
+util_dynarray_append([(COMPONENT_COUNT * index) + i], 
unsigned, parent);
+}
+}
+
+static void
+mir_create_dependency_graph(midgard_instruction **instructions, unsigned 
count, unsigned node_count)
+{
+size_t sz = node_count * COMPONENT_COUNT;
+
+struct util_dynarray *last_read = calloc(sizeof(struct util_dynarray), 
sz);
+struct util_dynarray *last_write = calloc(sizeof(struct 
util_dynarray), sz);
+
+for (unsigned i = 0; i < sz; ++i) {
+util_dynarray_init(_read[i], NULL);
+util_dynarray_init(_write[i], NULL);
+}
+
+/* Initialize dependency graph */
+for (unsigned i = 0; i < count; ++i) {
+instructions[i]->dependents =
+calloc(BITSET_WORDS(count), sizeof(BITSET_WORD));
+
+instructions[i]->nr_dependencies = 0;
+}
+
+/* Populate dependency graph */
+for (signed i = count - 1; i >= 0; --i) {
+if (instructions[i]->compact_branch)
+continue;
+
+unsigned dest = instructions[i]->dest;
+unsigned mask = instructions[i]->mask;
+
+mir_foreach_src((*instructions), s) {
+unsigned src = instructions[i]->src[s];
+
+if (src < node_count) {
+unsigned readmask = 
mir_mask_of_read_components(instructions[i], src);
+add_dependency(last_write, src, readmask, 
instructions, i);
+}
+}
+
+if (dest < node_count) {
+add_dependency(last_read, dest, mask, instructions, i);
+add_dependency(last_write, dest, mask, instructions, 
i);
+mark_access(last_write, dest, mask, i);
+}
+
+mir_foreach_src((*instructions), s) {
+unsigned src = instructions[i]->src[s];
+
+if (src < node_count) {
+unsigned readmask = 
mir_mask_of_read_components(instructions[i], src);
+mark_access(last_read, src, readmask, i);
+}
+}
+}
+
+/* If there is a branch, all instructions depend on it, as interblock
+ * execution must be pu

[Mesa-dev] [PATCH 04/30] pan/midgard: Add flatten_mir helper

2019-09-28 Thread Alyssa Rosenzweig
We would like to flatten a linked list of midgard_instructions into an
array of midgard_instruction pointers on the heap.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 2ed25da2e0d..75295b5d123 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -620,6 +620,28 @@ schedule_bundle(compiler_context *ctx, midgard_block 
*block, midgard_instruction
 return bundle;
 }
 
+/* We would like to flatten the linked list of midgard_instructions in a bundle
+ * to an array of pointers on the heap for easy indexing */
+
+static midgard_instruction **
+flatten_mir(midgard_block *block, unsigned *len)
+{
+*len = list_length(>instructions);
+
+if (!(*len))
+return NULL;
+
+midgard_instruction **instructions =
+calloc(sizeof(midgard_instruction *), *len);
+
+unsigned i = 0;
+
+mir_foreach_instr_in_block(block, ins)
+instructions[i++] = ins;
+
+return instructions;
+}
+
 /* Schedule a single block by iterating its instruction to create bundles.
  * While we go, tally about the bundle sizes to compute the block size. */
 
-- 
2.23.0


[Mesa-dev] [PATCH 00/30] pan/midgard: Implement out-of-order scheduler

2019-09-28 Thread Alyssa Rosenzweig
For each block, we compute an instruction dependency graph, which allows
us to maintain a worklist of instructions able to be scheduled (no
dependencies). We traverse this graph backwards, emitting instructions
backwards and reversing the block at the end, to allow conditionals and
writeout to be expressed naturally and to enable liveness tracking in
the future.
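
For readers who want the shape of the algorithm in one place, here is a
minimal, self-contained sketch of a backwards list scheduler of this kind.
It is only an illustration of the approach described above, not the Mesa
code: the toy instruction struct, the explicit edge table and the
pick-the-last-ready rule are simplifications.

    /* cc -o sched sched.c && ./sched */
    #include <stdbool.h>
    #include <stdio.h>

    #define N 4

    struct toy_ins {
            const char *name;
            unsigned nr_dependencies;  /* unscheduled consumers remaining */
            bool releases[N];          /* releases[p]: p is waiting on us */
            bool done;
    };

    /* Picking the highest ready index mimics in-order scheduling, which is
     * what the series does before smarter heuristics arrive. */
    static int
    choose(struct toy_ins *block)
    {
            int best = -1;

            for (int i = 0; i < N; ++i) {
                    if (!block[i].done && block[i].nr_dependencies == 0)
                            best = i;
            }

            return best;
    }

    int
    main(void)
    {
            /* a and b feed c; c feeds d */
            struct toy_ins block[N] = { {"a"}, {"b"}, {"c"}, {"d"} };

            /* producer -> consumer edges; the *producer* gains a dependency,
             * because we schedule backwards from the end of the block */
            int edge[][2] = { {0, 2}, {1, 2}, {2, 3} };

            for (unsigned e = 0; e < sizeof(edge) / sizeof(edge[0]); ++e) {
                    block[edge[e][1]].releases[edge[e][0]] = true;
                    block[edge[e][0]].nr_dependencies++;
            }

            const char *out[N];
            unsigned emitted = 0;

            while (emitted < N) {
                    int i = choose(block);
                    if (i < 0)
                            break;  /* cycle; cannot happen for a valid block */

                    block[i].done = true;
                    out[emitted++] = block[i].name;

                    /* The equivalent of mir_update_worklist: release whoever
                     * was waiting on i */
                    for (int p = 0; p < N; ++p) {
                            if (block[i].releases[p])
                                    block[p].nr_dependencies--;
                    }
            }

            /* We emitted backwards; reverse to recover program order */
            for (unsigned k = emitted; k > 0; --k)
                    printf("%s ", out[k - 1]);
            printf("\n");  /* prints: a b c d */

            return 0;
    }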

At the moment, we do not schedule for register pressure; we just aim to
minimize bundles (register pressure is capped here by a distance
metric). Scheduling for register pressure will be the next step as soon
as this series is finalized; at that point, we'll augment
mir_choose_instruction and mir_choose_bundle with pressure-aware
heuristics.

Overall, I'm quite happy with how this came out. Full stats below.
Highlight is a 10% reduction of cycle count for glmark shaders.

total instructions in shared programs: 3500 -> 3472 (-0.80%)
instructions in affected programs: 685 -> 657 (-4.09%)
helped: 15
HURT: 1
helped stats (abs) min: 1 max: 5 x̄: 1.93 x̃: 1
helped stats (rel) min: 1.49% max: 12.20% x̄: 4.65% x̃: 3.45%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 4.00% max: 4.00% x̄: 4.00% x̃: 4.00%
95% mean confidence interval for instructions value: -2.61 -0.89
95% mean confidence interval for instructions %-change: -6.03% -2.18%
Instructions are helped.

total bundles in shared programs: 2020 -> 1804 (-10.69%)
bundles in affected programs: 1343 -> 1127 (-16.08%)
helped: 45
HURT: 0
helped stats (abs) min: 1 max: 48 x̄: 4.80 x̃: 2
helped stats (rel) min: 5.56% max: 34.78% x̄: 13.50% x̃: 12.50%
95% mean confidence interval for bundles value: -7.15 -2.45
95% mean confidence interval for bundles %-change: -15.98% -11.03%
Bundles are helped.

total quadwords in shared programs: 3336 -> 3130 (-6.18%)
quadwords in affected programs: 2162 -> 1956 (-9.53%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 17 x̄: 4.04 x̃: 2
helped stats (rel) min: 1.72% max: 22.92% x̄: 10.94% x̃: 11.11%
95% mean confidence interval for quadwords value: -5.32 -2.76
95% mean confidence interval for quadwords %-change: -12.47% -9.42%
Quadwords are helped.

total registers in shared programs: 343 -> 327 (-4.66%)
registers in affected programs: 123 -> 107 (-13.01%)
helped: 22
HURT: 5
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 12.50% max: 33.33% x̄: 25.61% x̃: 25.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 8.33% max: 100.00% x̄: 37.00% x̃: 20.00%
95% mean confidence interval for registers value: -0.94 -0.24
95% mean confidence interval for registers %-change: -25.65% -2.38%
Registers are helped.

total threads in shared programs: 451 -> 459 (1.77%)
threads in affected programs: 8 -> 16 (100.00%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 2.00 2.00
95% mean confidence interval for threads %-change: 100.00% 100.00%

total spills in shared programs: 3 -> 3 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 5 -> 6 (20.00%)
fills in affected programs: 5 -> 6 (20.00%)
helped: 0
HURT: 1

Alyssa Rosenzweig (30):
  pan/midgard: Add missing parans in SWIZZLE definition
  pan/midgard: Fix component count handling for ldst
  pan/midgard: Squeeze indices before scheduling
  pan/midgard: Add flatten_mir helper
  pan/midgard: Calculate dependency graph
  pan/midgard: Initialize worklist
  pan/midgard: Add mir_choose_instruction stub
  pan/midgard: Add mir_update_worklist helper
  pan/midgard: Add mir_choose_bundle helper
  pan/midgard: Add mir_schedule_texture/ldst/alu helpers
  pan/midgard: Remove csel constant unit force
  pan/midgard: Add constant intersection filters
  pan/midgard: Add predicate->exclude
  pan/midgard: Implement predicate->unit
  pan/midgard: Add helpers for scheduling conditionals
  pan/midgard: Extend csel_swizzle to branches
  pan/midgard: Implement load/store pairing
  pan/midgard: Add mir_choose_alu helper
  pan/midgard: Add distance metric to choose_instruction
  pan/midgard: Use new scheduler
  pan/midgard: Don't double check SCALAR units
  pan/midgard: Extend choose_instruction for scalar units
  pan/midgard: Schedule to smul/sadd
  pan/midgard: Only one conditional per bundle allowed
  pan/midgard: Allow 6 instructions per bundle
  pan/midgard: Allow writeout to see into the future
  pan/midgard: Tightly pack 32-bit constants
  pan/midgard: Add mir_flip helper
  pan/midgard: Add csel invert optimization
  pan/midgard: Allow scheduling conditions with constants

 src/panfrost/midgard/compiler.h   |   27 +-
 src/panfrost/midgard/helpers.h|   17 +-
 src/panfrost/midgard/midgard_compile.c|  109 +-
 src/panfrost/midgard/midgard_opt_invert.c |   38 +-
 src/panfrost/midgard/midgard_schedule.c   | 1303 ++---
 src/panfrost

[Mesa-dev] [PATCH 22/30] pan/midgard: Extend choose_instruction for scalar units

2019-09-28 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index e71f65f6004..094451ceb9d 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -393,6 +393,7 @@ mir_choose_instruction(
 bool alu = tag == TAG_ALU_4;
 unsigned unit = predicate->unit;
 bool branch = alu && (unit == ALU_ENAB_BR_COMPACT);
+bool scalar = (unit != ~0) && (unit & UNITS_SCALAR);
 
 /* Iterate to find the best instruction satisfying the predicate */
 unsigned i;
@@ -427,6 +428,9 @@ mir_choose_instruction(
 if (branch && !instructions[i]->compact_branch)
 continue;
 
+if (alu && scalar && !mir_is_scalar(instructions[i]))
+continue;
+
 if (alu && !mir_adjust_constants(instructions[i], predicate, 
false))
 continue;
 
-- 
2.23.0


[Mesa-dev] [PATCH 02/30] pan/midgard: Fix component count handling for ldst

2019-09-28 Thread Alyssa Rosenzweig
It's not based on the writemask and it can't be inferred; it's just
intrinsic to the op itself.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/helpers.h | 15 +++--
 src/panfrost/midgard/mir.c | 59 ++
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/src/panfrost/midgard/helpers.h b/src/panfrost/midgard/helpers.h
index 343fad0fea8..024a6380d29 100644
--- a/src/panfrost/midgard/helpers.h
+++ b/src/panfrost/midgard/helpers.h
@@ -392,9 +392,20 @@ swizzle_to_component(unsigned swizzle)
 
 
 static inline unsigned
-component_to_swizzle(unsigned c)
+component_to_swizzle(unsigned c, unsigned count)
 {
-return SWIZZLE(c, c, c, c);
+switch (count) {
+case 1:
+return SWIZZLE(c, c, c, c);
+case 2:
+return SWIZZLE(c, c + 1, c + 1, c + 1);
+case 3:
+return SWIZZLE(c, c + 1, c + 2, c + 2);
+case 4:
+return SWIZZLE(c, c + 1, c + 2, c + 3);
+default:
+unreachable("Invalid component count");
+}
 }
 
 #endif
diff --git a/src/panfrost/midgard/mir.c b/src/panfrost/midgard/mir.c
index 8874937aa5d..9e5ba7abcb0 100644
--- a/src/panfrost/midgard/mir.c
+++ b/src/panfrost/midgard/mir.c
@@ -64,7 +64,24 @@ mir_get_swizzle(midgard_instruction *ins, unsigned idx)
 uint8_t raw =
 (idx == 2) ? ins->load_store.arg_2 : 
ins->load_store.arg_1;
 
-return 
component_to_swizzle(midgard_ldst_select(raw).component);
+/* TODO: Integrate component count with properties */
+unsigned components = 1;
+switch (ins->load_store.op) {
+case midgard_op_ld_int4:
+components = (idx == 0) ? 2 : 1;
+break;
+case midgard_op_st_int4:
+components = (idx == 1) ? 2 : 1;
+break;
+case midgard_op_ld_cubemap_coords:
+components = 3;
+break;
+default:
+components = 1;
+break;
+}
+
+return 
component_to_swizzle(midgard_ldst_select(raw).component, components);
 }
 default:
 unreachable("Unknown load/store source");
@@ -362,26 +379,6 @@ mir_source_count(midgard_instruction *ins)
 }
 }
 
-static unsigned
-mir_component_count_implicit(midgard_instruction *ins, unsigned i)
-{
-if (ins->type == TAG_LOAD_STORE_4) {
-switch (ins->load_store.op) {
-/* Address implicitly 64-bit */
-case midgard_op_ld_int4:
-return (i == 0) ? 1 : 0;
-
-case midgard_op_st_int4:
-return (i == 1) ? 1 : 0;
-
-default:
-return 0;
-}
-}
-
-return 0;
-}
-
 unsigned
 mir_mask_of_read_components(midgard_instruction *ins, unsigned node)
 {
@@ -394,21 +391,13 @@ mir_mask_of_read_components(midgard_instruction *ins, 
unsigned node)
 if (ins->compact_branch && ins->writeout && (i == 0))
 return 0xF;
 
-unsigned swizzle = mir_get_swizzle(ins, i);
-unsigned m = mir_mask_of_read_components_single(swizzle, 
ins->mask);
-
-/* Sometimes multi-arg ops are passed implicitly */
-unsigned implicit = mir_component_count_implicit(ins, i);
-assert(implicit < 2);
-
-/* Extend the mask */
-if (implicit == 1) {
-/* Ensure it's a single bit currently */
-assert((m >> __builtin_ctz(m)) == 0x1);
+/* ALU ops act componentwise so we need to pay attention to
+ * their mask. Texture/ldst does not so we don't clamp source
+ * readmasks based on the writemask */
+unsigned qmask = (ins->type == TAG_ALU_4) ? ins->mask : 0xF;
 
-/* Set the next bit to extend one*/
-m |= (m << 1);
-}
+unsigned swizzle = mir_get_swizzle(ins, i);
+unsigned m = mir_mask_of_read_components_single(swizzle, 
qmask);
 
 /* Handle dot products and things */
 if (ins->type == TAG_ALU_4 && !ins->compact_branch) {
-- 
2.23.0


[Mesa-dev] [PATCH 19/30] pan/midgard: Add distance metric to choose_instruction

2019-09-28 Thread Alyssa Rosenzweig
We require chosen instructions to be "close", to avoid ballooning
register pressure. This is a kludge that will go away once we have
proper liveness tracking in the scheduler, but for now it prevents a lot
of needless spilling.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index dea8b023e9d..6177d0072f7 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -898,7 +898,21 @@ mir_choose_instruction(
 
 signed best_index = -1;
 
+/* Enforce a simple metric limiting distance to keep down register
+ * pressure. TODO: replace with liveness tracking for much better
+ * results */
+
+unsigned max_active = 0;
+unsigned max_distance = 8;
+
 BITSET_FOREACH_SET(i, tmp, worklist, count) {
+max_active = MAX2(max_active, i);
+}
+
+BITSET_FOREACH_SET(i, tmp, worklist, count) {
+if ((max_active - i) >= max_distance)
+continue;
+
 if (tag != ~0 && instructions[i]->type != tag)
 continue;
 
-- 
2.23.0


[Mesa-dev] [PATCH 15/30] pan/midgard: Add helpers for scheduling conditionals

2019-09-28 Thread Alyssa Rosenzweig
Conditional instructions (csel and conditional branches) require their
condition to be written to a special condition pipeline register (r31.w
for scalar, r31.xyzw for vector). However, pipeline registers are live
only for the duration of a single bundle. As such, the logic to schedule
conditionals correctly is surprisingly complex. Essentially, we see if we
could stuff the conditional within the same bundle as the csel/branch
without breaking anything; if we can, we do that. If we can't, we add a
dummy move to make room.
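
As a rough model of that decision (the struct and field names below are
invented for illustration; the real logic operates on midgard_instruction
and the scheduler's predicate, not on a toy struct):

    #include <stdbool.h>
    #include <stdio.h>

    /* Invented summary of the choice described above */
    struct cond_src {
            bool in_this_block;
            bool single_use;
            bool has_embedded_constants;
    };

    static const char *
    schedule_condition(const struct cond_src *c)
    {
            /* Preferred: pull the comparison itself into the bundle, so its
             * r31 write is live exactly when the csel/branch consumes it */
            if (c->in_this_block && c->single_use && !c->has_embedded_constants)
                    return "move the comparison into the bundle";

            /* Fallback: synthesize a move of the already-computed condition
             * into r31 and schedule that alongside the consumer instead */
            return "emit a copy (mov) of the condition";
    }

    int
    main(void)
    {
            struct cond_src local  = { true, true,  false };
            struct cond_src shared = { true, false, false };

            printf("%s\n", schedule_condition(&local));
            printf("%s\n", schedule_condition(&shared));
            return 0;
    }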

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 146 
 1 file changed, 146 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index eae73868e2a..92756e15189 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -957,6 +957,152 @@ mir_choose_bundle(
 return ~0;
 }
 
+/* When we are scheduling a branch/csel, we need the consumed condition in the
+ * same block as a pipeline register. There are two options to enable this:
+ *
+ *  - Move the conditional into the bundle. Preferred, but only works if the
+ *conditional is used only once and is from this block.
+ *  - Copy the conditional.
+ *
+ * We search for the conditional. If it's in this block, single-use, and
+ * without embedded constants, we schedule it immediately. Otherwise, we
+ * schedule a move for it.
+ *
+ * mir_comparison_mobile is a helper to find the moveable condition.
+ */
+
+static unsigned
+mir_comparison_mobile(
+compiler_context *ctx,
+midgard_instruction **instructions,
+unsigned count,
+unsigned cond)
+{
+if (!mir_single_use(ctx, cond))
+return ~0;
+
+unsigned ret = ~0;
+
+for (unsigned i = 0; i < count; ++i) {
+if (instructions[i]->dest != cond)
+continue;
+
+/* Must fit in an ALU bundle */
+if (instructions[i]->type != TAG_ALU_4)
+return ~0;
+
+/* We'll need to rewrite to .w but that doesn't work for vector
+ * ops that don't replicate (ball/bany), so bail there */
+
+if 
(GET_CHANNEL_COUNT(alu_opcode_props[instructions[i]->alu.op].props))
+return ~0;
+
+/* TODO: moving conditionals with constants */
+
+if (instructions[i]->has_constants)
+return ~0;
+
+/* Ensure it is written only once */
+
+if (ret != ~0)
+return ~0;
+else
+ret = i;
+}
+
+return ret;
+}
+
+/* Using the information about the moveable conditional itself, we either pop
+ * that condition off the worklist for use now, or create a move to
+ * artificially schedule instead as a fallback */
+
+static midgard_instruction *
+mir_schedule_comparison(
+compiler_context *ctx,
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned count,
+unsigned cond, bool vector, unsigned swizzle,
+midgard_instruction *user)
+{
+/* TODO: swizzle when scheduling */
+unsigned comp_i =
+(!vector && (swizzle == 0)) ?
+mir_comparison_mobile(ctx, instructions, count, cond) : ~0;
+
+/* If we can, schedule the condition immediately */
+if ((comp_i != ~0) && BITSET_TEST(worklist, comp_i)) {
+assert(comp_i < count);
+BITSET_CLEAR(worklist, comp_i);
+return instructions[comp_i];
+}
+
+/* Otherwise, we insert a move */
+midgard_vector_alu_src csel = {
+.swizzle = swizzle
+};
+
+midgard_instruction mov = v_mov(cond, csel, cond);
+mov.mask = vector ? 0xF : 0x1;
+
+return mir_insert_instruction_before(ctx, user, mov);
+}
+
+/* Most generally, we need instructions writing to r31 in the appropriate
+ * components */
+
+static midgard_instruction *
+mir_schedule_condition(compiler_context *ctx,
+struct midgard_predicate *predicate,
+BITSET_WORD *worklist, unsigned count,
+midgard_instruction **instructions,
+midgard_instruction *last)
+{
+/* For a branch, the condition is the only argument; for csel, third */
+bool branch = last->compact_branch;
+unsigned condition_index = branch ? 0 : 2;
+
+/* csel_v is vector; otherwise, conditions are scalar */
+bool vector = !branch && OP_IS_CSEL_V(last->alu.op);
+
+/* Grab the conditional instruction */
+
+midgard_instruction *cond = mir_schedule_comparison(
+ctx, instructi

[Mesa-dev] [PATCH 13/30] pan/midgard: Add predicate->exclude

2019-09-28 Thread Alyssa Rosenzweig
A bit of a kludge but allows setting an implicit dependency of synthetic
conditional moves on the actual condition, fixing code generated like:

   vmul.feq r0, ..
   sadd.imov r31, .., r0
   vadd.fcsel [...]

The imov runs simultaneously with the feq, so it gets garbage results, but
practically speaking it's too late to add a real dependency, since the new
synthetic imov doesn't have a node associated with it.
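
To make the mechanism concrete, here is a tiny standalone illustration of
how the exclude filter is meant to be used. The caller below is invented
for illustration; only the dest-equals-exclude test mirrors this patch.

    #include <stdio.h>

    /* Only the exclude field is modelled; ~0u means "exclude nothing" */
    struct toy_predicate { unsigned exclude; };

    static int
    allowed(const struct toy_predicate *p, unsigned dest)
    {
            return p->exclude == ~0u || dest != p->exclude;
    }

    int
    main(void)
    {
            unsigned cond_node = 7;  /* node written by the vmul.feq above */
            unsigned candidates[] = { 3, 7, 9 };

            /* While filling the bundle holding the synthetic imov + csel,
             * refuse anything that writes the condition node */
            struct toy_predicate pred = { .exclude = cond_node };

            for (unsigned i = 0; i < 3; ++i) {
                    printf("node %u: %s\n", candidates[i],
                           allowed(&pred, candidates[i]) ?
                           "schedulable" : "excluded");
            }

            return 0;
    }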

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 86a77149c78..4c72844679c 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -827,6 +827,9 @@ struct midgard_predicate {
 uint8_t *constants;
 unsigned constant_count;
 bool blend_constant;
+
+/* Exclude this destination (if not ~0) */
+unsigned exclude;
 };
 
 /* For an instruction that can fit, adjust it to fit and update the constants
@@ -893,6 +896,9 @@ mir_choose_instruction(
 if (tag != ~0 && instructions[i]->type != tag)
 continue;
 
+if (predicate->exclude != ~0 && instructions[i]->dest == 
predicate->exclude)
+continue;
+
 /* Simulate in-order scheduling */
 if ((signed) i < best_index)
 continue;
@@ -930,7 +936,8 @@ mir_choose_bundle(
 
 struct midgard_predicate predicate = {
 .tag = ~0,
-.destructive = false
+.destructive = false,
+.exclude = ~0
 };
 
 midgard_instruction *chosen = mir_choose_instruction(instructions, 
worklist, count, &predicate);
@@ -950,7 +957,8 @@ mir_schedule_texture(
 {
 struct midgard_predicate predicate = {
 .tag = TAG_TEXTURE_4,
-.destructive = true
+.destructive = true,
+.exclude = ~0
 };
 
 midgard_instruction *ins =
@@ -974,7 +982,8 @@ mir_schedule_ldst(
 {
 struct midgard_predicate predicate = {
 .tag = TAG_LOAD_STORE_4,
-.destructive = true
+.destructive = true,
+.exclude = ~0
 };
 
 midgard_instruction *ins =
@@ -1003,7 +1012,8 @@ mir_schedule_alu(
 
 struct midgard_predicate predicate = {
 .tag = TAG_ALU_4,
-.destructive = true
+.destructive = true,
+.exclude = ~0
 };
 
 midgard_instruction *ins =
-- 
2.23.0


[Mesa-dev] [PATCH 14/30] pan/midgard: Implement predicate->unit

2019-09-28 Thread Alyssa Rosenzweig
This allows ALUs to select for each unit of the bundle separately.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 4c72844679c..eae73868e2a 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -885,6 +885,9 @@ mir_choose_instruction(
 {
 /* Parse the predicate */
 unsigned tag = predicate->tag;
+bool alu = tag == TAG_ALU_4;
+unsigned unit = predicate->unit;
+bool branch = alu && (unit == ALU_ENAB_BR_COMPACT);
 
 /* Iterate to find the best instruction satisfying the predicate */
 unsigned i;
@@ -899,6 +902,12 @@ mir_choose_instruction(
 if (predicate->exclude != ~0 && instructions[i]->dest == 
predicate->exclude)
 continue;
 
+if (alu && !branch && 
!(alu_opcode_props[instructions[i]->alu.op].props & unit))
+continue;
+
+if (branch && !instructions[i]->compact_branch)
+continue;
+
 /* Simulate in-order scheduling */
 if ((signed) i < best_index)
 continue;
-- 
2.23.0


[Mesa-dev] [PATCH 03/30] pan/midgard: Squeeze indices before scheduling

2019-09-28 Thread Alyssa Rosenzweig
This allows node_count to be correct while scheduling.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index b8d9b5ec9be..2ed25da2e0d 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -983,6 +983,7 @@ schedule_program(compiler_context *ctx)
 /* Must be lowered right before RA */
 mir_squeeze_index(ctx);
 mir_lower_special_reads(ctx);
+mir_squeeze_index(ctx);
 
 /* Lowering can introduce some dead moves */
 
-- 
2.23.0


[Mesa-dev] [PATCH 08/30] pan/midgard: Add mir_update_worklist helper

2019-09-28 Thread Alyssa Rosenzweig
After we've chosen an instruction, popped it off, and processed it, it's
time to update the worklist, removing that instruction from the
dependency graph to allow its dependents to be put onto the worklist.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 046480ba2ac..8f324805cdb 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -767,6 +767,45 @@ mir_initialize_worklist(BITSET_WORD *worklist, 
midgard_instruction **instruction
 }
 }
 
+/* Update the worklist after an instruction terminates. Remove its edges from
+ * the graph and if that causes any node to have no dependencies, add it to the
+ * worklist */
+
+static void
+mir_update_worklist(
+BITSET_WORD *worklist, unsigned count,
+midgard_instruction **instructions, midgard_instruction *done)
+{
+/* Sanity check: if no instruction terminated, there is nothing to do.
+ * If the instruction that terminated had dependencies, that makes no
+ * sense and means we messed up the worklist. Finally, as the purpose
+ * of this routine is to update dependents, we abort early if there are
+ * no dependents defined. */
+
+if (!done)
+return;
+
+assert(done->nr_dependencies == 0);
+
+if (!done->dependents)
+return;
+
+/* We have an instruction with dependents. Iterate each dependent to
+ * remove one dependency (`done`), adding dependents to the worklist
+ * where possible. */
+
+unsigned i;
+BITSET_WORD tmp;
+BITSET_FOREACH_SET(i, tmp, done->dependents, count) {
+assert(instructions[i]->nr_dependencies);
+
+if (!(--instructions[i]->nr_dependencies))
+BITSET_SET(worklist, i);
+}
+
+free(done->dependents);
+}
+
 /* While scheduling, we need to choose instructions satisfying certain
  * criteria. As we schedule backwards, we choose the *last* instruction in the
  * worklist to simulate in-order scheduling. Chosen instructions must satisfy a
-- 
2.23.0


[Mesa-dev] [PATCH 11/30] pan/midgard: Remove csel constant unit force

2019-09-28 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_compile.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/panfrost/midgard/midgard_compile.c 
b/src/panfrost/midgard/midgard_compile.c
index 585591d9356..95ec48e9563 100644
--- a/src/panfrost/midgard/midgard_compile.c
+++ b/src/panfrost/midgard/midgard_compile.c
@@ -1991,9 +1991,6 @@ inline_alu_constants(compiler_context *ctx, midgard_block 
*block)
 midgard_instruction ins = 
v_mov(SSA_FIXED_REGISTER(REGISTER_CONSTANT), blank_alu_src, scratch);
attach_constants(ctx, &ins, entry, alu->src[1] 
+ 1);
 
-/* Force a break XXX Defer r31 writes */
-ins.unit = UNIT_VLUT;
-
 /* Set the source */
 alu->src[1] = scratch;
 
-- 
2.23.0


[Mesa-dev] [PATCH 12/30] pan/midgard: Add constant intersection filters

2019-09-28 Thread Alyssa Rosenzweig
In the future, we will want to keep track of which components of
constants of various sizes correspond to which parts of the bundle
constants, like in the old scheduler. For now, let's just stub it out
with a simple rule of one instruction with embedded constants per bundle.
We can eventually do better, of course.
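
As a hint of where this is headed, here is a standalone sketch of the kind
of dedup-and-pack rule a later patch in this series ("Tightly pack 32-bit
constants") moves toward. The helper below is invented for illustration
and, unlike mir_adjust_constants, it has no separate non-destructive fit
check.

    #include <stdint.h>
    #include <stdio.h>

    /* A bundle embeds at most 16 bytes of constants (4 x 32-bit words).
     * Invented helper: add one instruction's constants, reusing identical
     * words, and record which slot each one landed in. Note it modifies
     * the bundle even when it eventually fails; the real code avoids that
     * by checking the fit non-destructively first. */
    static int
    pack_constants(uint32_t *bundle, unsigned *bundle_count,
                   const uint32_t *wanted, unsigned wanted_count,
                   unsigned *slots)
    {
            unsigned count = *bundle_count;

            for (unsigned i = 0; i < wanted_count; ++i) {
                    unsigned slot = count;

                    /* Reuse an existing identical word if possible */
                    for (unsigned j = 0; j < count; ++j) {
                            if (bundle[j] == wanted[i]) {
                                    slot = j;
                                    break;
                            }
                    }

                    if (slot == count) {
                            if (count >= 4)
                                    return 0;  /* does not fit */

                            bundle[count++] = wanted[i];
                    }

                    slots[i] = slot;
            }

            *bundle_count = count;
            return 1;
    }

    int
    main(void)
    {
            uint32_t bundle[4] = { 0 };
            unsigned bundle_count = 0, slots[4];

            uint32_t a[] = { 0x3f800000, 0x00000000 };  /* 1.0f, 0 */
            uint32_t b[] = { 0x00000000, 0x40000000 };  /* 0 reused, 2.0f */

            if (pack_constants(bundle, &bundle_count, a, 2, slots) &&
                pack_constants(bundle, &bundle_count, b, 2, slots))
                    printf("packed into %u words\n", bundle_count);  /* 3 */

            return 0;
    }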

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 55 +
 1 file changed, 55 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 72c6ec42980..86a77149c78 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -817,8 +817,63 @@ struct midgard_predicate {
 
 /* True if we want to pop off the chosen instruction */
 bool destructive;
+
+/* State for bundle constants. constants is the actual constants
+ * for the bundle. constant_count is the number of bytes (up to
+ * 16) currently in use for constants. When picking in destructive
+ * mode, the constants array will be updated, and the instruction
+ * will be adjusted to index into the constants array */
+
+uint8_t *constants;
+unsigned constant_count;
+bool blend_constant;
 };
 
+/* For an instruction that can fit, adjust it to fit and update the constants
+ * array, in destructive mode. Returns whether the fitting was successful. */
+
+static bool
+mir_adjust_constants(midgard_instruction *ins,
+struct midgard_predicate *pred,
+bool destructive)
+{
+/* Blend constants dominate */
+if (ins->has_blend_constant) {
+if (pred->constant_count)
+return false;
+else if (destructive) {
+pred->blend_constant = true;
+pred->constant_count = 16;
+return true;
+}
+}
+
+/* No constant, nothing to adjust */
+if (!ins->has_constants)
+return true;
+
+/* TODO: Deduplicate; permit multiple constants within a bundle */
+
+if (destructive && !pred->constant_count) {
+if (ins->alu.reg_mode == midgard_reg_mode_16) {
+  /* TODO: Fix packing XXX */
+uint16_t *bundles = (uint16_t *) pred->constants;
+uint32_t *constants = (uint32_t *) ins->constants;
+
+/* Copy them wholesale */
+for (unsigned i = 0; i < 4; ++i)
+bundles[i] = constants[i];
+} else {
+memcpy(pred->constants, ins->constants, 16);
+}
+
+pred->constant_count = 16;
+return true;
+}
+
+return !pred->constant_count;
+}
+
 static midgard_instruction *
 mir_choose_instruction(
 midgard_instruction **instructions,
-- 
2.23.0


[Mesa-dev] [PATCH 07/30] pan/midgard: Add mir_choose_instruction stub

2019-09-28 Thread Alyssa Rosenzweig
In the future, this routine will implement the core scheduling logic to
decide which instruction out of the worklist will be scheduled next, in
a way that minimizes cycle count and register pressure.

In the present, we are more interested in replicating in-order
scheduling with the much-more-powerful out-of-order model. So rather
than discriminating by a register pressure estimate, we simply choose
the latest possible instruction in the worklist.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 55 +
 1 file changed, 55 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index aff93399a80..046480ba2ac 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -767,6 +767,61 @@ mir_initialize_worklist(BITSET_WORD *worklist, 
midgard_instruction **instruction
 }
 }
 
+/* While scheduling, we need to choose instructions satisfying certain
+ * criteria. As we schedule backwards, we choose the *last* instruction in the
+ * worklist to simulate in-order scheduling. Chosen instructions must satisfy a
+ * given predicate. */
+
+struct midgard_predicate {
+/* TAG or ~0 for dont-care */
+unsigned tag;
+
+/* True if we want to pop off the chosen instruction */
+bool destructive;
+};
+
+static midgard_instruction *
+mir_choose_instruction(
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned count,
+struct midgard_predicate *predicate)
+{
+/* Parse the predicate */
+unsigned tag = predicate->tag;
+
+/* Iterate to find the best instruction satisfying the predicate */
+unsigned i;
+BITSET_WORD tmp;
+
+signed best_index = -1;
+
+BITSET_FOREACH_SET(i, tmp, worklist, count) {
+if (tag != ~0 && instructions[i]->type != tag)
+continue;
+
+/* Simulate in-order scheduling */
+if ((signed) i < best_index)
+continue;
+
+best_index = i;
+}
+
+
+/* Did we find anything?  */
+
+if (best_index < 0)
+return NULL;
+
+/* If we found something, remove it from the worklist */
+assert(best_index < count);
+
+if (predicate->destructive) {
+BITSET_CLEAR(worklist, best_index);
+}
+
+return instructions[best_index];
+}
+
 /* Schedule a single block by iterating its instruction to create bundles.
  * While we go, tally about the bundle sizes to compute the block size. */
 
-- 
2.23.0


[Mesa-dev] [PATCH 09/30] pan/midgard: Add mir_choose_bundle helper

2019-09-28 Thread Alyssa Rosenzweig
It's not always obvious what the optimal bundle type should be. Let's
break out the logic to decide.

Currently set for purely in-order operation.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 8f324805cdb..969460235d1 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -861,6 +861,31 @@ mir_choose_instruction(
 return instructions[best_index];
 }
 
+/* Still, we don't choose instructions in a vacuum. We need a way to choose the
+ * best bundle type (ALU, load/store, texture). Nondestructive. */
+
+static unsigned
+mir_choose_bundle(
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned count)
+{
+/* At the moment, our algorithm is very simple - use the bundle of the
+ * best instruction, regardless of what else could be scheduled
+ * alongside it. This is not optimal but it works okay for in-order */
+
+struct midgard_predicate predicate = {
+.tag = ~0,
+.destructive = false
+};
+
+midgard_instruction *chosen = mir_choose_instruction(instructions, 
worklist, count, );
+
+if (chosen)
+return chosen->type;
+else
+return ~0;
+}
+
 /* Schedule a single block by iterating its instruction to create bundles.
  * While we go, tally about the bundle sizes to compute the block size. */
 
-- 
2.23.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 06/30] pan/midgard: Initialize worklist

2019-09-28 Thread Alyssa Rosenzweig
This flows naturally from the dependency graph: any instruction with no
remaining dependencies is immediately ready to be scheduled.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 70fa030390c..aff93399a80 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -755,6 +755,18 @@ flatten_mir(midgard_block *block, unsigned *len)
 return instructions;
 }
 
+/* The worklist is the set of instructions that can be scheduled now; that is,
+ * the set of instructions with no remaining dependencies */
+
+static void
+mir_initialize_worklist(BITSET_WORD *worklist, midgard_instruction 
**instructions, unsigned count)
+{
+for (unsigned i = 0; i < count; ++i) {
+if (instructions[i]->nr_dependencies == 0)
+BITSET_SET(worklist, i);
+}
+}
+
 /* Schedule a single block by iterating its instruction to create bundles.
  * While we go, tally about the bundle sizes to compute the block size. */
 
@@ -769,6 +781,11 @@ schedule_block(compiler_context *ctx, midgard_block *block)
 unsigned node_count = ctx->temp_count + 1;
 mir_create_dependency_graph(instructions, len, node_count);
 
+/* Allocate the worklist */
+size_t sz = BITSET_WORDS(len) * sizeof(BITSET_WORD);
+BITSET_WORD *worklist = calloc(sz, 1);
+mir_initialize_worklist(worklist, instructions, len);
+
 util_dynarray_init(>bundles, NULL);
 
 block->quadword_count = 0;
-- 
2.23.0


[Mesa-dev] [PATCH 01/30] pan/midgard: Add missing parans in SWIZZLE definition

2019-09-28 Thread Alyssa Rosenzweig
TODO: Move me to front of series.
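
For the record, the failure mode is the usual macro precedence one; a
quick illustration with a hypothetical caller (not from the tree):

unsigned flip = 1;

/* With the old macro, the last term expands to
 *     (flip ? COMPONENT_Y : COMPONENT_X << 6)
 * so the shift binds to COMPONENT_X alone, and with flip set the
 * unshifted COMPONENT_Y lands in the low bits on top of the A slot. */
unsigned s = SWIZZLE(COMPONENT_X, COMPONENT_X, COMPONENT_X,
                     flip ? COMPONENT_Y : COMPONENT_X);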

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/helpers.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/panfrost/midgard/helpers.h b/src/panfrost/midgard/helpers.h
index ac58fd50327..343fad0fea8 100644
--- a/src/panfrost/midgard/helpers.h
+++ b/src/panfrost/midgard/helpers.h
@@ -189,7 +189,7 @@ quadword_size(int tag)
 
 /* Swizzle support */
 
-#define SWIZZLE(A, B, C, D) ((D << 6) | (C << 4) | (B << 2) | (A << 0))
+#define SWIZZLE(A, B, C, D) (((D) << 6) | ((C) << 4) | ((B) << 2) | ((A) << 0))
 #define SWIZZLE_FROM_ARRAY(r) SWIZZLE(r[0], r[1], r[2], r[3])
 #define COMPONENT_X 0x0
 #define COMPONENT_Y 0x1
-- 
2.23.0


[Mesa-dev] [PATCH 10/30] pan/midgard: Add mir_schedule_texture/ldst/alu helpers

2019-09-28 Thread Alyssa Rosenzweig
We don't actually do any scheduling here yet, but add per-tag helpers to
consume an instruction, print it, pop it off the worklist.

Signed-off-by: Alyssa Rosenzweig 
---
 src/panfrost/midgard/midgard_schedule.c | 190 
 1 file changed, 190 insertions(+)

diff --git a/src/panfrost/midgard/midgard_schedule.c 
b/src/panfrost/midgard/midgard_schedule.c
index 969460235d1..72c6ec42980 100644
--- a/src/panfrost/midgard/midgard_schedule.c
+++ b/src/panfrost/midgard/midgard_schedule.c
@@ -886,9 +886,199 @@ mir_choose_bundle(
 return ~0;
 }
 
+/* Schedules a single bundle of the given type */
+
+static midgard_bundle
+mir_schedule_texture(
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned len)
+{
+struct midgard_predicate predicate = {
+.tag = TAG_TEXTURE_4,
+.destructive = true
+};
+
+midgard_instruction *ins =
+mir_choose_instruction(instructions, worklist, len, 
);
+
+mir_update_worklist(worklist, len, instructions, ins);
+
+struct midgard_bundle out = {
+.tag = TAG_TEXTURE_4,
+.instruction_count = 1,
+.instructions = { ins }
+};
+
+return out;
+}
+
+static midgard_bundle
+mir_schedule_ldst(
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned len)
+{
+struct midgard_predicate predicate = {
+.tag = TAG_LOAD_STORE_4,
+.destructive = true
+};
+
+midgard_instruction *ins =
+mir_choose_instruction(instructions, worklist, len, 
);
+
+mir_update_worklist(worklist, len, instructions, ins);
+
+struct midgard_bundle out = {
+.tag = TAG_LOAD_STORE_4,
+.instruction_count = 1,
+.instructions = { ins }
+};
+
+return out;
+}
+
+static midgard_bundle
+mir_schedule_alu(
+compiler_context *ctx,
+midgard_instruction **instructions,
+BITSET_WORD *worklist, unsigned len)
+{
+struct midgard_bundle bundle = {};
+
+unsigned bytes_emitted = sizeof(bundle.control);
+
+struct midgard_predicate predicate = {
+.tag = TAG_ALU_4,
+.destructive = true
+};
+
+midgard_instruction *ins =
+mir_choose_instruction(instructions, worklist, len, 
);
+
+midgard_instruction *vmul = NULL;
+midgard_instruction *vadd = NULL;
+midgard_instruction *vlut = NULL;
+midgard_instruction *smul = NULL;
+midgard_instruction *sadd = NULL;
+midgard_instruction *branch = NULL;
+
+mir_update_worklist(worklist, len, instructions, ins);
+
+if (ins->compact_branch) {
+branch = ins;
+} else if (!ins->unit) {
+unsigned units = alu_opcode_props[ins->alu.op].props;
+
+if (units & UNIT_VMUL) {
+ins->unit = UNIT_VMUL;
+vmul = ins;
+} else if (units & UNIT_VADD) {
+ins->unit = UNIT_VADD;
+vadd = ins;
+} else if (units & UNIT_VLUT) {
+ins->unit = UNIT_VLUT;
+vlut = ins;
+} else
+assert(0);
+}
+
+bundle.has_embedded_constants = ins->has_constants;
+bundle.has_blend_constant = ins->has_blend_constant;
+
+if (ins->alu.reg_mode == midgard_reg_mode_16) {
+  /* TODO: Fix packing XXX */
+uint16_t *bundles = (uint16_t *) bundle.constants;
+uint32_t *constants = (uint32_t *) ins->constants;
+
+/* Copy them wholesale */
+for (unsigned i = 0; i < 4; ++i)
+bundles[i] = constants[i];
+} else {
+memcpy(bundle.constants, ins->constants, 
sizeof(bundle.constants));
+}
+
+if (ins->writeout) {
+unsigned src = (branch->src[0] == ~0) ? SSA_FIXED_REGISTER(0) 
: branch->src[0];
+unsigned temp = (branch->src[0] == ~0) ? SSA_FIXED_REGISTER(0) 
: make_compiler_temp(ctx);
+midgard_instruction mov = v_mov(src, blank_alu_src, temp);
+vmul = mem_dup(, sizeof(midgard_instruction));
+vmul->unit = UNIT_VMUL;
+vmul->mask = 0xF;
+/* TODO: Don't leak */
+
+/* Rewrite to use our temp */
+midgard_instruction *stages[] = { sadd, vadd, smul };
+
+for (unsigned i = 0; i < ARRAY_SIZE(stages); ++i) {
+if (stages[i])
+mir_rewrite_index_dst_single(stages[i], src, 
temp);
+   

Re: [Mesa-dev] [PATCH 1/3] panfrost: Make sure a clear does not re-use a pre-existing batch

2019-09-23 Thread Alyssa Rosenzweig
> One more thing: optimization of the above scenario is probably
> something we'll want to have at some point, but I think the current
> patch is worth applying in the meantime. All this patch does is
> enforcing ordering of clears/draws to make sure the end result matches
> users expectation.

Hum alright, R-b in that case, it's worth it to fix so many tests. But
please add a "TODO" comment explaining what we just emailed about, so it
doesn't get lost to the voids of future mailing lists :)
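
Something along these lines would be plenty (wording is only a
suggestion, feel free to rephrase):

/* TODO: A clear does not strictly need a fresh batch. A mid-frame
 * partial clear could instead be drawn as a full-screen quad on the
 * current batch (see the mailing list discussion), avoiding the extra
 * flush and wallpaper reload. */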



Re: [Mesa-dev] [PATCH v3 01/17] panfrost: Extend the panfrost_batch_add_bo() API to pass access flags

2019-09-22 Thread Alyssa Rosenzweig
> +your collabora address

Thank you

> > > > I think this batch_add_bo should probably dropped altogether? This loop
> > > > is dealing with constant buffers; the shaders themselves were added  
> > > 
> > > I'll double check. I couldn't find where BOs containing shader programs
> > > were added last time I looked.  
> > 
> > Masking a real bug :o
> > 
> > It should probably happen in panfrost_patch_shader_state?
> 
> Ok, I'll add it there, I wasn't sure this function was called for all
> shaders, but looking at the code a second time it seems to be the case.

I think so as well, yeah.

> > As I stated before, I thought we should be adding the BO for
> > wallpapering when we actually wallpaper, since that's a slow path. Not
> > wallpapering is the default and ideally what most apps should do.
> 
> Wallpapering happens too late (when we are flushing the batch) to have
> an impact on the dep graph, but we can probably know that wallpapering
> will be needed before that. My question remains though, are
> vertex/tiler supposed to touch the texture BO they're reading from, or
> should we only flag the BO for FRAGMENT use.

Vertex/tiler should not touch the texture BO, unless you're texturing
from the vertex shader (which we're not for wallpapering).
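
Concretely, at the point where we actually emit the wallpaper quad,
something like this should be all that's needed (sketch only, assuming
the extended panfrost_batch_add_bo() from this series, with rsrc being
whatever resource backs the colour buffer we're reloading):

panfrost_batch_add_bo(batch, rsrc->bo,
                      PAN_BO_ACCESS_SHARED |
                      PAN_BO_ACCESS_READ |
                      PAN_BO_ACCESS_FRAGMENT);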



Re: [Mesa-dev] [PATCH 1/3] panfrost: Make sure a clear does not re-use a pre-existing batch

2019-09-22 Thread Alyssa Rosenzweig
> > To be clear, if we have a batch and do the following operations:
> > 
> > clear red
> > draw 1
> > clear green
> > draw 2
> > flush
> > 
> > All we should see is #2 on a green background, which this patch handles
> > by the second clear invalidating all the clears/draws that came before
> > it (provided there is no flush in between). 
> > 
> > I might just be tripped up by the "freeze" name. That really means throw
> > away / free here, I guess?
> 
> Nope. Freeze means "stop queuing new draws to this batch". I guess we
> could free the batch as well if the result of the previous draws/clear
> are really overwritten by this new clear, but that implies checking the
> new clear flags to make sure they target the same depth/stencil/color
> combination. On the other hand, I'm wondering if it's really the
> driver's job to try to optimize silly things the apps might do. I mean,
> the sequence you describe does not look like a wise thing to do since
> the "clear red+draw 1" end up being overwritten by "clear green + draw
> 2".

I'm quite confused about how this patch works, then.

A few thoughts: if the app clears all buffers in the middle of a frame,
then yes, it's silly and yes, we may as well optimize it out. (Should
that even be something GL drivers have to do? I mean, only if the other
drivers are doing it too...)

If the sequence is more like:

clear all buffers
draw 1
clear color buffer (preserve depth stencil)
draw 2
flush

That second clear should really be done by drawing a full-screen quad,
just like if we were wallpapering, except loading its colour from a
uniform instead of a texture.

Similarly, a depth-only clear mid-frame can be emulated by drawing a
full-screen quad with the gl_Position.zw components jury-rigged to the
desired depth value, and disabling colour writes by setting the colour
mask to 0x0. That also means you can skip having any shader at all
(literally set the shader pointer to 0x0), so that's faster.

Finally, a stencil-only clear can be done the same way, by playing
tricks with the stencil test parameters.

I suspect u_blitter or mesa/st is capable of doing these sorts of tricks
on our behalf, but I have not researched it extensively.
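
To make the depth-only case concrete, here is roughly the state it boils
down to. This is only a sketch, not actual Panfrost code, and
emulate_depth_only_clear()/emit_fullscreen_quad() are made-up names for
whatever quad-drawing path we end up reusing (wallpaper or u_blitter):

static void
emulate_depth_only_clear(struct panfrost_context *ctx, double depth)
{
        /* Mask off all colour writes so only Z is touched */
        struct pipe_blend_state blend = { 0 };
        blend.rt[0].colormask = 0;

        /* Always pass the depth test and write the clear value */
        struct pipe_depth_stencil_alpha_state dsa = { 0 };
        dsa.depth.enabled = true;
        dsa.depth.writemask = true;
        dsa.depth.func = PIPE_FUNC_ALWAYS;

        /* Draw a quad with gl_Position.z rigged to the clear depth;
         * with colour writes masked, no fragment shader is needed. */
        emit_fullscreen_quad(ctx, &blend, &dsa, depth);
}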

In any case, for a partial clear mid-frame, we would much rather do:

clear all buffers
draw 1
draw fullscreen quad (disable Z/S writes)
draw 2
flush

than what it sounds like this patch does

clear all buffers
draw 1
flush

clear all buffers
wallpaper
draw 2
flush

Please do correct me if I've misunderstood the logic here.

> 
> > 
> > Provided that's the idea (and we're not somehow saving the original draw
> > 1), it's Reviewed-by A R 
> > 
> > On Fri, Sep 20, 2019 at 04:53:37PM +0200, Boris Brezillon wrote:
> > > glClear()s are expected to be the first thing GL apps do before drawing
> > > new things. If there's already an existing batch targetting the same
> > > FBO that has draws attached to it, we should make sure the new clear
> > > gets a new batch assigned to guaranteed that the FB content is actually
> > > cleared with the requested color/depth/stencil values.
> > > 
> > > We create a panfrost_get_fresh_batch_for_fbo() helper for that and
> > > call it from panfrost_clear().
> > > 
> > > Signed-off-by: Boris Brezillon 
> > > ---
> > >  src/gallium/drivers/panfrost/pan_context.c |  2 +-
> > >  src/gallium/drivers/panfrost/pan_job.c | 21 +
> > >  src/gallium/drivers/panfrost/pan_job.h |  3 +++
> > >  3 files changed, 25 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> > > b/src/gallium/drivers/panfrost/pan_context.c
> > > index ac01461a07fe..b2f2a9da7a51 100644
> > > --- a/src/gallium/drivers/panfrost/pan_context.c
> > > +++ b/src/gallium/drivers/panfrost/pan_context.c
> > > @@ -162,7 +162,7 @@ panfrost_clear(
> > >  double depth, unsigned stencil)
> > >  {
> > >  struct panfrost_context *ctx = pan_context(pipe);
> > > -struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> > > +struct panfrost_batch *batch = 
> > > panfrost_get_fresh_batch_for_fbo(ctx);
> > >  
> > >  panfrost_batch_add_fbo_bos(batch);
> > >  panfrost_batch_clear(batch, buffers, color, depth, stencil);
> > > diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> > > b/src/gallium/drivers/panfrost/pan_job.c
> > > index d8330bc133a6..4ec2aa0565d7 100644
> > > --- a/src/gallium/drivers/panfrost/pan_job.c
> > > +++ b/src/gallium/drivers/panfrost/pan_job.c
> > > @@ -298,6 +298,27 @@ panfrost_get_batch_for_fbo(struct panfrost_context 
> > > *ctx)
> > >  return batch;
> > >  }
> > >  
> > > +struct panfrost_batch *
> > > +panfrost_get_fresh_batch_for_fbo(struct panfrost_context *ctx)
> > > +{
> > > +struct panfrost_batch *batch;
> > > +
> > > +batch = panfrost_get_batch(ctx, >pipe_framebuffer);
> > > +
> > > + 

Re: [Mesa-dev] [PATCH v3 01/17] panfrost: Extend the panfrost_batch_add_bo() API to pass access flags

2019-09-22 Thread Alyssa Rosenzweig
> > Although actually I am not at all sure what this batch_add_bo is doing
> > at all?
> > 
> > I think this batch_add_bo should probably dropped altogether? This loop
> > is dealing with constant buffers; the shaders themselves were added
> 
> I'll double check. I couldn't find where BOs containing shader programs
> were added last time I looked.

Masking a real bug :o

It should probably happen in panfrost_patch_shader_state?

> > >  void panfrost_batch_add_fbo_bos(struct panfrost_batch *batch)
> > >  {
> > > +uint32_t flags = PAN_BO_ACCESS_SHARED | PAN_BO_ACCESS_WRITE |
> > > + PAN_BO_ACCESS_VERTEX_TILER |
> > > + PAN_BO_ACCESS_FRAGMENT;  
> > 
> > I think we can drop VERTEX_TILER here...? The buffers are written right
> > at the end of the FRAGMENT job, not touched before that.
> 
> What about the read done when drawing the wallpaper? I guess it's also
> only read by the fragment job, but I wasn't sure.

As I stated before, I thought we should be adding the BO for
wallpapering when we actually wallpaper, since that's a slow path. Not
wallpapering is the default and ideally what most apps should do.

> > If nothing else is broken, this should allow a nice perf boost with
> > pipelining, so the vertex/tiler from frame n+1 can run in parallel with
> > the fragment of frame n (rather than blocking on frame n finishing with
> > the FBOs).
> 
> Would require the kernel patches I posted earlier for that to
> happen ;-). Right now all jobs touching the same BO are serialized
> because of the implicit BO fences added by the kernel driver.

Sure~ Maybe this sort of bug was the reason you weren't seeing
improvement from those kernel patches?

Re: [Mesa-dev] [PATCH v3 00/17] panfrost: Support batch pipelining

2019-09-20 Thread Alyssa Rosenzweig
Series looks quite good, overall. Just a few minor issues, but probably
not even enough to justify a respin. Congratulations! :p

On Wed, Sep 18, 2019 at 03:24:22PM +0200, Boris Brezillon wrote:
> Hello,
> 
> This is the third attempt at supporting batch pipelining. This time I
> implemented it using a dependency graph (as suggested by Alyssa and
> Steven) so that batch submission can be delayed even more: the only
> time we flush batches now is when we have an explicit flush or when
> the CPU needs to access a BO (we might want to tweak that a bit to
> avoid the extra latency incurred by this solution). With that in place
> we hope to increase GPU utilization.
> 
> Patches 15 and 16 are optional, but I remember reading (I think it was
> Steven who mentioned that) that draw order matters when queueing render
> operations for different frames (frame N should ideally be ready before
> frame N+1). Not sure if enforcing draw call order is enough to guarantee
> that rendering of frame N always finishes before frame N+1 though.
> If that's something you don't want to merge, I can drop it.
> 
> Regards,
> 
> Boris
> 
> Boris Brezillon (17):
>   panfrost: Extend the panfrost_batch_add_bo() API to pass access flags
>   panfrost: Make panfrost_batch->bos a hash table
>   panfrost: Add a batch fence
>   panfrost: Use the per-batch fences to wait on the last submitted batch
>   panfrost: Add a panfrost_freeze_batch() helper
>   panfrost: Start tracking inter-batch dependencies
>   panfrost: Prepare panfrost_fence for batch pipelining
>   panfrost: Add a panfrost_flush_all_batches() helper
>   panfrost: Add a panfrost_flush_batches_accessing_bo() helper
>   panfrost: Kill the explicit serialization in panfrost_batch_submit()
>   panfrost: Get rid of the flush in panfrost_set_framebuffer_state()
>   panfrost: Add flags to reflect the BO imported/exported state
>   panfrost: Make sure the BO is 'ready' when picked from the cache
>   panfrost: Do fine-grained flushing when preparing BO for CPU accesses
>   panfrost: Rename ctx->batches into ctx->fbo_to_batch
>   panfrost: Take draw call order into account
>   panfrost/ci: New tests are passing
> 
>  .../drivers/panfrost/ci/expected-failures.txt |   4 -
>  src/gallium/drivers/panfrost/pan_allocate.c   |  14 +-
>  src/gallium/drivers/panfrost/pan_blend_cso.c  |   6 +-
>  src/gallium/drivers/panfrost/pan_bo.c | 116 ++-
>  src/gallium/drivers/panfrost/pan_bo.h |  44 ++
>  src/gallium/drivers/panfrost/pan_compute.c|   2 +-
>  src/gallium/drivers/panfrost/pan_context.c| 121 ++--
>  src/gallium/drivers/panfrost/pan_context.h|  15 +-
>  src/gallium/drivers/panfrost/pan_instancing.c |   5 +-
>  src/gallium/drivers/panfrost/pan_job.c| 668 --
>  src/gallium/drivers/panfrost/pan_job.h|  58 +-
>  src/gallium/drivers/panfrost/pan_resource.c   |  27 +-
>  src/gallium/drivers/panfrost/pan_screen.c |  65 +-
>  src/gallium/drivers/panfrost/pan_screen.h |   3 +-
>  src/gallium/drivers/panfrost/pan_varyings.c   |  10 +-
>  15 files changed, 956 insertions(+), 202 deletions(-)
> 
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v3 15/17] panfrost: Rename ctx->batches into ctx->fbo_to_batch

2019-09-20 Thread Alyssa Rosenzweig
Erm, un-r-b, sorry, I didn't realize this was the optional one. Let's
hold off on this patch and the succeeding one for now.

On Wed, Sep 18, 2019 at 03:24:37PM +0200, Boris Brezillon wrote:
> We are about to add a batch queue to keep track of submission order.
> Let's rename the existing batches hash table (which is used to get the
> batch attached to an FBO) into fbo_to_batch to avoid confusion.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.c  |  2 +-
>  src/gallium/drivers/panfrost/pan_context.h  |  2 +-
>  src/gallium/drivers/panfrost/pan_job.c  | 21 +++--
>  src/gallium/drivers/panfrost/pan_resource.c | 16 
>  4 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 07bafad58a00..0330b5852676 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1355,7 +1355,7 @@ panfrost_flush(
>   */
>  if (fence) {
>  util_dynarray_init(, NULL);
> -hash_table_foreach(ctx->batches, hentry) {
> +hash_table_foreach(ctx->fbo_to_batch, hentry) {
>  struct panfrost_batch *batch = hentry->data;
>  
>  panfrost_batch_fence_reference(batch->out_sync);
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index d50ed57d5d8a..f13967f51b46 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -112,7 +112,7 @@ struct panfrost_context {
>  
>  /* Bound job batch and map of panfrost_batch_key to job batches */
>  struct panfrost_batch *batch;
> -struct hash_table *batches;
> +struct hash_table *fbo_to_batch;
>  
>  /* panfrost_bo -> panfrost_bo_access */
>  struct hash_table *accessed_bos;
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index a56f4044fda0..45f9d9d24b41 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -132,9 +132,9 @@ panfrost_freeze_batch(struct panfrost_batch *batch)
>   * matches. This way, next draws/clears targeting this FBO will 
> trigger
>   * the creation of a new batch.
>   */
> -entry = _mesa_hash_table_search(ctx->batches, >key);
> +entry = _mesa_hash_table_search(ctx->fbo_to_batch, >key);
>  if (entry && entry->data == batch)
> -_mesa_hash_table_remove(ctx->batches, entry);
> +_mesa_hash_table_remove(ctx->fbo_to_batch, entry);
>  
>  /* If this is the bound batch, the panfrost_context parameters are
>   * relevant so submitting it invalidates those parameters, but if 
> it's
> @@ -153,7 +153,7 @@ static bool panfrost_batch_is_frozen(struct 
> panfrost_batch *batch)
>  struct panfrost_context *ctx = batch->ctx;
>  struct hash_entry *entry;
>  
> -entry = _mesa_hash_table_search(ctx->batches, >key);
> +entry = _mesa_hash_table_search(ctx->fbo_to_batch, >key);
>  if (entry && entry->data == batch)
>  return false;
>  
> @@ -248,7 +248,8 @@ panfrost_get_batch(struct panfrost_context *ctx,
> const struct pipe_framebuffer_state *key)
>  {
>  /* Lookup the job first */
> -struct hash_entry *entry = _mesa_hash_table_search(ctx->batches, 
> key);
> +struct hash_entry *entry = _mesa_hash_table_search(ctx->fbo_to_batch,
> +   key);
>  
>  if (entry)
>  return entry->data;
> @@ -258,7 +259,7 @@ panfrost_get_batch(struct panfrost_context *ctx,
>  struct panfrost_batch *batch = panfrost_create_batch(ctx, key);
>  
>  /* Save the created job */
> -_mesa_hash_table_insert(ctx->batches, >key, batch);
> +_mesa_hash_table_insert(ctx->fbo_to_batch, >key, batch);
>  
>  return batch;
>  }
> @@ -915,7 +916,7 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  util_dynarray_init(, NULL);
>  }
>  
> -hash_table_foreach(ctx->batches, hentry) {
> +hash_table_foreach(ctx->fbo_to_batch, hentry) {
>  struct panfrost_batch *batch = hentry->data;
>  
>  assert(batch);
> @@ -931,7 +932,7 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  panfrost_batch_submit(batch);
>  }
>  
> -assert(!ctx->batches->entries);
> +assert(!ctx->fbo_to_batch->entries);
>  
>  /* Collect batch fences before returning */
>  panfrost_gc_fences(ctx);
> @@ -1183,9 +1184,9 @@ panfrost_batch_is_scanout(struct panfrost_batch *batch)
>  void
>  

Re: [Mesa-dev] [PATCH v3 15/17] panfrost: Rename ctx->batches into ctx->fbo_to_batch

2019-09-20 Thread Alyssa Rosenzweig
R-b
On Wed, Sep 18, 2019 at 03:24:37PM +0200, Boris Brezillon wrote:
> We are about to add a batch queue to keep track of submission order.
> Let's rename the existing batches hash table (which is used to get the
> batch attached to an FBO) into fbo_to_batch to avoid confusion.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.c  |  2 +-
>  src/gallium/drivers/panfrost/pan_context.h  |  2 +-
>  src/gallium/drivers/panfrost/pan_job.c  | 21 +++--
>  src/gallium/drivers/panfrost/pan_resource.c | 16 
>  4 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 07bafad58a00..0330b5852676 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1355,7 +1355,7 @@ panfrost_flush(
>   */
>  if (fence) {
>  util_dynarray_init(, NULL);
> -hash_table_foreach(ctx->batches, hentry) {
> +hash_table_foreach(ctx->fbo_to_batch, hentry) {
>  struct panfrost_batch *batch = hentry->data;
>  
>  panfrost_batch_fence_reference(batch->out_sync);
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index d50ed57d5d8a..f13967f51b46 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -112,7 +112,7 @@ struct panfrost_context {
>  
>  /* Bound job batch and map of panfrost_batch_key to job batches */
>  struct panfrost_batch *batch;
> -struct hash_table *batches;
> +struct hash_table *fbo_to_batch;
>  
>  /* panfrost_bo -> panfrost_bo_access */
>  struct hash_table *accessed_bos;
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index a56f4044fda0..45f9d9d24b41 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -132,9 +132,9 @@ panfrost_freeze_batch(struct panfrost_batch *batch)
>   * matches. This way, next draws/clears targeting this FBO will 
> trigger
>   * the creation of a new batch.
>   */
> -entry = _mesa_hash_table_search(ctx->batches, >key);
> +entry = _mesa_hash_table_search(ctx->fbo_to_batch, >key);
>  if (entry && entry->data == batch)
> -_mesa_hash_table_remove(ctx->batches, entry);
> +_mesa_hash_table_remove(ctx->fbo_to_batch, entry);
>  
>  /* If this is the bound batch, the panfrost_context parameters are
>   * relevant so submitting it invalidates those parameters, but if 
> it's
> @@ -153,7 +153,7 @@ static bool panfrost_batch_is_frozen(struct 
> panfrost_batch *batch)
>  struct panfrost_context *ctx = batch->ctx;
>  struct hash_entry *entry;
>  
> -entry = _mesa_hash_table_search(ctx->batches, >key);
> +entry = _mesa_hash_table_search(ctx->fbo_to_batch, >key);
>  if (entry && entry->data == batch)
>  return false;
>  
> @@ -248,7 +248,8 @@ panfrost_get_batch(struct panfrost_context *ctx,
> const struct pipe_framebuffer_state *key)
>  {
>  /* Lookup the job first */
> -struct hash_entry *entry = _mesa_hash_table_search(ctx->batches, 
> key);
> +struct hash_entry *entry = _mesa_hash_table_search(ctx->fbo_to_batch,
> +   key);
>  
>  if (entry)
>  return entry->data;
> @@ -258,7 +259,7 @@ panfrost_get_batch(struct panfrost_context *ctx,
>  struct panfrost_batch *batch = panfrost_create_batch(ctx, key);
>  
>  /* Save the created job */
> -_mesa_hash_table_insert(ctx->batches, >key, batch);
> +_mesa_hash_table_insert(ctx->fbo_to_batch, >key, batch);
>  
>  return batch;
>  }
> @@ -915,7 +916,7 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  util_dynarray_init(, NULL);
>  }
>  
> -hash_table_foreach(ctx->batches, hentry) {
> +hash_table_foreach(ctx->fbo_to_batch, hentry) {
>  struct panfrost_batch *batch = hentry->data;
>  
>  assert(batch);
> @@ -931,7 +932,7 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  panfrost_batch_submit(batch);
>  }
>  
> -assert(!ctx->batches->entries);
> +assert(!ctx->fbo_to_batch->entries);
>  
>  /* Collect batch fences before returning */
>  panfrost_gc_fences(ctx);
> @@ -1183,9 +1184,9 @@ panfrost_batch_is_scanout(struct panfrost_batch *batch)
>  void
>  panfrost_batch_init(struct panfrost_context *ctx)
>  {
> -ctx->batches = _mesa_hash_table_create(ctx,
> -

Re: [Mesa-dev] [PATCH v3 14/17] panfrost: Do fine-grained flushing when preparing BO for CPU accesses

2019-09-20 Thread Alyssa Rosenzweig
Looks good, still r-b. But while we're here, let's get this right:

> @@ -578,10 +578,8 @@ panfrost_transfer_map(struct pipe_context *pctx,
>  is_bound |= fb->cbufs[c]->texture == resource;
>  }
>  
> -if (is_bound && (usage & PIPE_TRANSFER_READ)) {
> -assert(level == 0);
> -panfrost_flush_all_batches(ctx, true);
> -}
> +if (is_bound && (usage & PIPE_TRANSFER_READ))
> + assert(level == 0);

Everything from "Check if we're bound..." to "flush_all_batches()}" can
be removed along with this commit :)

I.e. just delete all the lines from L566 - L585 on master as of
46b7512b0a73 ... of course your line numbers are a bit different.

Re: [Mesa-dev] [PATCH v3 13/17] panfrost: Make sure the BO is 'ready' when picked from the cache

2019-09-20 Thread Alyssa Rosenzweig
Very nice! I'm quite happy with this version, all considered, so R-b!

On Wed, Sep 18, 2019 at 03:24:35PM +0200, Boris Brezillon wrote:
> This is needed if we want to free the panfrost_batch object at submit
> time in order to not have to GC the batch on the next job submission.
> 
> Signed-off-by: Boris Brezillon 
> ---
> Changes in v3:
> * Move the patch later in the series and squash "panfrost: Cache GPU
>   accesses to BOs" in it
> * Add extra comments to explain what we're doing
> ---
>  src/gallium/drivers/panfrost/pan_bo.c  | 112 -
>  src/gallium/drivers/panfrost/pan_bo.h  |   9 ++
>  src/gallium/drivers/panfrost/pan_job.c |  11 +++
>  3 files changed, 109 insertions(+), 23 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_bo.c 
> b/src/gallium/drivers/panfrost/pan_bo.c
> index 9daddf9d0cc2..37602688d630 100644
> --- a/src/gallium/drivers/panfrost/pan_bo.c
> +++ b/src/gallium/drivers/panfrost/pan_bo.c
> @@ -23,6 +23,7 @@
>   * Authors (Collabora):
>   *   Alyssa Rosenzweig 
>   */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -101,6 +102,63 @@ panfrost_bo_free(struct panfrost_bo *bo)
>  ralloc_free(bo);
>  }
>  
> +/* Returns true if the BO is ready, false otherwise.
> + * access_type is encoding the type of access one wants to ensure is done.
> + * Say you want to make sure all writers are done writing, you should pass
> + * PAN_BO_ACCESS_WRITE.
> + * If you want to wait for all users, you should pass PAN_BO_ACCESS_RW.
> + * PAN_BO_ACCESS_READ would work too as waiting for readers implies
> + * waiting for writers as well, but we want to make things explicit and 
> waiting
> + * only for readers is impossible.
> + */
> +bool
> +panfrost_bo_wait(struct panfrost_bo *bo, int64_t timeout_ns,
> + uint32_t access_type)
> +{
> +struct drm_panfrost_wait_bo req = {
> +.handle = bo->gem_handle,
> + .timeout_ns = timeout_ns,
> +};
> +int ret;
> +
> +assert(access_type == PAN_BO_ACCESS_WRITE ||
> +   access_type == PAN_BO_ACCESS_RW);
> +
> +/* If the BO has been exported or imported we can't rely on the 
> cached
> + * state, we need to call the WAIT_BO ioctl.
> + */
> +if (!(bo->flags & (PAN_BO_IMPORTED | PAN_BO_EXPORTED))) {
> +/* If ->gpu_access is 0, the BO is idle, no need to wait. */
> +if (!bo->gpu_access)
> +return true;
> +
> +/* If the caller only wants to wait for writers and no
> + * writes are pending, we don't have to wait.
> + */
> +if (access_type == PAN_BO_ACCESS_WRITE &&
> +!(bo->gpu_access & PAN_BO_ACCESS_WRITE))
> +return true;
> +}
> +
> +/* The ioctl returns >= 0 value when the BO we are waiting for is 
> ready
> + * -1 otherwise.
> + */
> +ret = drmIoctl(bo->screen->fd, DRM_IOCTL_PANFROST_WAIT_BO, );
> +if (ret != -1) {
> +/* Set gpu_access to 0 so that the next call to bo_wait()
> + * doesn't have to call the WAIT_BO ioctl.
> + */
> +bo->gpu_access = 0;
> +return true;
> +}
> +
> +/* If errno is not ETIMEDOUT or EBUSY that means the handle we passed
> + * is invalid, which shouldn't happen here.
> + */
> +assert(errno == ETIMEDOUT || errno == EBUSY);
> +return false;
> +}
> +
>  /* Helper to calculate the bucket index of a BO */
>  
>  static unsigned
> @@ -137,9 +195,8 @@ pan_bucket(struct panfrost_screen *screen, unsigned size)
>   * BO. */
>  
>  static struct panfrost_bo *
> -panfrost_bo_cache_fetch(
> -struct panfrost_screen *screen,
> -size_t size, uint32_t flags)
> +panfrost_bo_cache_fetch(struct panfrost_screen *screen,
> +size_t size, uint32_t flags, bool dontwait)
>  {
>  pthread_mutex_lock(>bo_cache_lock);
>  struct list_head *bucket = pan_bucket(screen, size);
> @@ -147,27 +204,30 @@ panfrost_bo_cache_fetch(
>  
>  /* Iterate the bucket looking for something suitable */
>  list_for_each_entry_safe(struct panfrost_bo, entry, bucket, link) {
> -if (entry->size >= size &&
> -entry->flags == flags) {
> -int ret;
> -struct drm_panfrost_madvise madv;
> +if (entry->size <

Re: [Mesa-dev] [PATCH v3 08/17] panfrost: Add a panfrost_flush_all_batches() helper

2019-09-20 Thread Alyssa Rosenzweig
(Still r-b)

On Wed, Sep 18, 2019 at 03:24:30PM +0200, Boris Brezillon wrote:
> And use it in panfrost_flush() to flush all batches, and not only the
> one currently bound to the context.
> 
> We also replace all internal calls to panfrost_flush() by
> panfrost_flush_all_batches() ones.
> 
> Signed-off-by: Boris Brezillon 
> Reviewed-by: Alyssa Rosenzweig 
> ---
> Changes in v3:
> * Add missing blank line
> * Collect R-b
> ---
>  src/gallium/drivers/panfrost/pan_compute.c  |  2 +-
>  src/gallium/drivers/panfrost/pan_context.c  | 23 +++
>  src/gallium/drivers/panfrost/pan_job.c  | 46 -
>  src/gallium/drivers/panfrost/pan_job.h  |  2 +-
>  src/gallium/drivers/panfrost/pan_resource.c |  6 +--
>  5 files changed, 64 insertions(+), 15 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_compute.c 
> b/src/gallium/drivers/panfrost/pan_compute.c
> index 4639c1b03c38..036dffbb17be 100644
> --- a/src/gallium/drivers/panfrost/pan_compute.c
> +++ b/src/gallium/drivers/panfrost/pan_compute.c
> @@ -133,7 +133,7 @@ panfrost_launch_grid(struct pipe_context *pipe,
>  /* Queue the job */
>  panfrost_scoreboard_queue_compute_job(batch, transfer);
>  
> -panfrost_flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
> +panfrost_flush_all_batches(ctx, true);
>  }
>  
>  void
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index aad69e3f9991..861b4b621602 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1348,7 +1348,6 @@ panfrost_flush(
>  unsigned flags)
>  {
>  struct panfrost_context *ctx = pan_context(pipe);
> -struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
>  struct util_dynarray fences;
>  
>  /* We must collect the fences before the flush is done, otherwise 
> we'll
> @@ -1356,13 +1355,18 @@ panfrost_flush(
>   */
>  if (fence) {
>  util_dynarray_init(, NULL);
> -panfrost_batch_fence_reference(batch->out_sync);
> -util_dynarray_append(, struct panfrost_batch_fence *,
> - batch->out_sync);
> +hash_table_foreach(ctx->batches, hentry) {
> +struct panfrost_batch *batch = hentry->data;
> +
> +panfrost_batch_fence_reference(batch->out_sync);
> +util_dynarray_append(,
> + struct panfrost_batch_fence *,
> + batch->out_sync);
> +}
>  }
>  
> -/* Submit the frame itself */
> -panfrost_batch_submit(batch);
> +/* Submit all pending jobs */
> +panfrost_flush_all_batches(ctx, false);
>  
>  if (fence) {
>  struct panfrost_fence *f = panfrost_fence_create(ctx, 
> );
> @@ -2321,7 +2325,7 @@ panfrost_set_framebuffer_state(struct pipe_context 
> *pctx,
>  }
>  
>  if (!is_scanout || has_draws)
> -panfrost_flush(pctx, NULL, PIPE_FLUSH_END_OF_FRAME);
> +panfrost_flush_all_batches(ctx, true);
>  else
>  
> assert(!ctx->payloads[PIPE_SHADER_VERTEX].postfix.framebuffer &&
> 
> !ctx->payloads[PIPE_SHADER_FRAGMENT].postfix.framebuffer);
> @@ -2553,6 +2557,7 @@ panfrost_get_query_result(struct pipe_context *pipe,
>union pipe_query_result *vresult)
>  {
>  struct panfrost_query *query = (struct panfrost_query *) q;
> +struct panfrost_context *ctx = pan_context(pipe);
>  
>  
>  switch (query->type) {
> @@ -2560,7 +2565,7 @@ panfrost_get_query_result(struct pipe_context *pipe,
>  case PIPE_QUERY_OCCLUSION_PREDICATE:
>  case PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE:
>  /* Flush first */
> -panfrost_flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
> +panfrost_flush_all_batches(ctx, true);
>  
>  /* Read back the query results */
>  unsigned *result = (unsigned *) query->transfer.cpu;
> @@ -2576,7 +2581,7 @@ panfrost_get_query_result(struct pipe_context *pipe,
>  
>  case PIPE_QUERY_PRIMITIVES_GENERATED:
>  case PIPE_QUERY_PRIMITIVES_EMITTED:
> -panfrost_flush(pipe, NULL, PIPE_FLUSH_END_OF_FRAME);
> +panfrost_flush_all_batches(ctx, true);
>  vresult->u64 = query->end - query-&

Re: [Mesa-dev] [PATCH v3 09/17] panfrost: Add a panfrost_flush_batches_accessing_bo() helper

2019-09-20 Thread Alyssa Rosenzweig
"
On Wed, Sep 18, 2019 at 03:24:31PM +0200, Boris Brezillon wrote:
> This will allow us to only flush batches touching a specific resource,
> which is particularly useful when the CPU needs to access a BO.
> 
> Signed-off-by: Boris Brezillon 
> Reviewed-by: Alyssa Rosenzweig 
> ---
> Changes in v3:
> * Collect R-b
> ---
>  src/gallium/drivers/panfrost/pan_job.c | 31 ++
>  src/gallium/drivers/panfrost/pan_job.h |  4 
>  2 files changed, 35 insertions(+)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 3ccf4bb6b3e9..e7eae399830f 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -952,6 +952,37 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  util_dynarray_fini();
>  }
>  
> +void
> +panfrost_flush_batches_accessing_bo(struct panfrost_context *ctx,
> +struct panfrost_bo *bo,
> +uint32_t access_type)
> +{
> +struct panfrost_bo_access *access;
> +struct hash_entry *hentry;
> +
> +/* It doesn't make any sense to flush only the readers. */
> +assert(access_type == PAN_BO_ACCESS_WRITE ||
> +   access_type == PAN_BO_ACCESS_RW);
> +
> +hentry = _mesa_hash_table_search(ctx->accessed_bos, bo);
> +access = hentry ? hentry->data : NULL;
> +if (!access)
> +return;
> +
> +if (access_type & PAN_BO_ACCESS_WRITE && access->writer &&
> +access->writer->batch)
> +panfrost_batch_submit(access->writer->batch);
> +
> +if (!(access_type & PAN_BO_ACCESS_READ))
> +return;
> +
> +util_dynarray_foreach(>readers, struct panfrost_batch_fence 
> *,
> +  reader) {
> +if (*reader && (*reader)->batch)
> +panfrost_batch_submit((*reader)->batch);
> +}
> +}
> +
>  void
>  panfrost_batch_set_requirements(struct panfrost_batch *batch)
>  {
> diff --git a/src/gallium/drivers/panfrost/pan_job.h 
> b/src/gallium/drivers/panfrost/pan_job.h
> index e95e156a40f8..25905b516739 100644
> --- a/src/gallium/drivers/panfrost/pan_job.h
> +++ b/src/gallium/drivers/panfrost/pan_job.h
> @@ -185,6 +185,10 @@ panfrost_batch_create_bo(struct panfrost_batch *batch, 
> size_t size,
>  void
>  panfrost_flush_all_batches(struct panfrost_context *ctx, bool wait);
>  
> +void
> +panfrost_flush_batches_accessing_bo(struct panfrost_context *ctx,
> +struct panfrost_bo *bo, uint32_t flags);
> +
>  void
>  panfrost_batch_set_requirements(struct panfrost_batch *batch);
>  
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v3 07/17] panfrost: Prepare panfrost_fence for batch pipelining

2019-09-20 Thread Alyssa Rosenzweig
(Still r-b)

On Wed, Sep 18, 2019 at 03:24:29PM +0200, Boris Brezillon wrote:
> The panfrost_fence logic currently waits on the last submitted batch,
> but the batch serialization that was enforced in
> panfrost_batch_submit() is about to go away, allowing for several
> batches to be pipelined, and the last submitted one is not necessarily
> the one that will finish last.
> 
> We need to make sure the fence logic waits on all flushed batches, not
> only the last one.
> 
> Signed-off-by: Boris Brezillon 
> Reviewed-by: Alyssa Rosenzweig 
> ---
> Changes in v3:
> * Fix a comment
> * Adjust things to match the changes done in "panfrost: Add a batch fence"
> ---
>  src/gallium/drivers/panfrost/pan_context.c | 18 +-
>  src/gallium/drivers/panfrost/pan_context.h |  5 +-
>  src/gallium/drivers/panfrost/pan_job.c | 16 -
>  src/gallium/drivers/panfrost/pan_screen.c  | 71 +++---
>  src/gallium/drivers/panfrost/pan_screen.h  |  3 +-
>  5 files changed, 57 insertions(+), 56 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 312a9e93e455..aad69e3f9991 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1349,14 +1349,30 @@ panfrost_flush(
>  {
>  struct panfrost_context *ctx = pan_context(pipe);
>  struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> +struct util_dynarray fences;
> +
> +/* We must collect the fences before the flush is done, otherwise 
> we'll
> + * lose track of them.
> + */
> +if (fence) {
> +util_dynarray_init(, NULL);
> +panfrost_batch_fence_reference(batch->out_sync);
> +util_dynarray_append(, struct panfrost_batch_fence *,
> + batch->out_sync);
> +}
>  
>  /* Submit the frame itself */
>  panfrost_batch_submit(batch);
>  
>  if (fence) {
> -struct panfrost_fence *f = panfrost_fence_create(ctx);
> +struct panfrost_fence *f = panfrost_fence_create(ctx, 
> );
>  pipe->screen->fence_reference(pipe->screen, fence, NULL);
>  *fence = (struct pipe_fence_handle *)f;
> +
> +util_dynarray_foreach(, struct panfrost_batch_fence 
> *, fence)
> +panfrost_batch_fence_unreference(*fence);
> +
> +util_dynarray_fini();
>  }
>  }
>  
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index 3b09952345cf..d50ed57d5d8a 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -94,7 +94,7 @@ struct panfrost_query {
>  
>  struct panfrost_fence {
>  struct pipe_reference reference;
> -int fd;
> +struct util_dynarray syncfds;
>  };
>  
>  struct panfrost_streamout {
> @@ -193,9 +193,6 @@ struct panfrost_context {
>  
>  /* True for t6XX, false for t8xx. */
>  bool is_t6xx;
> -
> -/* The out sync fence of the last submitted batch. */
> -struct panfrost_batch_fence *last_out_sync;
>  };
>  
>  /* Corresponds to the CSO */
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index b0494af3482f..211e48bafd4e 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -819,13 +819,6 @@ panfrost_batch_submit_ioctl(struct panfrost_batch *batch,
>  free(bo_handles);
>  free(in_syncs);
>  
> -/* Release the last batch fence if any, and retain the new one */
> -if (ctx->last_out_sync)
> -panfrost_batch_fence_unreference(ctx->last_out_sync);
> -
> -panfrost_batch_fence_reference(batch->out_sync);
> -ctx->last_out_sync = batch->out_sync;
> -
>  if (ret) {
>  fprintf(stderr, "Error submitting: %m\n");
>  return errno;
> @@ -884,15 +877,6 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>   * to wait on it.
>   */
>  batch->out_sync->signaled = true;
> -
> -/* Release the last batch fence if any, and set 
> ->last_out_sync
> - * to NULL
> - */
> -if (ctx->last_out_sync) {
> -panfrost_batch_fence_unreference(ctx->last_out_sync);
> -  

Re: [Mesa-dev] [PATCH v3 06/17] panfrost: Start tracking inter-batch dependencies

2019-09-20 Thread Alyssa Rosenzweig
R-b. Nice work!

On Wed, Sep 18, 2019 at 03:24:28PM +0200, Boris Brezillon wrote:
> The idea is to track which BO are being accessed and the type of access
> to determine when a dependency exists. Thanks to that we can build a
> dependency graph that will allow us to flush batches in the correct
> order.
> 
> Signed-off-by: Boris Brezillon 
> ---
> Changes in v3:
> * Fix coding style issues
> * Do not check for batch presence in the reader array when updating
>   a BO access (we already have this information)
> * Add more comments to explain what we're doing and why we're doing
>   it like that
> ---
>  src/gallium/drivers/panfrost/pan_context.h |   3 +
>  src/gallium/drivers/panfrost/pan_job.c | 355 -
>  src/gallium/drivers/panfrost/pan_job.h |   3 +
>  3 files changed, 356 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index ce3e0c899a4f..3b09952345cf 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -114,6 +114,9 @@ struct panfrost_context {
>  struct panfrost_batch *batch;
>  struct hash_table *batches;
>  
> +/* panfrost_bo -> panfrost_bo_access */
> +struct hash_table *accessed_bos;
> +
>  /* Within a launch_grid call.. */
>  const struct pipe_grid_info *compute_grid;
>  
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 872c846207bf..b0494af3482f 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -36,6 +36,29 @@
>  #include "pan_util.h"
>  #include "pandecode/decode.h"
>  
> +/* panfrost_bo_access is here to help us keep track of batch accesses to BOs
> + * and build a proper dependency graph such that batches can be pipelined for
> + * better GPU utilization.
> + *
> + * Each accessed BO has a corresponding entry in the ->accessed_bos hash 
> table.
> + * A BO is either being written or read at any time, that's what the type 
> field
> + * encodes.
> + * When the last access is a write, the batch writing the BO might have read
> + * dependencies (readers that have not been executed yet and want to read the
> + * previous BO content), and when the last access is a read, all readers 
> might
> + * depend on another batch to push its results to memory. That's what the
> + * readers/writers keep track off.
> + * There can only be one writer at any given time, if a new batch wants to
> + * write to the same BO, a dependency will be added between the new writer 
> and
> + * the old writer (at the batch level), and panfrost_bo_access->writer will 
> be
> + * updated to point to the new writer.
> + */
> +struct panfrost_bo_access {
> +uint32_t type;
> +struct util_dynarray readers;
> +struct panfrost_batch_fence *writer;
> +};
> +
>  static struct panfrost_batch_fence *
>  panfrost_create_batch_fence(struct panfrost_batch *batch)
>  {
> @@ -92,6 +115,7 @@ panfrost_create_batch(struct panfrost_context *ctx,
>  
>  util_dynarray_init(>headers, batch);
>  util_dynarray_init(>gpu_headers, batch);
> +util_dynarray_init(>dependencies, batch);
>  batch->out_sync = panfrost_create_batch_fence(batch);
>  util_copy_framebuffer_state(>key, key);
>  
> @@ -151,6 +175,11 @@ panfrost_free_batch(struct panfrost_batch *batch)
>  hash_table_foreach(batch->bos, entry)
>  panfrost_bo_unreference((struct panfrost_bo *)entry->key);
>  
> +util_dynarray_foreach(>dependencies,
> +  struct panfrost_batch_fence *, dep) {
> +panfrost_batch_fence_unreference(*dep);
> +}
> +
>  /* The out_sync fence lifetime is different from the the batch one
>   * since other batches might want to wait on a fence of already
>   * submitted/signaled batch. All we need to do here is make sure the
> @@ -164,6 +193,56 @@ panfrost_free_batch(struct panfrost_batch *batch)
>  ralloc_free(batch);
>  }
>  
> +#ifndef NDEBUG
> +static bool
> +panfrost_dep_graph_contains_batch(struct panfrost_batch *root,
> +  struct panfrost_batch *batch)
> +{
> +if (!root)
> +return false;
> +
> +util_dynarray_foreach(>dependencies,
> +  struct panfrost_batch_fence *, dep) {
> +if ((*dep)->batch == batch ||
> +panfrost_dep_graph_contains_batch((*dep)->batch, batch))
> +return true;
> +}
> +
> +return false;
> +}
> +#endif
> +
> +static void
> +panfrost_batch_add_dep(struct panfrost_batch *batch,
> +   struct panfrost_batch_fence *newdep)
> +{
> +if (batch == newdep->batch)
> +return;
> +
> +/* We might want to turn ->dependencies into a set 

Re: [Mesa-dev] [PATCH v3 05/17] panfrost: Add a panfrost_freeze_batch() helper

2019-09-20 Thread Alyssa Rosenzweig
Still r-b, but please use the Collabora address :)
On Wed, Sep 18, 2019 at 03:24:27PM +0200, Boris Brezillon wrote:
> We'll soon need to freeze a batch not only when it's flushed, but also
> when another batch depends on us, so let's add a helper to avoid
> duplicating the logic.
> 
> Signed-off-by: Boris Brezillon 
> Reviewed-by: Alyssa Rosenzweig 
> ---
> Changes in v3:
> * Collect R-b
> ---
>  src/gallium/drivers/panfrost/pan_job.c | 62 ++
>  1 file changed, 44 insertions(+), 18 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 55780dd3d9d6..872c846207bf 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -98,22 +98,59 @@ panfrost_create_batch(struct panfrost_context *ctx,
>  return batch;
>  }
>  
> +static void
> +panfrost_freeze_batch(struct panfrost_batch *batch)
> +{
> +struct panfrost_context *ctx = batch->ctx;
> +struct hash_entry *entry;
> +
> +/* Remove the entry in the FBO -> batch hash table if the batch
> + * matches. This way, next draws/clears targeting this FBO will 
> trigger
> + * the creation of a new batch.
> + */
> +entry = _mesa_hash_table_search(ctx->batches, >key);
> +if (entry && entry->data == batch)
> +_mesa_hash_table_remove(ctx->batches, entry);
> +
> +/* If this is the bound batch, the panfrost_context parameters are
> + * relevant so submitting it invalidates those parameters, but if 
> it's
> + * not bound, the context parameters are for some other batch so we
> + * can't invalidate them.
> + */
> +if (ctx->batch == batch) {
> +panfrost_invalidate_frame(ctx);
> +ctx->batch = NULL;
> +}
> +}
> +
> +#ifndef NDEBUG
> +static bool panfrost_batch_is_frozen(struct panfrost_batch *batch)
> +{
> +struct panfrost_context *ctx = batch->ctx;
> +struct hash_entry *entry;
> +
> +entry = _mesa_hash_table_search(ctx->batches, >key);
> +if (entry && entry->data == batch)
> +return false;
> +
> +if (ctx->batch == batch)
> +return false;
> +
> +return true;
> +}
> +#endif
> +
>  static void
>  panfrost_free_batch(struct panfrost_batch *batch)
>  {
>  if (!batch)
>  return;
>  
> -struct panfrost_context *ctx = batch->ctx;
> +assert(panfrost_batch_is_frozen(batch));
>  
>  hash_table_foreach(batch->bos, entry)
>  panfrost_bo_unreference((struct panfrost_bo *)entry->key);
>  
> -_mesa_hash_table_remove_key(ctx->batches, >key);
> -
> -if (ctx->batch == batch)
> -ctx->batch = NULL;
> -
>  /* The out_sync fence lifetime is different from the the batch one
>   * since other batches might want to wait on a fence of already
>   * submitted/signaled batch. All we need to do here is make sure the
> @@ -529,19 +566,8 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>  fprintf(stderr, "panfrost_batch_submit failed: %d\n", ret);
>  
>  out:
> -/* If this is the bound batch, the panfrost_context parameters are
> - * relevant so submitting it invalidates those paramaters, but if 
> it's
> - * not bound, the context parameters are for some other batch so we
> - * can't invalidate them.
> - */
> -if (ctx->batch == batch)
> -panfrost_invalidate_frame(ctx);
> -
> -/* The job has been submitted, let's invalidate the current FBO job
> - * cache.
> -  */
> +panfrost_freeze_batch(batch);
>  assert(!ctx->batch || batch == ctx->batch);
> -ctx->batch = NULL;
>  
>  /* We always stall the pipeline for correct results since pipelined
>   * rendering is quite broken right now (to be fixed by the 
> panfrost_job
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v3 04/17] panfrost: Use the per-batch fences to wait on the last submitted batch

2019-09-20 Thread Alyssa Rosenzweig
R-b
On Wed, Sep 18, 2019 at 03:24:26PM +0200, Boris Brezillon wrote:
> We just replace the per-context out_sync object by a pointer to the
> the fence of the last last submitted batch. Pipelining of batches will
> come later.
> 
> Signed-off-by: Boris Brezillon 
> ---
> Alyssa, I dropped your R-b since the other changes you asked me to do
> in "panfrost: Add a batch fence" had some impact on this patch.
> 
> Changes in v3:
> * Make sure we don't try to wait on dummy batches (those with no
>   vertex/tiler/fragment jobs)
> ---
>  src/gallium/drivers/panfrost/pan_context.c |  6 
>  src/gallium/drivers/panfrost/pan_context.h |  3 +-
>  src/gallium/drivers/panfrost/pan_job.c | 35 ++
>  src/gallium/drivers/panfrost/pan_screen.c  | 18 +--
>  4 files changed, 47 insertions(+), 15 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 65a6c7f8c5ae..312a9e93e455 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -2702,12 +2702,6 @@ panfrost_create_context(struct pipe_screen *screen, 
> void *priv, unsigned flags)
>  panfrost_blend_context_init(gallium);
>  panfrost_compute_context_init(gallium);
>  
> -ASSERTED int ret;
> -
> -ret = drmSyncobjCreate(pscreen->fd, DRM_SYNCOBJ_CREATE_SIGNALED,
> -   >out_sync);
> -assert(!ret);
> -
>  /* XXX: leaks */
>  gallium->stream_uploader = u_upload_create_default(gallium);
>  gallium->const_uploader = gallium->stream_uploader;
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index c145d589757e..ce3e0c899a4f 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -191,7 +191,8 @@ struct panfrost_context {
>  /* True for t6XX, false for t8xx. */
>  bool is_t6xx;
>  
> -uint32_t out_sync;
> +/* The out sync fence of the last submitted batch. */
> +struct panfrost_batch_fence *last_out_sync;
>  };
>  
>  /* Corresponds to the CSO */
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index b6763da66a97..55780dd3d9d6 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -425,11 +425,13 @@ panfrost_batch_submit_ioctl(struct panfrost_batch 
> *batch,
>  uint32_t *bo_handles;
>  int ret;
>  
> -submit.in_syncs = (u64) (uintptr_t) >out_sync;
> -submit.in_sync_count = 1;
>  
> -submit.out_sync = ctx->out_sync;
> +if (ctx->last_out_sync) {
> +submit.in_sync_count = 1;
> +submit.in_syncs = (uintptr_t)>last_out_sync->syncobj;
> +}
>  
> +submit.out_sync = batch->out_sync->syncobj;
>  submit.jc = first_job_desc;
>  submit.requirements = reqs;
>  
> @@ -445,6 +447,14 @@ panfrost_batch_submit_ioctl(struct panfrost_batch *batch,
>  submit.bo_handles = (u64) (uintptr_t) bo_handles;
>  ret = drmIoctl(screen->fd, DRM_IOCTL_PANFROST_SUBMIT, );
>  free(bo_handles);
> +
> +/* Release the last batch fence if any, and retain the new one */
> +if (ctx->last_out_sync)
> +panfrost_batch_fence_unreference(ctx->last_out_sync);
> +
> +panfrost_batch_fence_reference(batch->out_sync);
> +ctx->last_out_sync = batch->out_sync;
> +
>  if (ret) {
>  fprintf(stderr, "Error submitting: %m\n");
>  return errno;
> @@ -453,7 +463,8 @@ panfrost_batch_submit_ioctl(struct panfrost_batch *batch,
>  /* Trace the job if we're doing that */
>  if (pan_debug & PAN_DBG_TRACE) {
>  /* Wait so we can get errors reported back */
> -drmSyncobjWait(screen->fd, >out_sync, 1, INT64_MAX, 0, 
> NULL);
> +drmSyncobjWait(screen->fd, >out_sync->syncobj, 1,
> +   INT64_MAX, 0, NULL);
>  pandecode_jc(submit.jc, FALSE);
>  }
>  
> @@ -495,6 +506,15 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>   * to wait on it.
>   */
>  batch->out_sync->signaled = true;
> +
> +/* Release the last batch fence if any, and set 
> ->last_out_sync
> + * to NULL
> + */
> +if (ctx->last_out_sync) {
> +panfrost_batch_fence_unreference(ctx->last_out_sync);
> +ctx->last_out_sync = NULL;
> +}
> +
>  goto out;
>  }
>  
> @@ -527,8 +547,11 @@ out:
>   * rendering is quite broken right now (to be fixed by the 
> panfrost_job
>   * refactor, just take the perf hit for correctness)
>   */
> -   

Re: [Mesa-dev] [PATCH v3 03/17] panfrost: Add a batch fence

2019-09-20 Thread Alyssa Rosenzweig
R-b

On Wed, Sep 18, 2019 at 03:24:25PM +0200, Boris Brezillon wrote:
> So we can implement fine-grained dependency tracking between batches.
> 
> Signed-off-by: Boris Brezillon 
> ---
> Changes in v3:
> * Fix typos
> * Do not initialize the syncobj in a signaled state, and set
>   fence->signaled to true when submitting a dummy batch (one with no
>   draw/clear queued)
> ---
>  src/gallium/drivers/panfrost/pan_job.c | 56 +-
>  src/gallium/drivers/panfrost/pan_job.h | 39 ++
>  2 files changed, 94 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 785317dbd0b0..b6763da66a97 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -36,6 +36,45 @@
>  #include "pan_util.h"
>  #include "pandecode/decode.h"
>  
> +static struct panfrost_batch_fence *
> +panfrost_create_batch_fence(struct panfrost_batch *batch)
> +{
> +struct panfrost_batch_fence *fence;
> +ASSERTED int ret;
> +
> +fence = rzalloc(NULL, struct panfrost_batch_fence);
> +assert(fence);
> +pipe_reference_init(>reference, 1);
> +fence->ctx = batch->ctx;
> +fence->batch = batch;
> +ret = drmSyncobjCreate(pan_screen(batch->ctx->base.screen)->fd, 0,
> +   >syncobj);
> +assert(!ret);
> +
> +return fence;
> +}
> +
> +static void
> +panfrost_free_batch_fence(struct panfrost_batch_fence *fence)
> +{
> +drmSyncobjDestroy(pan_screen(fence->ctx->base.screen)->fd,
> +  fence->syncobj);
> +ralloc_free(fence);
> +}
> +
> +void
> +panfrost_batch_fence_unreference(struct panfrost_batch_fence *fence)
> +{
> +if (pipe_reference(>reference, NULL))
> + panfrost_free_batch_fence(fence);
> +}
> +
> +void
> +panfrost_batch_fence_reference(struct panfrost_batch_fence *fence)
> +{
> +pipe_reference(NULL, >reference);
> +}
> +
>  static struct panfrost_batch *
>  panfrost_create_batch(struct panfrost_context *ctx,
>const struct pipe_framebuffer_state *key)
> @@ -53,6 +92,7 @@ panfrost_create_batch(struct panfrost_context *ctx,
>  
>  util_dynarray_init(>headers, batch);
>  util_dynarray_init(>gpu_headers, batch);
> +batch->out_sync = panfrost_create_batch_fence(batch);
>  util_copy_framebuffer_state(>key, key);
>  
>  return batch;
> @@ -74,6 +114,15 @@ panfrost_free_batch(struct panfrost_batch *batch)
>  if (ctx->batch == batch)
>  ctx->batch = NULL;
>  
> +/* The out_sync fence lifetime is different from the the batch one
> + * since other batches might want to wait on a fence of already
> + * submitted/signaled batch. All we need to do here is make sure the
> + * fence does not point to an invalid batch, which the core will
> + * interpret as 'batch is already submitted'.
> + */
> +batch->out_sync->batch = NULL;
> +panfrost_batch_fence_unreference(batch->out_sync);
> +
>  util_unreference_framebuffer_state(>key);
>  ralloc_free(batch);
>  }
> @@ -441,8 +490,13 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>  int ret;
>  
>  /* Nothing to do! */
> -if (!batch->last_job.gpu && !batch->clear)
> +if (!batch->last_job.gpu && !batch->clear) {
> +/* Mark the fence as signaled so the fence logic does not try
> + * to wait on it.
> + */
> +batch->out_sync->signaled = true;
>  goto out;
> +}
>  
>  if (!batch->clear && batch->last_tiler.gpu)
>  panfrost_batch_draw_wallpaper(batch);
> diff --git a/src/gallium/drivers/panfrost/pan_job.h 
> b/src/gallium/drivers/panfrost/pan_job.h
> index 3f2cf1a999f3..88f1e4620fd0 100644
> --- a/src/gallium/drivers/panfrost/pan_job.h
> +++ b/src/gallium/drivers/panfrost/pan_job.h
> @@ -31,6 +31,36 @@
>  #include "pan_allocate.h"
>  #include "pan_resource.h"
>  
> +/* panfrost_batch_fence is the out fence of a batch that users or other 
> batches
> + * might want to wait on. The batch fence lifetime is different from the 
> batch
> + * one as want will certainly want to wait upon the fence after the batch has
> + * been submitted (which is when panfrost_batch objects are freed).
> + */
> +struct panfrost_batch_fence {
> +/* Refcounting object for the fence. */
> +struct pipe_reference reference;
> +
> +/* Batch that created this fence object. Will become NULL at batch
> + * submission time. This field is mainly here to know whether the
> + * batch has been flushed or not.
> + */
> +struct panfrost_batch *batch;
> +
> +/* Context this fence is attached to. We need both ctx and batch, as
> + * the batch will go away 

Re: [Mesa-dev] [PATCH v3 02/17] panfrost: Make panfrost_batch->bos a hash table

2019-09-20 Thread Alyssa Rosenzweig
R-b
On Wed, Sep 18, 2019 at 03:24:24PM +0200, Boris Brezillon wrote:
> So we can store the flags as data and keep the BO as a key. This way
> we keep track of the type of access done on BOs.
> 
> Signed-off-by: Boris Brezillon 
> ---
> Changes in v3:
> * None
> ---
>  src/gallium/drivers/panfrost/pan_job.c | 33 +-
>  src/gallium/drivers/panfrost/pan_job.h |  2 +-
>  2 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 8e2703ae168c..785317dbd0b0 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -44,9 +44,8 @@ panfrost_create_batch(struct panfrost_context *ctx,
>  
>  batch->ctx = ctx;
>  
> -batch->bos = _mesa_set_create(batch,
> -  _mesa_hash_pointer,
> -  _mesa_key_pointer_equal);
> +batch->bos = _mesa_hash_table_create(batch, _mesa_hash_pointer,
> + _mesa_key_pointer_equal);
>  
>  batch->minx = batch->miny = ~0;
>  batch->maxx = batch->maxy = 0;
> @@ -67,10 +66,8 @@ panfrost_free_batch(struct panfrost_batch *batch)
>  
>  struct panfrost_context *ctx = batch->ctx;
>  
> -set_foreach(batch->bos, entry) {
> -struct panfrost_bo *bo = (struct panfrost_bo *)entry->key;
> -panfrost_bo_unreference(bo);
> -}
> +hash_table_foreach(batch->bos, entry)
> +panfrost_bo_unreference((struct panfrost_bo *)entry->key);
>  
>  _mesa_hash_table_remove_key(ctx->batches, >key);
>  
> @@ -138,11 +135,25 @@ panfrost_batch_add_bo(struct panfrost_batch *batch, 
> struct panfrost_bo *bo,
>  if (!bo)
>  return;
>  
> -if (_mesa_set_search(batch->bos, bo))
> +struct hash_entry *entry;
> +uint32_t old_flags = 0;
> +
> +entry = _mesa_hash_table_search(batch->bos, bo);
> +if (!entry) {
> +entry = _mesa_hash_table_insert(batch->bos, bo,
> +(void *)(uintptr_t)flags);
> +panfrost_bo_reference(bo);
> + } else {
> +old_flags = (uintptr_t)entry->data;
> +}
> +
> +assert(entry);
> +
> +if (old_flags == flags)
>  return;
>  
> -panfrost_bo_reference(bo);
> -_mesa_set_add(batch->bos, bo);
> +flags |= old_flags;
> +entry->data = (void *)(uintptr_t)flags;
>  }
>  
>  void panfrost_batch_add_fbo_bos(struct panfrost_batch *batch)
> @@ -376,7 +387,7 @@ panfrost_batch_submit_ioctl(struct panfrost_batch *batch,
>  bo_handles = calloc(batch->bos->entries, sizeof(*bo_handles));
>  assert(bo_handles);
>  
> -set_foreach(batch->bos, entry) {
> +hash_table_foreach(batch->bos, entry) {
>  struct panfrost_bo *bo = (struct panfrost_bo *)entry->key;
>  assert(bo->gem_handle > 0);
>  bo_handles[submit.bo_handle_count++] = bo->gem_handle;
> diff --git a/src/gallium/drivers/panfrost/pan_job.h 
> b/src/gallium/drivers/panfrost/pan_job.h
> index 0b37a3131e86..3f2cf1a999f3 100644
> --- a/src/gallium/drivers/panfrost/pan_job.h
> +++ b/src/gallium/drivers/panfrost/pan_job.h
> @@ -98,7 +98,7 @@ struct panfrost_batch {
>  unsigned job_index;
>  
>  /* BOs referenced -- will be used for flushing logic */
> -struct set *bos;
> +struct hash_table *bos;
>  
>  /* Current transient BO */
>   struct panfrost_bo *transient_bo;
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v3 01/17] panfrost: Extend the panfrost_batch_add_bo() API to pass access flags

2019-09-20 Thread Alyssa Rosenzweig
> @@ -1121,7 +1134,11 @@ panfrost_emit_for_draw(struct panfrost_context *ctx, 
> bool with_vertex_data)
>  
>  struct panfrost_shader_state *ss = 
> >variants[all->active_variant];
>  
> -panfrost_batch_add_bo(batch, ss->bo);
> +panfrost_batch_add_bo(batch, ss->bo,
> +  PAN_BO_ACCESS_PRIVATE |
> +  PAN_BO_ACCESS_READ |

> +  PAN_BO_ACCESS_VERTEX_TILER |
> +  PAN_BO_ACCESS_FRAGMENT);

I believe this should be just the access for the stage `i`

Although, actually, I am not at all sure what this batch_add_bo call is
accomplishing here?

I think this batch_add_bo should probably be dropped altogether? This loop
is dealing with constant buffers; the shaders themselves were added.
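
If the call does stay, "just the access for the stage" could look
roughly like this (a sketch only -- the helper name and exact mapping
are made up, not something in the patch):

    static uint32_t
    bo_access_for_stage(enum pipe_shader_type stage)
    {
            return PAN_BO_ACCESS_PRIVATE | PAN_BO_ACCESS_READ |
                   (stage == PIPE_SHADER_FRAGMENT ?
                    PAN_BO_ACCESS_FRAGMENT : PAN_BO_ACCESS_VERTEX_TILER);
    }

    panfrost_batch_add_bo(batch, ss->bo, bo_access_for_stage(i));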

>  void panfrost_batch_add_fbo_bos(struct panfrost_batch *batch)
>  {
> +uint32_t flags = PAN_BO_ACCESS_SHARED | PAN_BO_ACCESS_WRITE |
> + PAN_BO_ACCESS_VERTEX_TILER |
> + PAN_BO_ACCESS_FRAGMENT;

I think we can drop VERTEX_TILER here...? The buffers are written right
at the end of the FRAGMENT job, not touched before that.

If nothing else is broken, this should allow a nice perf boost with
pipelining, so the vertex/tiler from frame n+1 can run in parallel with
the fragment of frame n (rather than blocking on frame n finishing with
the FBOs).
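
Concretely, the hunk would become something like (sketch, assuming the
render targets really are only touched by the fragment job):

    uint32_t flags = PAN_BO_ACCESS_SHARED | PAN_BO_ACCESS_WRITE |
                     PAN_BO_ACCESS_FRAGMENT;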

Re: [Mesa-dev] [PATCH 3/3] panfrost: More tests are passing

2019-09-20 Thread Alyssa Rosenzweig
R-b with pleasure! Glad to see those tests fixed, those have stumped me
for a *long* time :D

Congratulations! Thank you!

On Fri, Sep 20, 2019 at 04:53:39PM +0200, Boris Brezillon wrote:
> Remove the tests that are now passing.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  .../drivers/panfrost/ci/expected-failures.txt | 153 --
>  1 file changed, 153 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/ci/expected-failures.txt 
> b/src/gallium/drivers/panfrost/ci/expected-failures.txt
> index 7e7dbd62307b..91c1f14ce1a2 100644
> --- a/src/gallium/drivers/panfrost/ci/expected-failures.txt
> +++ b/src/gallium/drivers/panfrost/ci/expected-failures.txt
> @@ -1,10 +1,3 @@
> -dEQP-GLES2.functional.color_clear.masked_rgba Fail
> -dEQP-GLES2.functional.color_clear.masked_rgb Fail
> -dEQP-GLES2.functional.color_clear.masked_scissored_rgba Fail
> -dEQP-GLES2.functional.color_clear.masked_scissored_rgb Fail
> -dEQP-GLES2.functional.color_clear.scissored_rgba Fail
> -dEQP-GLES2.functional.color_clear.scissored_rgb Fail
> -dEQP-GLES2.functional.color_clear.short_scissored_rgb Fail
>  dEQP-GLES2.functional.depth_range.write.0_8_to_third Fail
>  dEQP-GLES2.functional.depth_range.write.clamp_both Fail
>  dEQP-GLES2.functional.depth_range.write.clamp_far Fail
> @@ -672,201 +665,55 @@ 
> dEQP-GLES2.functional.fragment_ops.depth_stencil.stencil_ops.zero_zero_zero 
> Fail
>  dEQP-GLES2.functional.fragment_ops.depth_stencil.write_mask.both Fail
>  dEQP-GLES2.functional.fragment_ops.depth_stencil.write_mask.depth Fail
>  dEQP-GLES2.functional.fragment_ops.depth_stencil.write_mask.stencil Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.0 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.10 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.11 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.12 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.13 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.15 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.16 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.17 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.18 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.19 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.1 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.20 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.21 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.22 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.23 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.24 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.25 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.26 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.29 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.30 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.31 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.32 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.33 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.34 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.35 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.36 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.37 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.38 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.39 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.3 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.40 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.41 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.42 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.43 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.44 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.46 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.47 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.48 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.49 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.50 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.51 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.52 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.53 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.54 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.55 Fail
> -dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.56 Fail
>  dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.57 Fail
> 

Re: [Mesa-dev] [PATCH 2/3] panfrost: Draw the wallpaper when only depth/stencil bufs are cleared

2019-09-20 Thread Alyssa Rosenzweig
R-b, nice fix :)

On Fri, Sep 20, 2019 at 04:53:38PM +0200, Boris Brezillon wrote:
> When only the depth/stencil bufs are cleared, we should make sure the
> color content is reloaded into the tile buffers if we want to preserve
> their content.
> 
> Signed-off-by: Boris Brezillon 
> ---
> There might be a more optimal solution to do that (like not passing the
> color bufs to the fragment job?), but this solution seems to fix a few
> deqp tests.
> ---
>  src/gallium/drivers/panfrost/pan_context.c |  2 +-
>  src/gallium/drivers/panfrost/pan_job.c | 16 ++--
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index b2f2a9da7a51..c99bf1b26ce7 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1333,7 +1333,7 @@ panfrost_queue_draw(struct panfrost_context *ctx)
>  
>  if (rasterizer_discard)
>  panfrost_scoreboard_queue_vertex_job(batch, vertex, FALSE);
> -else if (ctx->wallpaper_batch)
> +else if (ctx->wallpaper_batch && batch->first_tiler.gpu)
>  panfrost_scoreboard_queue_fused_job_prepend(batch, vertex, 
> tiler);
>  else
>  panfrost_scoreboard_queue_fused_job(batch, vertex, tiler);
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 4ec2aa0565d7..a2df31f96f00 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -698,10 +698,23 @@ panfrost_batch_get_tiler_dummy(struct panfrost_batch 
> *batch)
>  static void
>  panfrost_batch_draw_wallpaper(struct panfrost_batch *batch)
>  {
> +/* Color 0 is cleared, no need to draw the wallpaper.
> + * TODO: MRT wallpapers.
> + */
> +if (batch->clear & PIPE_CLEAR_COLOR0)
> +return;
> +
>  /* Nothing to reload? TODO: MRT wallpapers */
>  if (batch->key.cbufs[0] == NULL)
>  return;
>  
> +/* No draw calls, and no clear on the depth/stencil bufs.
> + * Drawing the wallpaper would be useless.
> + */
> +if (!batch->last_tiler.gpu &&
> +!(batch->clear & PIPE_CLEAR_DEPTHSTENCIL))
> +return;
> +
>  /* Check if the buffer has any content on it worth preserving */
>  
>  struct pipe_surface *surf = batch->key.cbufs[0];
> @@ -923,8 +936,7 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>  goto out;
>  }
>  
> -if (!batch->clear && batch->last_tiler.gpu)
> -panfrost_batch_draw_wallpaper(batch);
> +panfrost_batch_draw_wallpaper(batch);
>  
>  panfrost_scoreboard_link_batch(batch);
>  
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH 1/3] panfrost: Make sure a clear does not re-use a pre-existing batch

2019-09-20 Thread Alyssa Rosenzweig
To be clear, if we have a batch and do the following operations:

clear red
draw 1
clear green
draw 2
flush

All we should see is #2 on a green background, which this patch handles
by the second clear invalidating all the clears/draws that came before
it (provided there is no flush in between). 
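
In GL terms, that sequence is roughly the following (illustrative only):

    glClearColor(1, 0, 0, 1);
    glClear(GL_COLOR_BUFFER_BIT);        /* clear red */
    glDrawArrays(GL_TRIANGLES, 0, 3);    /* draw 1 */
    glClearColor(0, 1, 0, 1);
    glClear(GL_COLOR_BUFFER_BIT);        /* clear green */
    glDrawArrays(GL_TRIANGLES, 0, 3);    /* draw 2 */
    glFlush();                           /* only #2 on green should remain */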

I might just be tripped up by the "freeze" name. That really means throw
away / free here, I guess?

Provided that's the idea (and we're not somehow saving the original draw
1), it's Reviewed-by A R 

On Fri, Sep 20, 2019 at 04:53:37PM +0200, Boris Brezillon wrote:
> glClear()s are expected to be the first thing GL apps do before drawing
> new things. If there's already an existing batch targetting the same
> FBO that has draws attached to it, we should make sure the new clear
> gets a new batch assigned to guaranteed that the FB content is actually
> cleared with the requested color/depth/stencil values.
> 
> We create a panfrost_get_fresh_batch_for_fbo() helper for that and
> call it from panfrost_clear().
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.c |  2 +-
>  src/gallium/drivers/panfrost/pan_job.c | 21 +
>  src/gallium/drivers/panfrost/pan_job.h |  3 +++
>  3 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index ac01461a07fe..b2f2a9da7a51 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -162,7 +162,7 @@ panfrost_clear(
>  double depth, unsigned stencil)
>  {
>  struct panfrost_context *ctx = pan_context(pipe);
> -struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> +struct panfrost_batch *batch = panfrost_get_fresh_batch_for_fbo(ctx);
>  
>  panfrost_batch_add_fbo_bos(batch);
>  panfrost_batch_clear(batch, buffers, color, depth, stencil);
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index d8330bc133a6..4ec2aa0565d7 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -298,6 +298,27 @@ panfrost_get_batch_for_fbo(struct panfrost_context *ctx)
>  return batch;
>  }
>  
> +struct panfrost_batch *
> +panfrost_get_fresh_batch_for_fbo(struct panfrost_context *ctx)
> +{
> +struct panfrost_batch *batch;
> +
> +batch = panfrost_get_batch(ctx, >pipe_framebuffer);
> +
> +/* The batch has no draw/clear queued, let's return it directly.
> + * Note that it's perfectly fine to re-use a batch with an
> + * existing clear, we'll just update it with the new clear request.
> + */
> +if (!batch->last_job.gpu)
> +return batch;
> +
> +/* Otherwise, we need to freeze the existing one and instantiate a 
> new
> + * one.
> + */
> +panfrost_freeze_batch(batch);
> +return panfrost_get_batch(ctx, >pipe_framebuffer);
> +}
> +
>  static bool
>  panfrost_batch_fence_is_signaled(struct panfrost_batch_fence *fence)
>  {
> diff --git a/src/gallium/drivers/panfrost/pan_job.h 
> b/src/gallium/drivers/panfrost/pan_job.h
> index e1b1f56a2e64..0bd78bba267a 100644
> --- a/src/gallium/drivers/panfrost/pan_job.h
> +++ b/src/gallium/drivers/panfrost/pan_job.h
> @@ -172,6 +172,9 @@ panfrost_batch_fence_reference(struct 
> panfrost_batch_fence *batch);
>  struct panfrost_batch *
>  panfrost_get_batch_for_fbo(struct panfrost_context *ctx);
>  
> +struct panfrost_batch *
> +panfrost_get_fresh_batch_for_fbo(struct panfrost_context *ctx);
> +
>  void
>  panfrost_batch_init(struct panfrost_context *ctx);
>  
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH 1/2] panfrost: Fix indexed draws

2019-09-18 Thread Alyssa Rosenzweig
R-b

On Wed, Sep 18, 2019 at 03:54:37PM +0200, Boris Brezillon wrote:
> ->padded_count should be large enough to cover all vertices pointed by
> the index array. Use the local vertex_count variable that contains the
> updated vertex_count value for the indexed draw case.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 08b799b66bf8..1b8558c1c2c1 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1601,7 +1601,7 @@ panfrost_draw_vbo(
>  
>  ctx->padded_count = pan_expand_shift_odd(so);
>  } else {
> -ctx->padded_count = ctx->vertex_count;
> +ctx->padded_count = vertex_count;
>  
>  /* Reset instancing state */
>  ctx->payloads[PIPE_SHADER_VERTEX].instance_shift = 0;
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v2 00/37] panfrost: Support batch pipelining

2019-09-17 Thread Alyssa Rosenzweig
"Can't use pipe_framebuffer_state as a hash key" Is this still relevant?
I thought we did this.

You're right that the 32-batch bitset is something we can integrate in a
v2.

I guess that's it, never mind. Good stuff regardless :)

On Tue, Sep 17, 2019 at 12:08:34AM +0200, Boris Brezillon wrote:
> On Mon, 16 Sep 2019 16:29:05 -0400
> Alyssa Rosenzweig  wrote:
> 
> > > As a drive-by comment, in case you didn't know, the "standard"
> > > solution for avoiding flushing when BO's are written by the CPU (e.g.
> > > uniform buffer updates) as documented in ARM's performance guide is to
> > > add a copy-on-write mechanism, so that you have "shadow" BO's when the
> > > original BO is modified by the user. I believe this is implemented in
> > > freedreno, at least there was a talk about it at XDC a few years ago.  
> > 
> > Yes, this is implemented in freedreno. BO shadowing will be the next
> > step once this pipelining code settles down. For now, this series is
> > about eliminating the strict flushes between each and every frame of
> > each and every FBO that we currently occur now.
> > 
> > Boris, references on the freedreno model (which is the mesa gold
> > standard):
> > 
> > https://www.x.org/wiki/Events/XDC2016/Program/clark_ooo_rendering.pdf
> > https://bloggingthemonkey.blogspot.com/2016/07/dirty-tricks-for-moar-fps.html
> > 
> > The former presentation is definitely worth a read; evidently we've
> > already painted ourselves into some corners :p
> 
> I had a quick look at the presentation, and it looks pretty similar to
> what is being implemented in this series. The only difference I could
> spot is the limitation to 32 batches to avoid usage of sets in the BO
> access tracking logic, and that's still something I can change (I'd
> prefer to do that in a follow-up patch though).
> 
> What specific aspects do you think we got wrong?

Re: [Mesa-dev] [PATCH v2 37/37] panfrost/ci: New tests are passing

2019-09-17 Thread Alyssa Rosenzweig
R-b, nice :)

On Mon, Sep 16, 2019 at 11:37:15AM +0200, Boris Brezillon wrote:
> All dEQP-GLES2.functional.fbo.render.texsubimage.* tests are now
> passing.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/ci/expected-failures.txt | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/ci/expected-failures.txt 
> b/src/gallium/drivers/panfrost/ci/expected-failures.txt
> index b0fc872a3009..3c707230dd23 100644
> --- a/src/gallium/drivers/panfrost/ci/expected-failures.txt
> +++ b/src/gallium/drivers/panfrost/ci/expected-failures.txt
> @@ -53,10 +53,6 @@ 
> dEQP-GLES2.functional.fbo.render.shared_colorbuffer.tex2d_rgb_depth_component16
>  
> dEQP-GLES2.functional.fbo.render.shared_depthbuffer.rbo_rgb565_depth_component16
>  Fail
>  
> dEQP-GLES2.functional.fbo.render.shared_depthbuffer.tex2d_rgba_depth_component16
>  Fail
>  
> dEQP-GLES2.functional.fbo.render.shared_depthbuffer.tex2d_rgb_depth_component16
>  Fail
> -dEQP-GLES2.functional.fbo.render.texsubimage.after_render_tex2d_rgba Fail
> -dEQP-GLES2.functional.fbo.render.texsubimage.after_render_tex2d_rgb Fail
> -dEQP-GLES2.functional.fbo.render.texsubimage.between_render_tex2d_rgba Fail
> -dEQP-GLES2.functional.fbo.render.texsubimage.between_render_tex2d_rgb Fail
>  dEQP-GLES2.functional.fragment_ops.depth_stencil.random.0 Fail
>  dEQP-GLES2.functional.fragment_ops.depth_stencil.random.10 Fail
>  dEQP-GLES2.functional.fragment_ops.depth_stencil.random.11 Fail
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v2 36/37] panfrost: Take draw call order into account

2019-09-17 Thread Alyssa Rosenzweig
Hmm, could you explain a bit why this is necessary?

I would think if there's no dependency, there's no dependency, and if
this fixes bugs, that's a dependency tracking issue that we're just
papering over.

(Also, I guess the r-b on the previous patch is retracted temporarily,
since it was a setup for this.)

On Mon, Sep 16, 2019 at 11:37:14AM +0200, Boris Brezillon wrote:
> This is not strictly required, but let's try to match the draw call
> orders, just in case the app had a reason to do it in this order.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.h |  6 ++
>  src/gallium/drivers/panfrost/pan_job.c | 23 +++---
>  src/gallium/drivers/panfrost/pan_job.h |  3 +++
>  3 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index f13967f51b46..c6b53685b285 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -114,6 +114,12 @@ struct panfrost_context {
>  struct panfrost_batch *batch;
>  struct hash_table *fbo_to_batch;
>  
> +/* A list containing all non-submitted batches since the last flush.
> + * This list is used to keep track of clear/draw order on batches 
> that
> + * don't have explicit dependencies between them.
> + */
> +struct list_head batch_queue;
> +
>  /* panfrost_bo -> panfrost_bo_access */
>  struct hash_table *accessed_bos;
>  
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index 13d7e8086e62..e030f8e98dad 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -118,6 +118,7 @@ panfrost_create_batch(struct panfrost_context *ctx,
>  util_dynarray_init(>headers, batch);
>  util_dynarray_init(>gpu_headers, batch);
>  util_dynarray_init(>dependencies, batch);
> +list_inithead(>queue_node);
>  batch->out_sync = panfrost_create_batch_fence(batch);
>  util_copy_framebuffer_state(>key, key);
>  
> @@ -181,6 +182,9 @@ panfrost_free_batch(struct panfrost_batch *batch)
>struct panfrost_batch_fence *, dep)
>  panfrost_batch_fence_unreference(*dep);
>  
> +/* Remove the batch from the batch queue. */
> +list_del(>queue_node);
> +
>  /* The out_sync fence lifetime is different from the the batch one
>   * since other batches might want to wait on an fence of already
>   * submitted/signaled batch. All we need to do here is make sure the
> @@ -543,6 +547,13 @@ void panfrost_batch_add_fbo_bos(struct panfrost_batch 
> *batch)
>  struct panfrost_resource *rsrc = 
> pan_resource(batch->key.zsbuf->texture);
>  panfrost_batch_add_bo(batch, rsrc->bo, flags);
>  }
> +
> +/* We the batch was not already present in the queue, add it know.
> + * Should we move the batch at end of the queue when a new draw
> + * happens?
> + */
> +if (list_empty(>queue_node))
> +list_addtail(>queue_node, >ctx->batch_queue);
>  }
>  
>  struct panfrost_bo *
> @@ -878,10 +889,15 @@ panfrost_flush_all_batches(struct panfrost_context 
> *ctx, bool wait)
>  util_dynarray_init(, NULL);
>  }
>  
> -hash_table_foreach(ctx->fbo_to_batch, hentry) {
> -struct panfrost_batch *batch = hentry->data;
> +/* We can use the for_each_entry_safe() iterator here because the
> + * next element might be removed from the list when flushing the
> + * dependencies in panfrost_batch_submit().
> + */
> +while (!list_empty(>batch_queue)) {
> +struct panfrost_batch *batch;
>  
> -assert(batch);
> +batch = list_first_entry(>batch_queue,
> + struct panfrost_batch, queue_node);
>  
>  if (wait) {
>  panfrost_batch_fence_reference(batch->out_sync);
> @@ -1150,4 +1166,5 @@ panfrost_batch_init(struct panfrost_context *ctx)
>  panfrost_batch_compare);
>  ctx->accessed_bos = _mesa_hash_table_create(ctx, _mesa_hash_pointer,
>  _mesa_key_pointer_equal);
> +list_inithead(>batch_queue);
>  }
> diff --git a/src/gallium/drivers/panfrost/pan_job.h 
> b/src/gallium/drivers/panfrost/pan_job.h
> index d198864ce4f7..34926e30cdde 100644
> --- a/src/gallium/drivers/panfrost/pan_job.h
> +++ b/src/gallium/drivers/panfrost/pan_job.h
> @@ -71,6 +71,9 @@ struct panfrost_batch {
>  struct panfrost_context *ctx;
>  struct pipe_framebuffer_state key;
>  
> +/* Used to insert the batch in the batch queue */
> +

Re: [Mesa-dev] [PATCH v2 35/37] panfrost: Rename ctx->batches into ctx->fbo_to_batch

2019-09-17 Thread Alyssa Rosenzweig
R-b

On Mon, Sep 16, 2019 at 11:37:13AM +0200, Boris Brezillon wrote:
> We are about to add a batch queue to keep track of submission order.
> Let's rename the existing batches hash table (which is used to get the
> batch attached to an FBO) into fbo_to_batch to avoid confusion.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.c |  2 +-
>  src/gallium/drivers/panfrost/pan_context.h |  2 +-
>  src/gallium/drivers/panfrost/pan_job.c | 21 +++--
>  3 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 90c7512b105f..b8f653bf8e72 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -1361,7 +1361,7 @@ panfrost_flush(
>   */
>  if (fence) {
>  util_dynarray_init(, NULL);
> -hash_table_foreach(ctx->batches, hentry) {
> +hash_table_foreach(ctx->fbo_to_batch, hentry) {
>  struct panfrost_batch *batch = hentry->data;
>  
>  panfrost_batch_fence_reference(batch->out_sync);
> diff --git a/src/gallium/drivers/panfrost/pan_context.h 
> b/src/gallium/drivers/panfrost/pan_context.h
> index d50ed57d5d8a..f13967f51b46 100644
> --- a/src/gallium/drivers/panfrost/pan_context.h
> +++ b/src/gallium/drivers/panfrost/pan_context.h
> @@ -112,7 +112,7 @@ struct panfrost_context {
>  
>  /* Bound job batch and map of panfrost_batch_key to job batches */
>  struct panfrost_batch *batch;
> -struct hash_table *batches;
> +struct hash_table *fbo_to_batch;
>  
>  /* panfrost_bo -> panfrost_bo_access */
>  struct hash_table *accessed_bos;
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index df92b791a1f2..13d7e8086e62 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -134,9 +134,9 @@ panfrost_freeze_batch(struct panfrost_batch *batch)
>   * matches. This way, next draws/clears targeting this FBO will 
> trigger
>   * the creation of a new batch.
>   */
> -entry = _mesa_hash_table_search(ctx->batches, >key);
> +entry = _mesa_hash_table_search(ctx->fbo_to_batch, >key);
>   if (entry && entry->data == batch)
> -_mesa_hash_table_remove(ctx->batches, entry);
> +_mesa_hash_table_remove(ctx->fbo_to_batch, entry);
>  
>  /* If this is the bound batch, the panfrost_context parameters are
>   * relevant so submitting it invalidates those parameters, but if 
> it's
> @@ -155,7 +155,7 @@ static bool panfrost_batch_is_frozen(struct 
> panfrost_batch *batch)
>  struct panfrost_context *ctx = batch->ctx;
>  struct hash_entry *entry;
>  
> -entry = _mesa_hash_table_search(ctx->batches, >key);
> +entry = _mesa_hash_table_search(ctx->fbo_to_batch, >key);
>  if (entry && entry->data == batch)
>  return false;
>  
> @@ -245,7 +245,8 @@ panfrost_get_batch(struct panfrost_context *ctx,
> const struct pipe_framebuffer_state *key)
>  {
>  /* Lookup the job first */
> -struct hash_entry *entry = _mesa_hash_table_search(ctx->batches, 
> key);
> +struct hash_entry *entry = _mesa_hash_table_search(ctx->fbo_to_batch,
> +   key);
>  
>  if (entry)
>  return entry->data;
> @@ -255,7 +256,7 @@ panfrost_get_batch(struct panfrost_context *ctx,
>  struct panfrost_batch *batch = panfrost_create_batch(ctx, key);
>  
>  /* Save the created job */
> -_mesa_hash_table_insert(ctx->batches, >key, batch);
> +_mesa_hash_table_insert(ctx->fbo_to_batch, >key, batch);
>  
>  return batch;
>  }
> @@ -877,7 +878,7 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  util_dynarray_init(, NULL);
>  }
>  
> -hash_table_foreach(ctx->batches, hentry) {
> +hash_table_foreach(ctx->fbo_to_batch, hentry) {
>  struct panfrost_batch *batch = hentry->data;
>  
>  assert(batch);
> @@ -892,7 +893,7 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  panfrost_batch_submit(batch);
>  }
>  
> -assert(!ctx->batches->entries);
> +assert(!ctx->fbo_to_batch->entries);
>  
>  /* Collect batch fences before returning */
>  panfrost_gc_fences(ctx);
> @@ -1144,9 +1145,9 @@ panfrost_batch_is_scanout(struct panfrost_batch *batch)
>  void
>  panfrost_batch_init(struct panfrost_context *ctx)
>  {
> -ctx->batches = _mesa_hash_table_create(ctx,
> -   panfrost_batch_hash,
> -   

Re: [Mesa-dev] [PATCH v2 34/37] panfrost: Do fine-grained flushing when preparing BO for CPU accesses

2019-09-17 Thread Alyssa Rosenzweig
R-b

On Mon, Sep 16, 2019 at 11:37:12AM +0200, Boris Brezillon wrote:
> We don't have to flush all batches when we're only interested in
> reading/writing a specific BO. Thanks to the
> panfrost_flush_batches_accessing_bo() and panfrost_bo_wait() helpers
> we can now flush only the batches touching the BO we want to access
> from the CPU.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_resource.c | 27 +
>  1 file changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_resource.c 
> b/src/gallium/drivers/panfrost/pan_resource.c
> index 1f7605adcd5d..d59529ff15b7 100644
> --- a/src/gallium/drivers/panfrost/pan_resource.c
> +++ b/src/gallium/drivers/panfrost/pan_resource.c
> @@ -578,10 +578,8 @@ panfrost_transfer_map(struct pipe_context *pctx,
>  is_bound |= fb->cbufs[c]->texture == resource;
>  }
>  
> -if (is_bound && (usage & PIPE_TRANSFER_READ)) {
> -assert(level == 0);
> -panfrost_flush_all_batches(ctx, true);
> -}
> +if (is_bound && (usage & PIPE_TRANSFER_READ))
> + assert(level == 0);
>  
>  /* TODO: Respect usage flags */
>  
> @@ -594,11 +592,11 @@ panfrost_transfer_map(struct pipe_context *pctx,
>  /* No flush for writes to uninitialized */
>  } else if (!(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
>  if (usage & PIPE_TRANSFER_WRITE) {
> -/* STUB: flush reading */
> -//printf("debug: missed reading flush %d\n", 
> resource->target);
> +panfrost_flush_batches_accessing_bo(ctx, bo, 
> PAN_BO_GPU_ACCESS_RW);
> +panfrost_bo_wait(bo, INT64_MAX, 
> PAN_BO_GPU_ACCESS_RW);
>  } else if (usage & PIPE_TRANSFER_READ) {
> -/* STUB: flush writing */
> -//printf("debug: missed writing flush %d (%d-%d)\n", 
> resource->target, box->x, box->x + box->width);
> +panfrost_flush_batches_accessing_bo(ctx, bo, 
> PAN_BO_GPU_ACCESS_WRITE);
> +panfrost_bo_wait(bo, INT64_MAX, 
> PAN_BO_GPU_ACCESS_WRITE);
>  } else {
>  /* Why are you even mapping?! */
>  }
> @@ -748,11 +746,8 @@ panfrost_generate_mipmap(
>   * reorder-type optimizations in place. But for now prioritize
>   * correctness. */
>  
> -struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> -bool has_draws = batch->last_job.gpu;
> -
> -if (has_draws)
> -panfrost_flush_all_batches(ctx, true);
> +panfrost_flush_batches_accessing_bo(ctx, rsrc->bo, 
> PAN_BO_GPU_ACCESS_RW);
> +panfrost_bo_wait(rsrc->bo, INT64_MAX, PAN_BO_GPU_ACCESS_RW);
>  
>  /* We've flushed the original buffer if needed, now trigger a blit */
>  
> @@ -765,8 +760,10 @@ panfrost_generate_mipmap(
>  /* If the blit was successful, flush once more. If it wasn't, well, 
> let
>   * the state tracker deal with it. */
>  
> -if (blit_res)
> -panfrost_flush_all_batches(ctx, true);
> +if (blit_res) {
> +panfrost_flush_batches_accessing_bo(ctx, rsrc->bo, 
> PAN_BO_GPU_ACCESS_WRITE);
> +panfrost_bo_wait(rsrc->bo, INT64_MAX, 
> PAN_BO_GPU_ACCESS_WRITE);
> +}
>  
>  return blit_res;
>  }
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v2 33/37] panfrost: Get rid of the flush in panfrost_set_framebuffer_state()

2019-09-17 Thread Alyssa Rosenzweig
R-b :D

On Mon, Sep 16, 2019 at 11:37:11AM +0200, Boris Brezillon wrote:
> Now that we have track inter-batch dependencies, the flush done in
> panfrost_set_framebuffer_state() is no longer needed. Let's get rid of
> it.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_context.c | 46 ++
>  1 file changed, 3 insertions(+), 43 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_context.c 
> b/src/gallium/drivers/panfrost/pan_context.c
> index 56a76a230141..90c7512b105f 100644
> --- a/src/gallium/drivers/panfrost/pan_context.c
> +++ b/src/gallium/drivers/panfrost/pan_context.c
> @@ -2306,50 +2306,10 @@ panfrost_set_framebuffer_state(struct pipe_context 
> *pctx,
>  {
>  struct panfrost_context *ctx = pan_context(pctx);
>  
> -/* Flush when switching framebuffers, but not if the framebuffer
> - * state is being restored by u_blitter
> - */
> -
> -struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> -bool is_scanout = panfrost_batch_is_scanout(batch);
> -bool has_draws = batch->last_job.gpu;
> -
> -/* Bail out early when the current and new states are the same. */
> -if (util_framebuffer_state_equal(>pipe_framebuffer, fb))
> -return;
> -
> -/* The wallpaper logic sets a new FB state before doing the blit and
> - * restore the old one when it's done. Those FB states are reported 
> to
> - * be different because the surface they are pointing to are 
> different,
> - * but those surfaces actually point to the same cbufs/zbufs. In that
> - * case we definitely don't want new FB descs to be emitted/attached
> - * since the job is expected to be flushed just after the blit is 
> done,
> - * so let's just copy the new state and return here.
> - */
> -if (ctx->wallpaper_batch) {
> -util_copy_framebuffer_state(>pipe_framebuffer, fb);
> -return;
> -}
> -
> -if (!is_scanout || has_draws)
> -panfrost_flush_all_batches(ctx, true);
> -else
> -
> assert(!ctx->payloads[PIPE_SHADER_VERTEX].postfix.framebuffer &&
> -   
> !ctx->payloads[PIPE_SHADER_FRAGMENT].postfix.framebuffer);
> -
> -/* Invalidate the FBO job cache since we've just been assigned a new
> - * FB state.
> - */
> -ctx->batch = NULL;
> -
> +panfrost_hint_afbc(pan_screen(pctx->screen), fb);
>  util_copy_framebuffer_state(>pipe_framebuffer, fb);
> -
> -/* Given that we're rendering, we'd love to have compression */
> -struct panfrost_screen *screen = pan_screen(ctx->base.screen);
> -
> -panfrost_hint_afbc(screen, >pipe_framebuffer);
> -for (unsigned i = 0; i < PIPE_SHADER_TYPES; ++i)
> -ctx->payloads[i].postfix.framebuffer = 0;
> +ctx->batch = NULL;
> +panfrost_invalidate_frame(ctx);
>  }
>  
>  static void *
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v2 32/37] panfrost: Kill the explicit serialization in panfrost_batch_submit()

2019-09-17 Thread Alyssa Rosenzweig
R-b with enthusiasm :)

On Mon, Sep 16, 2019 at 11:37:10AM +0200, Boris Brezillon wrote:
> Now that we have all the pieces in place to support pipelining batches
> we can get rid of the drmSyncobjWait() at the end of
> panfrost_batch_submit().
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_job.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index de2922a8366e..df92b791a1f2 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -846,7 +846,6 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>  panfrost_batch_submit((*dep)->batch);
>  }
>  
> -struct panfrost_context *ctx = batch->ctx;
>  int ret;
>  
>  /* Nothing to do! */
> @@ -865,15 +864,7 @@ panfrost_batch_submit(struct panfrost_batch *batch)
>  
>  out:
>  panfrost_freeze_batch(batch);
> -
> -/* We always stall the pipeline for correct results since pipelined
> - * rendering is quite broken right now (to be fixed by the 
> panfrost_job
> - * refactor, just take the perf hit for correctness)
> - */
> -drmSyncobjWait(pan_screen(ctx->base.screen)->fd,
> -   >out_sync->syncobj, 1, INT64_MAX, 0, NULL);
>  panfrost_free_batch(batch);
> -
>  }
>  
>  void
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v2 28/37] panfrost: Start tracking inter-batch dependencies

2019-09-17 Thread Alyssa Rosenzweig
> > I'm wondering if batch->dependencies should be expressed as a set,
> > rather than a dynarray, such that testing whether a batch has a
> > given dependency is ideally O(1), not O(N).
> > 
> > In practice I don't know if the asymptotic complexity matters, esp. for
> > small numbers of batches, and in practice hash table iteration is slow
> > enough in mesa* that maybe it would be counterproductive.
> > 
> > Just something I thought I might throw out there.
> > 
> > * Set iteration ought to be no slower than array iteration, but constant
> >   factors are a thing.
> 
> I thought the number of deps would be small enough to not justify the
> use of a set, but maybe I'm wrong.

Mm, after all this lands we can profile and revisit; there are bigger
fish to fry.
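
For reference, the set-based variant would be along these lines (a
sketch only; the dep_set field name is invented and the series keeps
the dynarray):

    /* at batch creation */
    batch->dep_set = _mesa_set_create(batch, _mesa_hash_pointer,
                                      _mesa_key_pointer_equal);

    /* when adding a dependency: O(1)-ish membership test */
    if (!_mesa_set_search(batch->dep_set, dep)) {
            panfrost_batch_fence_reference(dep);
            _mesa_set_add(batch->dep_set, dep);
    }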

> > > +static void
> > > +panfrost_bo_access_gc_fences(struct panfrost_context *ctx,
> > > + struct panfrost_bo_access *access,
> > > +  const struct panfrost_bo *bo)  
> > 
> > Could you remind me what gc abbreviates? Sorry.
> 
> Garbage collect.
> 
> > 
> > I'm a little unclear on what the function's purpose is based on the
> > name.
> 
> Collect signaled fences to keep the kernel-side syncobj-map small. The
> idea is to collect those signaled fences at the end of each flush_all
> call. This function is likely to collect only fences from previous
> batch flushes, not the ones that have just been submitted and
> are probably still in flight when we trigger the garbage collection.
> Anyway, we need to do this garbage collection at some point if we don't
> want the BO access map to keep invalid entries around and retain
> syncobjs forever.

This definitely needs to be documented somewhere, then. Maybe paste the
"Collect...forever" paragraph into a comment above access_gc_fences?

> > Question: is it safe to remove entries while iterating the table?
> > (Not a review comment, I don't know the details of mesa's
> > implementation)
> 
> According to the doc, it is.

Good to know, thank you.

> > > -if (ctx->last_out_sync) {
> > > +if (reqs & PANFROST_JD_REQ_FS && batch->first_job.gpu)  
> > 
> > Could we add a `bool is_fragment` with this condition and then make this
> > as well as the if below simple if (is_fragment)?
> 
> Hm, the condition is actually is_fragment_and_has_draws, so I'm not
> sure is_fragment is a good name. Maybe depends_on_vertex_tiler.

is_fragment_shader?
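
i.e. roughly (just a sketch of the naming, nothing more):

    bool is_fragment_shader = (reqs & PANFROST_JD_REQ_FS) &&
                              batch->first_job.gpu;

    if (is_fragment_shader) {
            /* ... make the fragment job wait on the batch's
             * vertex/tiler chain ... */
    }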

> > > +/* Submit the dependencies first. */
> > > +util_dynarray_foreach(>dependencies,
> > > +  struct panfrost_batch_fence *, dep) {
> > > +if ((*dep)->batch)
> > > +panfrost_batch_submit((*dep)->batch);
> > > +}  
> > 
> > I assume this logic will be refined later in the series...?
> 
> It's not actually. Why do you think it should?

Oh, I was just assuming the dependency graph stuff would be more
explicit, I guess.

Re: [Mesa-dev] [PATCH v2 31/37] panfrost: Add a panfrost_flush_batches_accessing_bo() helper

2019-09-17 Thread Alyssa Rosenzweig
R-b

On Mon, Sep 16, 2019 at 11:37:09AM +0200, Boris Brezillon wrote:
> This will allow us to only flush batches touching a specific resource,
> which is particularly useful when the CPU needs to access a BO.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  src/gallium/drivers/panfrost/pan_job.c | 31 ++
>  src/gallium/drivers/panfrost/pan_job.h |  4 
>  2 files changed, 35 insertions(+)
> 
> diff --git a/src/gallium/drivers/panfrost/pan_job.c 
> b/src/gallium/drivers/panfrost/pan_job.c
> index e36f252e01fc..de2922a8366e 100644
> --- a/src/gallium/drivers/panfrost/pan_job.c
> +++ b/src/gallium/drivers/panfrost/pan_job.c
> @@ -921,6 +921,37 @@ panfrost_flush_all_batches(struct panfrost_context *ctx, 
> bool wait)
>  util_dynarray_fini();
>  }
>  
> +void
> +panfrost_flush_batches_accessing_bo(struct panfrost_context *ctx,
> +struct panfrost_bo *bo,
> +uint32_t access_type)
> +{
> +struct panfrost_bo_access *access;
> +struct hash_entry *hentry;
> +
> +/* It doesn't make any to flush only the readers. */
> +assert(access_type == PAN_BO_GPU_ACCESS_WRITE ||
> +   access_type == PAN_BO_GPU_ACCESS_RW);
> +
> +hentry = _mesa_hash_table_search(ctx->accessed_bos, bo);
> +access = hentry ? hentry->data : NULL;
> +if (!access)
> +return;
> +
> +if (access_type & PAN_BO_GPU_ACCESS_WRITE && access->writer &&
> +access->writer->batch)
> +panfrost_batch_submit(access->writer->batch);
> +
> +if (!(access_type & PAN_BO_GPU_ACCESS_READ))
> +return;
> +
> +util_dynarray_foreach(>readers, struct panfrost_batch_fence 
> *,
> +  reader) {
> +if (*reader && (*reader)->batch)
> +panfrost_batch_submit((*reader)->batch);
> +}
> +}
> +
>  void
>  panfrost_batch_set_requirements(struct panfrost_batch *batch)
>  {
> diff --git a/src/gallium/drivers/panfrost/pan_job.h 
> b/src/gallium/drivers/panfrost/pan_job.h
> index 5c9d5e3715d5..d198864ce4f7 100644
> --- a/src/gallium/drivers/panfrost/pan_job.h
> +++ b/src/gallium/drivers/panfrost/pan_job.h
> @@ -185,6 +185,10 @@ panfrost_batch_create_bo(struct panfrost_batch *batch, 
> size_t size,
>  void
>  panfrost_flush_all_batches(struct panfrost_context *ctx, bool wait);
>  
> +void
> +panfrost_flush_batches_accessing_bo(struct panfrost_context *ctx,
> +struct panfrost_bo *bo, uint32_t flags);
> +
>  void
>  panfrost_batch_set_requirements(struct panfrost_batch *batch);
>  
> -- 
> 2.21.0

Re: [Mesa-dev] [PATCH v2 25/37] panfrost: Add a batch fence

2019-09-17 Thread Alyssa Rosenzweig
> > > +/* Start in a signaled state so that even non-submitted batches
> > > + * (those that have no draw/clear) can be waited upon.
> > > + */  
> > 
> > When would this happen? Presumably if a batch does nothing whatsoever,
> > it doesn't make sense to wait on it.
> 
> Was just simpler to have all batch fences contain a syncobj on
> which we can call WAIT_SYNCOBJ even if no real fence is attached to it.
> 
> The other option being to set fence->signaled to true when a batch with
> no draw/clear is submitted, and then skip those entries when we build
> the array passed to the waitsyncobj() func.

I'm wondering if the latter option would be cleaner, rather than having
extra DRM objects floating around?
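
For what it's worth, the "skip signaled entries" option would look
roughly like this when building the wait array (sketch, assuming the
fence->signaled flag described above):

    unsigned nr = util_dynarray_num_elements(&fences,
                                             struct panfrost_batch_fence *);
    uint32_t *handles = calloc(nr, sizeof(*handles));
    unsigned count = 0;

    util_dynarray_foreach(&fences, struct panfrost_batch_fence *, fence) {
            if ((*fence)->signaled)
                    continue;
            handles[count++] = (*fence)->syncobj;
    }

    if (count)
            drmSyncobjWait(screen->fd, handles, count, INT64_MAX, 0, NULL);
    free(handles);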

> > >  #include "pan_resource.h"
> > >  
> > > +/* Batch that created this fence object. Will become NULL at 
> > > batch
> > > + * submission time. This field is mainly here to know whether the
> > > + * batch has been flushed or not.
> > > + */
> > > +struct panfrost_batch *batch;  
> > 
> > Could this be replaced by just a `bool flushed`, or do we actually use
> > the value in a later patch?
> 
> I actually use the value to flush deps when they're not already flushed.

Fair enough.

Re: [Mesa-dev] [PATCH v2 23/37] panfrost: Make panfrost_batch->bos a hash table

2019-09-17 Thread Alyssa Rosenzweig
Ah, perhaps, yes. My bad.

On Tue, Sep 17, 2019 at 12:18:17AM +0200, Boris Brezillon wrote:
> On Mon, 16 Sep 2019 15:26:12 -0400
> Alyssa Rosenzweig  wrote:
> 
> > Well, the hash tables strongly assume you're not using NULLs for things.
> > 
> > See _mesa_hash_table_set_deleted_key for how to change that behaviour.
> 
> Maybe I'm missing something, but AFAICT it's the key field that requires
> special care, not the data one.
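
For the record, a minimal sketch of the distinction (only the key needs
the reserved markers; NULL data is fine):

    struct hash_table *ht =
            _mesa_hash_table_create(NULL, _mesa_hash_pointer,
                                    _mesa_key_pointer_equal);

    /* NULL data is allowed... */
    _mesa_hash_table_insert(ht, bo, NULL);

    /* ...but keys are special: a NULL key marks a free entry, and
     * _mesa_hash_table_set_deleted_key() picks the deleted-entry
     * marker. */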
