avantgardnerio commented on issue #22723:
URL: https://github.com/apache/datafusion/issues/22723#issuecomment-4621928547
Extreme Programming breaks everything down into:
1. values: "we have limited resources, so we should manage them wisely"
2. principles: "so we use X test suite to find untracked memory"
3. practices: "so we can fix OOM bug #22739"
IME, people almost always agree on 1 & 3. The principles are usually what
takes time to come to agreement on, so it's good and natural that we're
discussing the best way to do this.
So, to avoid getting too abstract, I'll raise a specific, grounded question
about what we would do without SLTs:
Suppose a customer runs a query like:
```
source logs(<team>)
| filter <priority predicate>
&& $d.message != null
| groupby $d.message agg count(1) as $d.occurrences
| orderby $d.occurrences desc
| limit 15
```
With memory-based SLTs in place, we can translate that into a minimal reproduction:
```
statement ok
CREATE TABLE utf8_keys AS
SELECT cast(v AS varchar) || repeat('x', 200) AS k,
       make_array(1) AS _force_rows
FROM generate_series(1, 50000) AS t(v)

statement ok
SET datafusion.runtime.memory_limit = '1M'

query I nosort
SELECT count(*) FROM (
  SELECT k, _force_rows, count(*) AS c
  FROM utf8_keys
  GROUP BY k, _force_rows
)
----
50000
```
Running it then fails with an error like:
```
1. query failed: Other Error: allocator overdraft: account balance at panic = -1384887 bytes
[SQL] SELECT count(*) FROM (
SELECT k, _force_rows, count(*) AS c
FROM utf8_keys
GROUP BY k, _force_rows
)
at /__w/datafusion/datafusion/datafusion/sqllogictest/test_files/group_by_spill_row_decode.slt:49
```
Then we open [a PR to fix the error](https://github.com/apache/datafusion/pull/22741),
and the test stays in the regression suite forever.
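The comment doesn't spell out how the "allocator overdraft" check works. One plausible reading of the error message, offered here purely as an assumption, is a test-only ledger: memory-pool reservations credit an account, real heap allocations debit it, and a negative balance means some code allocated memory it never reserved (i.e. untracked memory). The Rust sketch below models only that bookkeeping; `Account`, `reserve`, `allocate`, `free` and `Overdraft` are hypothetical names, not DataFusion's actual test harness.

```rust
// Hypothetical ledger illustrating an "allocator overdraft" check.
// Assumption: this is NOT DataFusion's harness; all names are illustrative.
// Memory-pool reservations credit the account and real allocations debit it,
// so a negative balance means memory was allocated without being reserved.

/// Returned when an allocation drives the balance below zero.
#[derive(Debug, PartialEq, Eq)]
pub struct Overdraft {
    /// Balance in bytes at the moment the overdraft was detected.
    pub balance: i64,
}

/// Running balance of reserved-but-not-yet-allocated bytes.
#[derive(Debug, Default)]
pub struct Account {
    balance: i64,
}

impl Account {
    /// The memory pool granted a reservation of `bytes`.
    pub fn reserve(&mut self, bytes: i64) {
        self.balance += bytes;
    }

    /// The allocator handed out `bytes`; errors if this overdraws the account.
    /// The debit is recorded either way, so the balance reflects reality.
    pub fn allocate(&mut self, bytes: i64) -> Result<(), Overdraft> {
        self.balance -= bytes;
        if self.balance < 0 {
            Err(Overdraft { balance: self.balance })
        } else {
            Ok(())
        }
    }

    /// The allocator got `bytes` back.
    pub fn free(&mut self, bytes: i64) {
        self.balance += bytes;
    }

    /// Current balance in bytes.
    pub fn balance(&self) -> i64 {
        self.balance
    }
}
```

For example, reserving 1,000 bytes and then allocating 600 and 500 bytes leaves a balance of -100 on the second allocation, the same kind of negative "account balance at panic" the SLT run reports. In a real harness the debits would presumably come from a wrapping global allocator; the sketch keeps the ledger on its own so the accounting rule is easy to see.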
So my question is: if we adopt another approach, such as
[SqlFuzz](https://github.com/andygrove/sqlfuzz), what would the workflow for
doing the same thing look like?