adriangb commented on code in PR #21633:
URL: https://github.com/apache/datafusion/pull/21633#discussion_r3095655683
##########
datafusion/physical-plan/src/spill/in_progress_spill_file.rs:
##########
@@ -51,16 +54,25 @@ impl InProgressSpillFile {
/// Appends a `RecordBatch` to the spill file, initializing the writer if necessary.
///
+ /// Before writing, performs GC on StringView/BinaryView arrays to compact backing
Review Comment:
Yes, I think there's a larger discussion to be had around the footgun of view
arrays sharing data, and how slicing in general can leave arrays holding
fragmented data and wasted memory. I don't know what the big-picture story is,
but personally I feel it would be reasonable to have some heuristic like "if
more than 50% of an array's backing data is dead (no longer referenced by any
view), gc it".
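
To make the heuristic concrete, here is a minimal std-only sketch. The `View` and `ViewArray` types, `dead_fraction`, and `gc_if_mostly_dead` are all hypothetical stand-ins, not arrow-rs or DataFusion APIs, and the model ignores the fact that real view arrays inline short values. In arrow-rs the compaction step itself would presumably be the existing `gc()` method on `StringViewArray`/`BinaryViewArray`; only the "when to call it" part is illustrated here.

```rust
/// Hypothetical, simplified view: points at `len` bytes in one shared buffer.
#[derive(Clone, Debug, PartialEq)]
pub struct View {
    pub buffer: usize,
    pub offset: usize,
    pub len: usize,
}

/// Hypothetical stand-in for a StringView/BinaryView array.
#[derive(Clone, Debug)]
pub struct ViewArray {
    pub buffers: Vec<Vec<u8>>,
    pub views: Vec<View>,
}

impl ViewArray {
    /// Sum of bytes referenced by the views. Overlapping views are counted
    /// once per view, so this is an upper bound on the live bytes.
    pub fn referenced_bytes(&self) -> usize {
        self.views.iter().map(|v| v.len).sum()
    }

    /// Total size of all backing buffers, referenced or not.
    pub fn buffer_bytes(&self) -> usize {
        self.buffers.iter().map(|b| b.len()).sum()
    }

    /// Approximate fraction of backing bytes that no view refers to.
    pub fn dead_fraction(&self) -> f64 {
        let total = self.buffer_bytes();
        if total == 0 {
            return 0.0;
        }
        let live = self.referenced_bytes().min(total);
        (total - live) as f64 / total as f64
    }

    /// Bytes of the i-th value.
    pub fn value(&self, i: usize) -> &[u8] {
        let v = &self.views[i];
        &self.buffers[v.buffer][v.offset..v.offset + v.len]
    }

    /// Copy only the referenced bytes into one fresh, compact buffer.
    pub fn gc(&self) -> ViewArray {
        let mut buf = Vec::with_capacity(self.referenced_bytes());
        let mut views = Vec::with_capacity(self.views.len());
        for i in 0..self.views.len() {
            let bytes = self.value(i);
            views.push(View { buffer: 0, offset: buf.len(), len: bytes.len() });
            buf.extend_from_slice(bytes);
        }
        ViewArray { buffers: vec![buf], views }
    }

    /// The heuristic from the comment: compact only when the dead fraction
    /// exceeds `threshold` (e.g. 0.5); otherwise return the array untouched.
    pub fn gc_if_mostly_dead(self, threshold: f64) -> ViewArray {
        if self.dead_fraction() > threshold {
            self.gc()
        } else {
            self
        }
    }
}
```

With a threshold of 0.5, an array whose single view covers 5 bytes of a 34-byte buffer would be compacted, while an array whose views cover the whole buffer would be returned as-is, so the copy cost is only paid when at least half of the retained memory is reclaimable.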
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]