gabrywu opened a new pull request, #6695: URL: https://github.com/apache/kyuubi/pull/6695
# :mag: Description

## Issue References 🔗

This pull request closes #6691

## Describe Your Solution 🔧

In many cases a SQL statement generates small files, and we MUST merge them into bigger ones. I created a new Spark SQL command that merges small files. It does not read and rewrite every record of a table; it merges files at the binary level. Take a CSV table for example: the command only appends the byte array of one file to another, without reading or writing records.

Syntax:

```sparksql
compact table table_name [INTO ${targetFileSize} ${targetFileSizeUnit} ] [ cleanup | retain | list ]
-- targetFileSizeUnit can be 'b','k','m','g','t','p'
-- cleanup (default behavior): remove the compact staging folders, which contain the original small files
-- retain: keep the compact staging folders, for testing; the table can be recovered from the staging data
-- list: only report the merge result, without actually running the merge
```

```sparksql
recover compact table table_name
-- recover a table if the compact table command fails
```

## Types of changes :bookmark:

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request :coffin:

#### Behavior With This Pull Request :tada:

#### Related Unit Tests

---

# Checklist 📝

<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
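To make the binary-level merge described above concrete, here is a minimal Python sketch. It is not the code in this pull request. The helper names (`target_bytes`, `plan_groups`, `concat_binary`), the greedy grouping strategy, and the reading of the unit letters as powers of 1024 are all assumptions made for illustration. It also assumes headerless CSV files that each end with a newline, so that appending raw bytes yields a valid CSV file.

```python
import shutil

# Assumption: 'b','k','m','g','t','p' are powers of 1024.
# The PR description only lists the unit letters, not their values.
UNIT_BYTES = {unit: 1024 ** i for i, unit in enumerate("bkmgtp")}


def target_bytes(size, unit):
    """Convert a (targetFileSize, targetFileSizeUnit) pair to bytes."""
    return size * UNIT_BYTES[unit.lower()]


def plan_groups(files, target):
    """Greedily pack (path, size) pairs into groups of at most `target` bytes.

    Hypothetical planning step: files are visited smallest first, and a new
    group starts whenever adding the next file would exceed the target.
    A file larger than the target ends up alone in its own group.
    """
    groups, current, current_size = [], [], 0
    for path, size in sorted(files, key=lambda f: f[1]):
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


def concat_binary(sources, dest):
    """Append the raw bytes of each source file to dest, without parsing records."""
    with open(dest, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)
```

In this sketch, calling only `plan_groups` is roughly analogous to the `list` option, since it reports what would be merged without touching any file. Byte-level appending is only safe for formats whose files can be concatenated as-is; formats with per-file headers or footers would need format-specific handling.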
To unsubscribe, e-mail: notifications-unsubscr...@kyuubi.apache.org
For additional commands, e-mail: notifications-h...@kyuubi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org