gabrywu opened a new pull request, #6695:
URL: https://github.com/apache/kyuubi/pull/6695

   # :mag: Description
   ## Issue References 🔗
   
   This pull request closes #6691
   
   ## Describe Your Solution 🔧
   
   There are many cases in which a SQL statement generates small files, and we
   MUST merge them into bigger ones.
   I created a new Spark SQL command to merge small files. It does not read and
   rewrite all the records of a table; it merges files at the binary level. Take
   a CSV table, for example: it only appends the byte array of one file to
   another, without reading or writing records.
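
To make the binary-level idea concrete, here is a minimal Python sketch of byte-level concatenation for the CSV case. It is an illustration, not the code in this PR: the names `concat_files` and `demo` are hypothetical, it works on the local filesystem rather than on a distributed one, and it assumes headerless CSV files that each end with a newline.

```python
import csv
import os
import shutil
import tempfile


def concat_files(sources, target, chunk_size=1 << 20):
    """Append the raw bytes of each source file to target, in order.

    No record is parsed or re-serialized; bytes are copied in chunks.
    Hypothetical helper, assuming headerless, newline-terminated CSV parts.
    """
    with open(target, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)


def demo():
    """Write two small CSV parts, merge them by bytes, and parse the result."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for i, payload in enumerate([b"1,a\n2,b\n", b"3,c\n"]):
            path = os.path.join(tmp, f"part-{i}.csv")
            with open(path, "wb") as f:
                f.write(payload)
            parts.append(path)

        merged = os.path.join(tmp, "merged.csv")
        concat_files(parts, merged)

        # The merged file is still a valid CSV with all three rows.
        with open(merged, newline="") as f:
            return list(csv.reader(f))


if __name__ == "__main__":
    print(demo())
```

Plain concatenation like this only works for formats whose files stay valid when appended to each other, such as headerless CSV. Formats that keep metadata in a file footer would need format-aware handling.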
   
   Syntax:
   
   ```sparksql
   compact table table_name [INTO ${targetFileSize} ${targetFileSizeUnit}] [cleanup | retain | list]
   -- targetFileSizeUnit can be 'b', 'k', 'm', 'g', 't', or 'p'
   -- cleanup (default): delete the compact staging folders, which contain the original small files
   -- retain: keep the compact staging folders, for testing; the table can be recovered from the staging data
   -- list: only report the planned merge result without actually running it
   ```
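
As a rough illustration of what a target size implies, the Python sketch below converts a size and unit into bytes and groups files into merge candidates, similar in spirit to what the `list` option would report. Everything in it is an assumption rather than this PR's behavior: the units are treated as powers of 1024, the grouping is a simple greedy pass in input order, and the names `target_bytes` and `plan_groups` are hypothetical.

```python
# Assumption: each unit is 1024 times the previous one (b, k, m, g, t, p).
UNIT_FACTORS = {unit: 1024 ** power for power, unit in enumerate("bkmgtp")}


def target_bytes(size, unit):
    """Convert a size and a one-letter unit into a byte count."""
    return size * UNIT_FACTORS[unit.lower()]


def plan_groups(file_sizes, target):
    """Group (name, size) pairs greedily, in input order.

    A new group starts whenever adding the next file would push the
    current group past the target size. A file larger than the target
    ends up in a group of its own.
    """
    groups, current, current_size = [], [], 0
    for name, size in file_sizes:
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    files = [("a", 40), ("b", 50), ("c", 30), ("d", 100), ("e", 10)]
    print(target_bytes(128, "m"))
    print(plan_groups(files, 100))
```

A greedy pass keeps files in their original order, which matters when the merge is a plain byte append. A smarter packing strategy could produce fewer output files at the cost of reordering.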
   
   ```sparksql
   recover compact table table_name
   -- recovers a table if the compact table command fails
   ```
   
   ## Types of changes :bookmark:
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   
   ## Test Plan 🧪
   
   #### Behavior Without This Pull Request :coffin:
   
   
   #### Behavior With This Pull Request :tada:
   
   
   #### Related Unit Tests
   
   
   ---
   
   # Checklist 📝
   <!--- Go over all the following points, and put an `x` in all the boxes that 
apply. -->
   <!--- If you're unsure about any of these, don't hesitate to ask. We're here 
to help! -->
   
   - [ ] This patch was not authored or co-authored using [Generative 
Tooling](https://www.apache.org/legal/generative-tooling.html)
   
   **Be nice. Be informative.**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

