cxzl25 opened a new pull request, #1816:
URL: https://github.com/apache/orc/pull/1816
### What changes were proposed in this pull request?
Add a `-s`/`--summary` option to the `sizes` command that reports the total
number of files, the total file size, and the total number of rows.
### Why are the changes needed?
When the `sizes` command reports the size of each field, it shows only the
field's percentage of the total and its average bytes per row. The overall
totals are not shown, so absolute sizes cannot be derived from the output.
### How was this patch tested?
Tested locally:
```bash
java -jar orc-tools-2.1.0-SNAPSHOT-uber.jar sizes -h
usage: sizes
 -h,--help              Print help message
 -i,--ignoreExtension   Ignore ORC file extension
 -s,--summary           Summarize the number of files, file size, and
                        number of file lines
```
```
java -jar orc-tools-2.1.0-SNAPSHOT-uber.jar sizes -s
```
```
Total Files: 5
Total Sizes: 4803687270
Total Rows: 39820045
Percent Bytes/Row Name
26.41 31.86
```
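As a rough illustration of why the totals are useful, the sketch below combines the summary values from the sample output with a field's percentage to recover an absolute size. The class and method names are hypothetical and are not part of orc-tools; only the numbers come from the output above.

```java
// Hypothetical helper, not part of orc-tools: shows how the summary totals
// can be combined with the per-field columns of the `sizes` output.
public class SizesSummaryExample {

    // Overall average bytes per row across all files (0 if there are no rows).
    static double bytesPerRow(long totalBytes, long totalRows) {
        return totalRows == 0 ? 0.0 : (double) totalBytes / totalRows;
    }

    // Approximate absolute bytes used by one field, given its "Percent" column.
    static double fieldBytes(double percent, long totalBytes) {
        return percent / 100.0 * totalBytes;
    }

    public static void main(String[] args) {
        long totalBytes = 4803687270L; // "Total Sizes" from the sample output
        long totalRows = 39820045L;    // "Total Rows" from the sample output
        double percent = 26.41;        // "Percent" of the first listed field

        System.out.printf("overall bytes/row: %.2f%n", bytesPerRow(totalBytes, totalRows));
        System.out.printf("field bytes (approx): %.0f%n", fieldBytes(percent, totalBytes));
    }
}
```

With the sample totals, the overall average is about 120.6 bytes per row, which is consistent with the listed field: 31.86 bytes per row is about 26.41% of that figure.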
### Was this patch authored or co-authored using generative AI tooling?
No