This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 3d382d872 ORC-1922: Support `Native Image` build for `orc-tools`
3d382d872 is described below
commit 3d382d872d21ad542a4574329ed0a10052400250
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Jun 16 17:06:23 2025 -0700
ORC-1922: Support `Native Image` build for `orc-tools`
### What changes were proposed in this pull request?
This PR aims to support a `Native Image` build for `orc-tools`.
### Why are the changes needed?
To reduce the size of both the distributed binary (compared with the uber `jar`)
and the Docker images, which additionally no longer need a Java installation.
**Uber jar (67MB)**
```
$ ls -alh tools/target/orc-tools-2.2.0-SNAPSHOT-uber.jar
-rw-r--r-- 1 dongjoon staff 67M Jun 16 13:22 tools/target/orc-tools-2.2.0-SNAPSHOT-uber.jar
```
**Native executable (56MB)**
```
$ ls -alh tools/target/orc-tool
-rwxr-xr-x 1 dongjoon staff 56M Jun 16 13:07 tools/target/orc-tool
```
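As a rough check of the two sizes listed above, the relative reduction can be computed directly from the `ls` figures. This is only a sketch: the helper name `reduction_pct` is hypothetical and is not part of the ORC build.

```shell
# Hypothetical helper: percentage reduction from an old size to a new size,
# truncated to a whole percent (integer arithmetic only).
reduction_pct() {
  old="$1"
  new="$2"
  echo $(( (old - new) * 100 / old ))
}

# Sizes in MB taken from the `ls -alh` output above (67M uber jar, 56M executable).
echo "native executable is about $(reduction_pct 67 56)% smaller than the uber jar"
```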
### How was this patch tested?
Manually.
**GraalVM Community Edition**
```
$ java -version
openjdk version "21.0.2" 2024-01-16
OpenJDK Runtime Environment GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30)
OpenJDK 64-Bit Server VM GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30, mixed mode, sharing)
```
**Build `orc-tools` Native Image**
```
$ cd java
$ mvn clean package --pl tools --am -DskipTests -Pnative
...
========================================================================================================================
GraalVM Native Image: Generating 'orc-tool' (executable)...
========================================================================================================================
[1/8] Initializing... (6.0s 0.29GB)
Java version: 21.0.2+13, vendor version: GraalVM CE 21.0.2+13.1
Graal compiler: optimization level: 2, target machine: armv8-a
C compiler: cc (apple, arm64, 17.0.0)
Garbage collector: Serial GC (max heap size: 80% of RAM)
1 user-specific feature(s):
- com.oracle.svm.thirdparty.gson.GsonFeature
------------------------------------------------------------------------------------------------------------------------
1 experimental option(s) unlocked:
- '-H:Name' (alternative API option(s): -o orc-tool; origin(s): command line)
------------------------------------------------------------------------------------------------------------------------
Build resources:
- 7.38GB of memory (11.5% of 64.00GB system memory, determined at start)
- 16 thread(s) (100.0% of 16 available processor(s), determined at start)
[2/8] Performing analysis... [*****] (10.6s 1.80GB)
12,478 reachable types (80.3% of 15,548 total)
21,828 reachable fields (61.0% of 35,767 total)
73,822 reachable methods (50.8% of 145,316 total)
4,544 types, 102 fields, and 3,570 methods registered for reflection
62 types, 66 fields, and 55 methods registered for JNI access
5 native libraries: -framework CoreServices, -framework Foundation, dl, pthread, z
[3/8] Building universe... (2.3s 2.03GB)
[4/8] Parsing methods... [*] (0.9s 2.03GB)
[5/8] Inlining methods... [***] (0.6s 2.01GB)
[6/8] Compiling methods... [***] (8.0s 1.66GB)
[7/8] Layouting methods... [**] (3.8s 2.46GB)
[8/8] Creating image... [***] (5.2s 1.59GB)
28.05MB (49.64%) for code area: 42,955 compilation units
27.64MB (48.92%) for image heap: 299,518 objects and 65 resources
836.21kB ( 1.45%) for other data
56.51MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                Top 10 object types in image heap:
 10.94MB java.base                            8.97MB byte[] for code metadata
  4.97MB hadoop-client-api-3.4.1.jar          4.45MB byte[] for java.lang.String
  3.47MB hadoop-client-runtime-3.4.1.jar      2.98MB java.lang.Class
  1.35MB protobuf-java-3.25.5.jar             2.92MB java.lang.String
  1.24MB orc-core-2.2.0-SNAPSHOT.jar          1.05MB com.oracle.svm.core.hub.DynamicHubCompanion
  1.14MB svm.jar (Native Image)             791.68kB byte[] for reflection metadata
745.18kB orc-format-1.1.0.jar               742.68kB byte[] for general heap data
548.44kB java.security.jgss                 595.41kB java.lang.String[]
429.56kB java.management                    457.89kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
332.99kB java.rmi                           448.46kB heap alignment
  2.67MB for 48 more packages                 4.31MB for 2548 more object types
------------------------------------------------------------------------------------------------------------------------
Recommendations:
INIT: Adopt '--strict-image-heap' to prepare for the next GraalVM release.
HEAP: Set max heap for improved and more predictable memory usage.
CPU: Enable more CPU features with '-march=native' for improved performance.
------------------------------------------------------------------------------------------------------------------------
3.4s (8.8% of total time) in 91 GCs | Peak RSS: 4.78GB | CPU load: 8.16
------------------------------------------------------------------------------------------------------------------------
Produced artifacts:
/Users/dongjoon/APACHE/orc-merge/java/tools/target/orc-tool (executable)
========================================================================================================================
Finished generating 'orc-tool' in 37.9s.
[INFO]
[INFO] --- jar:3.4.2:test-jar (default) orc-tools ---
[INFO] Building jar: /Users/dongjoon/APACHE/orc-merge/java/tools/target/orc-tools-2.2.0-SNAPSHOT-tests.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache ORC 2.2.0-SNAPSHOT:
[INFO]
[INFO] Apache ORC ......................................... SUCCESS [  1.132 s]
[INFO] ORC Shims .......................................... SUCCESS [  1.584 s]
[INFO] ORC Core ........................................... SUCCESS [  5.747 s]
[INFO] ORC Tools .......................................... SUCCESS [ 50.419 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 58.971 s
[INFO] Finished at: 2025-06-16T13:23:26-07:00
[INFO] ------------------------------------------------------------------------
```
**Execute**
```
$ tools/target/orc-tool --help
ORC Java Tools
usage: java -jar orc-tools-*.jar [--help] [--define X=Y] <command> <args>
Commands:
check - check the index of the specified column
convert - convert CSV/JSON/ORC files to ORC
count - recursively find *.orc and print the number of rows
data - print the data from the ORC file
json-schema - scan JSON files to determine their schema
key - print information about the keys
merge - merge multiple ORC files into a single ORC file
meta - print the metadata about the ORC file
scan - scan the ORC file
sizes - list size on disk of each column
version - print the version of this ORC tool
To get more help, provide -h to the command
```
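The manual check above could also be scripted as a small smoke test. The following is a minimal sketch under stated assumptions: the function name `smoke_test` and the `ORC_TOOL` override variable are hypothetical, and the default path is the one reported by the build output. Because the sketch does not assume which stream the banner is written to or what exit code `--help` returns, it merges stderr into stdout and relies only on the banner text.

```shell
# Hypothetical smoke test: succeeds only if the given file is executable and
# its --help output contains the "ORC Java Tools" banner shown above.
smoke_test() {
  bin="$1"
  if [ ! -x "$bin" ]; then
    echo "not an executable file: $bin" >&2
    return 2
  fi
  # stderr is merged into stdout; the function's status is grep's,
  # not the tool's own exit code.
  "$bin" --help 2>&1 | grep -q "ORC Java Tools"
}

# Path assumed from the build output above; override with ORC_TOOL if needed.
if smoke_test "${ORC_TOOL:-tools/target/orc-tool}"; then
  echo "orc-tool smoke test passed"
else
  echo "orc-tool smoke test failed or binary missing"
fi
```

Run from the `java` directory after the `-Pnative` build, this prints a single pass or fail line and does not require a JVM on the `PATH`.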
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #2271 from dongjoon-hyun/ORC-1922.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
java/tools/pom.xml | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/java/tools/pom.xml b/java/tools/pom.xml
index c8fee63de..0e6b73413 100644
--- a/java/tools/pom.xml
+++ b/java/tools/pom.xml
@@ -199,5 +199,31 @@
         <directory>${build.dir}/tools</directory>
       </build>
     </profile>
+    <profile>
+      <id>native</id>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>org.graalvm.nativeimage</groupId>
+            <artifactId>native-image-maven-plugin</artifactId>
+            <version>20.3.17</version>
+            <executions>
+              <execution>
+                <goals>
+                  <goal>native-image</goal>
+                </goals>
+                <configuration>
+                  <mainClass>org.apache.orc.tools.Driver</mainClass>
+                  <imageName>orc-tool</imageName>
+                  <buildArgs>
+                    <buildArg>--no-fallback</buildArg>
+                  </buildArgs>
+                </configuration>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+      </build>
+    </profile>
   </profiles>
 </project>