This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/main by this push:
     new 3d382d872 ORC-1922: Support `Native Image` build for `orc-tools`
3d382d872 is described below

commit 3d382d872d21ad542a4574329ed0a10052400250
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Jun 16 17:06:23 2025 -0700

    ORC-1922: Support `Native Image` build for `orc-tools`
    
    ### What changes were proposed in this pull request?
    
    This PR aims to support a `Native Image` build for `orc-tools`.
    
    ### Why are the changes needed?
    
    To reduce the size of both the binary (compared with the uber `jar`) and Docker images, from which the Java installation can additionally be removed.
    
    **Uber jar (67MB)**
    ```
    $ ls -alh tools/target/orc-tools-2.2.0-SNAPSHOT-uber.jar
    -rw-r--r--  1 dongjoon  staff    67M Jun 16 13:22 tools/target/orc-tools-2.2.0-SNAPSHOT-uber.jar
    ```
    
    **Native executable (56MB)**
    ```
    $ ls -alh tools/target/orc-tool
    -rwxr-xr-x 1 dongjoon  staff    56M Jun 16 13:07 tools/target/orc-tool
    ```
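    
    A rough way to quantify the difference, as a sketch rather than part of this patch: the helper name `reduction_pct` is hypothetical, and the inputs are the rounded sizes from the two listings above.
    
    ```shell
    # Hypothetical helper: percentage size reduction going from $1 to $2,
    # truncated to an integer by shell arithmetic.
    reduction_pct() {
      echo $(( ($1 - $2) * 100 / $1 ))
    }
    
    # Rounded sizes in MB taken from the `ls -alh` output above.
    reduction_pct 67 56
    ```
    
    With these rounded figures, the native executable is roughly 16% smaller than the uber jar.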
    
    ### How was this patch tested?
    
    Manually.
    
    **GraalVM Community Edition**
    ```
    $ java -version
    openjdk version "21.0.2" 2024-01-16
    OpenJDK Runtime Environment GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30)
    OpenJDK 64-Bit Server VM GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30, mixed mode, sharing)
    ```
    
    **Build `orc-tools` Native Image**
    ```
    $ cd java
    $ mvn clean package --pl tools --am -DskipTests -Pnative
    ...
    ========================================================================================================================
    GraalVM Native Image: Generating 'orc-tool' (executable)...
    ========================================================================================================================
    [1/8] Initializing...                                                                                    (6.0s  0.29GB)
     Java version: 21.0.2+13, vendor version: GraalVM CE 21.0.2+13.1
     Graal compiler: optimization level: 2, target machine: armv8-a
     C compiler: cc (apple, arm64, 17.0.0)
     Garbage collector: Serial GC (max heap size: 80% of RAM)
     1 user-specific feature(s):
     - com.oracle.svm.thirdparty.gson.GsonFeature
    ------------------------------------------------------------------------------------------------------------------------
     1 experimental option(s) unlocked:
     - '-H:Name' (alternative API option(s): -o orc-tool; origin(s): command line)
    ------------------------------------------------------------------------------------------------------------------------
    Build resources:
     - 7.38GB of memory (11.5% of 64.00GB system memory, determined at start)
     - 16 thread(s) (100.0% of 16 available processor(s), determined at start)
    [2/8] Performing analysis...  [*****]                                                                   (10.6s  1.80GB)
       12,478 reachable types   (80.3% of   15,548 total)
       21,828 reachable fields  (61.0% of   35,767 total)
       73,822 reachable methods (50.8% of  145,316 total)
        4,544 types,   102 fields, and 3,570 methods registered for reflection
           62 types,    66 fields, and    55 methods registered for JNI access
            5 native libraries: -framework CoreServices, -framework Foundation, dl, pthread, z
    [3/8] Building universe...                                                                               (2.3s  2.03GB)
    [4/8] Parsing methods...      [*]                                                                        (0.9s  2.03GB)
    [5/8] Inlining methods...     [***]                                                                      (0.6s  2.01GB)
    [6/8] Compiling methods...    [***]                                                                      (8.0s  1.66GB)
    [7/8] Layouting methods...    [**]                                                                       (3.8s  2.46GB)
    [8/8] Creating image...       [***]                                                                      (5.2s  1.59GB)
      28.05MB (49.64%) for code area:    42,955 compilation units
      27.64MB (48.92%) for image heap:  299,518 objects and 65 resources
     836.21kB ( 1.45%) for other data
      56.51MB in total
    ------------------------------------------------------------------------------------------------------------------------
    Top 10 origins of code area:                                Top 10 object types in image heap:
      10.94MB java.base                                            8.97MB byte[] for code metadata
       4.97MB hadoop-client-api-3.4.1.jar                          4.45MB byte[] for java.lang.String
       3.47MB hadoop-client-runtime-3.4.1.jar                      2.98MB java.lang.Class
       1.35MB protobuf-java-3.25.5.jar                             2.92MB java.lang.String
       1.24MB orc-core-2.2.0-SNAPSHOT.jar                          1.05MB com.oracle.svm.core.hub.DynamicHubCompanion
       1.14MB svm.jar (Native Image)                             791.68kB byte[] for reflection metadata
     745.18kB orc-format-1.1.0.jar                               742.68kB byte[] for general heap data
     548.44kB java.security.jgss                                 595.41kB java.lang.String[]
     429.56kB java.management                                    457.89kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
     332.99kB java.rmi                                           448.46kB heap alignment
       2.67MB for 48 more packages                                 4.31MB for 2548 more object types
    ------------------------------------------------------------------------------------------------------------------------
    Recommendations:
     INIT: Adopt '--strict-image-heap' to prepare for the next GraalVM release.
     HEAP: Set max heap for improved and more predictable memory usage.
     CPU:  Enable more CPU features with '-march=native' for improved performance.
    ------------------------------------------------------------------------------------------------------------------------
                            3.4s (8.8% of total time) in 91 GCs | Peak RSS: 4.78GB | CPU load: 8.16
    ------------------------------------------------------------------------------------------------------------------------
    Produced artifacts:
     /Users/dongjoon/APACHE/orc-merge/java/tools/target/orc-tool (executable)
    ========================================================================================================================
    Finished generating 'orc-tool' in 37.9s.
    [INFO]
    [INFO] --- jar:3.4.2:test-jar (default)  orc-tools ---
    [INFO] Building jar: /Users/dongjoon/APACHE/orc-merge/java/tools/target/orc-tools-2.2.0-SNAPSHOT-tests.jar
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary for Apache ORC 2.2.0-SNAPSHOT:
    [INFO]
    [INFO] Apache ORC ......................................... SUCCESS [  1.132 s]
    [INFO] ORC Shims .......................................... SUCCESS [  1.584 s]
    [INFO] ORC Core ........................................... SUCCESS [  5.747 s]
    [INFO] ORC Tools .......................................... SUCCESS [ 50.419 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  58.971 s
    [INFO] Finished at: 2025-06-16T13:23:26-07:00
    [INFO] ------------------------------------------------------------------------
    ```
    
    **Execute**
    ```
    $ tools/target/orc-tool --help
    ORC Java Tools
    
    usage: java -jar orc-tools-*.jar [--help] [--define X=Y] <command> <args>
    
    Commands:
       check - check the index of the specified column
       convert - convert CSV/JSON/ORC files to ORC
       count - recursively find *.orc and print the number of rows
       data - print the data from the ORC file
       json-schema - scan JSON files to determine their schema
       key - print information about the keys
       merge - merge multiple ORC files into a single ORC file
       meta - print the metadata about the ORC file
       scan - scan the ORC file
       sizes - list size on disk of each column
       version - print the version of this ORC tool
    
    To get more help, provide -h to the command
    ```
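    
    Because the native executable is built from the same `org.apache.orc.tools.Driver` entry point as the jar, a wrapper script could prefer the executable when it exists and otherwise fall back to the JVM. The following is only a sketch under that assumption: the function name `orc_tool_cmd` and the fallback logic are hypothetical and not part of this patch, while the file names are the ones shown in the listings above.
    
    ```shell
    # Hypothetical helper: print the command line to use for orc-tools.
    # $1 is the directory holding the build output (e.g. tools/target).
    orc_tool_cmd() {
      target_dir="$1"
      if [ -x "$target_dir/orc-tool" ]; then
        # Native executable is present: no JVM is needed.
        echo "$target_dir/orc-tool"
      else
        # Fall back to the uber jar, which requires a Java installation.
        echo "java -jar $target_dir/orc-tools-2.2.0-SNAPSHOT-uber.jar"
      fi
    }
    
    # Print the command that would be used for the default build directory.
    orc_tool_cmd tools/target
    ```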
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #2271 from dongjoon-hyun/ORC-1922.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 java/tools/pom.xml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/java/tools/pom.xml b/java/tools/pom.xml
index c8fee63de..0e6b73413 100644
--- a/java/tools/pom.xml
+++ b/java/tools/pom.xml
@@ -199,5 +199,31 @@
         <directory>${build.dir}/tools</directory>
       </build>
     </profile>
+    <profile>
+      <id>native</id>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>org.graalvm.nativeimage</groupId>
+            <artifactId>native-image-maven-plugin</artifactId>
+            <version>20.3.17</version>
+            <executions>
+              <execution>
+                <goals>
+                  <goal>native-image</goal>
+                </goals>
+                <configuration>
+                  <mainClass>org.apache.orc.tools.Driver</mainClass>
+                  <imageName>orc-tool</imageName>
+                  <buildArgs>
+                    <buildArg>--no-fallback</buildArg>
+                  </buildArgs>
+                </configuration>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+      </build>
+    </profile>
   </profiles>
 </project>
