Thanks James for delving into this mess.

It looks like this change is unavoidable if we want to modularize? I think this 
is OK. Will the CLI argument change as we continue modularizing, or is this the 
only change that will be needed?

On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> Hello,
>
> I did some work to separate the below PR into smaller PRs.
>
>
>   *   Updating the versions of dependencies and maven plugins is done 
> and merged into master.
>   *   I separated out the work modularizing arrow-vector, 
> arrow-memory-core/unsafe, and arrow-memory-netty.
>
> Modularizing arrow-memory-core requires a smaller change to user 
> command-line arguments. Instead of:
> --add-opens=java.base/java.nio=ALL-UNNAMED
>
> The user needs to add:
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
>
> I initially tried to modularize arrow-vector separately from 
> arrow-memory-core but found that any meaningful operation in 
> arrow-vector would trigger an illegal access in memory-core if it 
> wasn’t modularized.
>
> I was able to run the tests for arrow-compression and arrow-tools 
> successfully after modularizing memory-core, memory-unsafe-, and 
> arrow-vector. Note that I had more success by making memory-core and 
> memory-unsafe automatic modules.
>
> I think we should make a decision here on if we want to bite the bullet 
> and introduce a breaking user-facing change around command-line 
> options. The other option is to wait for JDK 21 to modularize. That’s 
> farther down the line and requires refactoring much of the memory 
> module code and implementing a module using the foreign memory 
> interface.
>
> From: James Duong <james.du...@improving.com.INVALID>
> Date: Tuesday, November 28, 2023 at 6:48 PM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> Hi,
>
> I’ve made some major progress on this work in this PR:
> https://github.com/apache/arrow/pull/38876
>
>
>   *   The maven plugin for compiling module-info.java files using JDK 8 
> is working correctly.
>   *   arrow-format, arrow-memory-core, arrow-memory-netty, 
> arrow-memory-unsafe, and arrow-vector have been modularized 
> successfully.
>      *   Tests pass locally for all of these modules.
>      *   They fail in CI. This is likely from me not updating a profile 
> somewhere.
>
> Similar to David’s PR from below, arrow-memory and modules needed to be 
> refactored fairly significantly and split into two modules: a 
> public-facing JPMS module and a separate module which adds to Netty’s 
> packages (memory-netty-buffer-patch). What’s more problematic is that 
> because we are using named modules now, users need to add more 
> arguments to their Java command line to use arrow. If one were to use 
> arrow-memory-netty they would need to add the following:
>
> --add-opens java.base/jdk.internal.misc=io.netty.common 
> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>  
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
>
> Depending on where the memory-netty-buffer-patch JAR is located, and 
> what version, the command the user needs to supply changes, so this 
> seems like it’d be really inconvenient.
>
> Do we want to proceed with modularizing existing memory modules? Both 
> netty and unsafe? Or wait until the new memory module from Java 21 is 
> available?
>
> The module-info.java files are written fairly naively. I haven’t 
> inspected thoroughly to determine what packages users will need.
>
> We can continue modularizing more components in a separate PR. Ideally 
> all the user breakage (class movement, new command-line argument 
> requirements) happens within one major Arrow version.
>
> From: James Duong <james.du...@improving.com.INVALID>
> Date: Tuesday, November 21, 2023 at 1:16 PM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> I’m following up on this topic.
>
> David has a PR from last year that’s done much of the heavy lifting for 
> refactoring the codebase to be package-friendly.
> https://github.com/apache/arrow/pull/13072
>
> What’s changed since and what’s left:
>
>   *   New components have been added (Flight SQL for example) that will 
> need to be updated for modules.
>   *   There wasn’t a clear solution on how to do this without breaking 
> JDK 8 support. Compiling module-info.java files require using JDK9, but 
> using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
>      *   There is a Gradle plugin that can compile module-info.java 
> files purely syntactically that we can adapt to maven. It has 
> limitations (the one I see is that it can’t iterate through 
> classloaders to handle annotations), but using this might be a good 
> stopgap until we JDK 8 support is deprecated.
>   *   Some plugins need to be updated:
>      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class 
> files.
>      *   checkstyle 3.1.0 can’t parse module-info.java files. Our 
> existing checkstyle rules file can’t be loaded with newer versions. We 
> can exclude module-info.java for now and have a separate Issue for 
> updating checkstyle itself and the rules file.
>   *   grpc-java could not be modularized when the PR above was written.
>      *   Grpc 1.57 now can be modularized 
> (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
>
> From: David Dali Susanibar Arce <davi.sar...@gmail.com>
> Date: Wednesday, May 25, 2022 at 5:02 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
> Hi All,
>
> This email's purpose is a request for comments to migrate Arrow Java to JPMS
> Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
> JSE 9+ (1).
>
> Current status:
>
> - Arrow Java use JSE1.8 specification
>
> - Arrow Java works with JSE1.8/9/11/17
>
> - This is possible because Java offers “legacy mode”
>
> Proposal:
>
> Migrate to JPMS Java Platform Module System. This Draft PR
> <https://github.com/apache/arrow/pull/13072>(2<https://github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)
>  
> contains an initial port of
> the modules: Format / Memory Core / Memory Netty / Memory Unsafe / 
> Vector
> for evaluation.
>
> Main Reason to migrate:
>
> - JPMS offer Strong encapsulation, Well-defined interfaces
> <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
> <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
>
> - JPMS offer reliable configuration and security to hide platform internals.
>
> - JPMS offers a partial solution to solve problems about read (80%) /write
> (20%) code.
>
> - JPMS offer optimization for readability about read/write ratio (90/10)
> thru module-info.java.
>
> - Consistency logs, JPMS implement consistency logs to really use that to
> solve the current problem.
>
> - Be able to customize JRE needed with only modules needed (not
> java.desktop for example and others) thru JLink.
>
> - Modules have also been implemented by other languages such as Javascript
> (ES2015), C++(C++20), Net (Nuget/NetCore)..
>
> - Consider taking a look at this discussion about pros/cons
> <https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
> (5).
>
> - Eventual migration to JPMS is a practical necessity as more projects
> migrate.
>
> Effort:
>
> - First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
> offer support for both jar components JSE1.8 and JSE9+ included.
>
> - Go bottom up for JPMS.
>
> - Packages need to be unique (i.e. org.apache.arrow.memory /
> io.netty.buffer). Review Draft PR with initial proposal.
>
> - Dependencies also need to be modularized. If some of our current
> dependencies are not able to be used as a module this will be a blocker for
> our modules (we could patch that but this is an extra effort).
>
> Killers:
>
> - FIXME! I need your support to identify killer reasons to be able to push
> this implementation.
>
> Please let us know if Arrow Java to JPMS Java Platform Module System is
> needed and should be implemented.
>
> Please use this file for any comments
> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
>
> Resources used:
>
> (1): https://openjdk.java.net/projects/jigsaw/spec/
>
> (2): https://github.com/apache/arrow/pull/13072
>
> (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
>
> (4): https://github.com/nipafx/demo-jigsaw-reflection
>
> (5):
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
>
> Best regards,
>
> --
> David

Reply via email to