I believe Spark 4.0 was mentioned before. It’ll require Java 17 and will be
released in a few months (June?).

Best regards,
Adam Lippai

On Tue, Dec 5, 2023 at 12:05 David Li <lidav...@apache.org> wrote:

> Thanks James for delving into this mess.
>
> It looks like this change is unavoidable if we want to modularize? I think
> this is OK. Will the CLI argument change as we continue modularizing, or is
> this the only change that will be needed?
>
> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> > Hello,
> >
> > I did some work to split the PR below into smaller PRs.
> >
> >
> >   *   Updating the versions of dependencies and maven plugins is done
> > and merged into master.
> >   *   I separated out the work modularizing arrow-vector,
> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >
> > Modularizing arrow-memory-core requires a smaller change to user
> > command-line arguments. Instead of:
> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >
> > The user needs to add:
> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
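Why the target list matters: my understanding is that arrow-memory-core reads the private `address` field of `java.nio.Buffer` reflectively. On JDK 17 that only succeeds if `java.base` opens `java.nio` to the calling module. The class below is a hypothetical probe (`NioOpensProbe` is not part of Arrow) that makes this visible. Run it from the class path with and without `--add-opens=java.base/java.nio=ALL-UNNAMED`. It is a sketch that assumes JDK 9+ for the `Module` API.

```java
import java.lang.reflect.Field;
import java.nio.Buffer;

/**
 * Hypothetical probe, not part of Arrow: checks whether code in this class's
 * module can reflectively open java.nio.Buffer#address, the kind of deep
 * reflection that needs --add-opens=java.base/java.nio=... on recent JDKs.
 */
public class NioOpensProbe {

  /** Returns true if Buffer.address can be made accessible from this class's module. */
  public static boolean canAccessBufferAddress() {
    try {
      Field address = Buffer.class.getDeclaredField("address");
      // Throws InaccessibleObjectException (a RuntimeException) if java.nio
      // is not opened to this class's module.
      address.setAccessible(true);
      return true;
    } catch (NoSuchFieldException | RuntimeException e) {
      return false;
    }
  }

  /** Returns true if java.base currently opens java.nio to this class's module. */
  public static boolean isNioOpenToThisModule() {
    Module javaBase = Buffer.class.getModule();
    return javaBase.isOpen("java.nio", NioOpensProbe.class.getModule());
  }

  public static void main(String[] args) {
    // On the class path this class is in the unnamed module, so the
    // ALL-UNNAMED target of --add-opens is the one that applies here.
    System.out.println("module: " + NioOpensProbe.class.getModule());
    System.out.println("java.nio open: " + isNioOpenToThisModule());
    System.out.println("Buffer.address accessible: " + canAccessBufferAddress());
  }
}
```

Once memory-core is a named module, the code doing this reflection lives in `org.apache.arrow.memory.core` rather than the unnamed module. That is why the new argument names that module in addition to `ALL-UNNAMED`.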
> >
> > I initially tried to modularize arrow-vector separately from
> > arrow-memory-core but found that any meaningful operation in
> > arrow-vector would trigger an illegal access in memory-core if it
> > wasn’t modularized.
> >
> > I was able to run the tests for arrow-compression and arrow-tools
> > successfully after modularizing memory-core, memory-unsafe, and
> > arrow-vector. Note that I had more success by making memory-core and
> > memory-unsafe automatic modules.
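For reference, an automatic module is a plain JAR placed on the module path. Its name comes from the `Automatic-Module-Name` manifest attribute or, failing that, from the file name. The hypothetical helper below (not part of Arrow) uses the standard `java.lang.module.ModuleFinder` API to report whether each module found on a path is explicit (has a `module-info.class`) or automatic. That may be handy when checking builds like the one described above. A sketch, assuming JDK 9+.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: classifies modules on a module path as explicit or automatic. */
public class ModuleKindReport {

  /** Maps each module found in the given JARs/directories to "explicit" or "automatic". */
  public static Map<String, String> classify(Path... entries) {
    Map<String, String> kinds = new TreeMap<>();
    for (ModuleReference ref : ModuleFinder.of(entries).findAll()) {
      ModuleDescriptor descriptor = ref.descriptor();
      kinds.put(descriptor.name(), descriptor.isAutomatic() ? "automatic" : "explicit");
    }
    return kinds;
  }

  public static void main(String[] args) {
    // Each argument is a JAR file or a directory of JARs.
    Path[] entries = new Path[args.length];
    for (int i = 0; i < args.length; i++) {
      entries[i] = Paths.get(args[i]);
    }
    classify(entries).forEach((name, kind) -> System.out.println(name + " -> " + kind));
  }
}
```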
> >
> > I think we should decide here whether we want to bite the bullet
> > and introduce a breaking user-facing change to the command-line
> > options. The other option is to wait for JDK 21 before modularizing.
> > That’s farther down the line and requires refactoring much of the memory
> > module code and implementing a module using the foreign memory
> > interface.
> >
> > From: James Duong <james.du...@improving.com.INVALID>
> > Date: Tuesday, November 28, 2023 at 6:48 PM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > I’ve made some major progress on this work in this PR:
> > https://github.com/apache/arrow/pull/38876
> >
> >
> >   *   The maven plugin for compiling module-info.java files using JDK 8
> > is working correctly.
> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> > arrow-memory-unsafe, and arrow-vector have been modularized
> > successfully.
> >      *   Tests pass locally for all of these modules.
> >      *   They fail in CI, likely because I haven't updated a profile
> > somewhere.
> >
> > Similar to David’s PR below, the arrow-memory modules needed to be
> > refactored fairly significantly, and arrow-memory-netty was split into
> > two modules: a public-facing JPMS module and a separate module that adds
> > classes to Netty’s packages (memory-netty-buffer-patch). What’s more
> > problematic is that because we are now using named modules, users need
> > to add more arguments to their Java command line to use Arrow. Anyone
> > using arrow-memory-netty would need to add the following:
> >
> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >
> > --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
> >
> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >
> > The command the user needs to supply changes depending on where the
> > memory-netty-buffer-patch JAR is located and on its version, so this
> > seems like it would be really inconvenient.
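To make the inconvenience concrete: any launcher script or build would have to compute the `--patch-module` path from the patch JAR's location and version. Below is a hypothetical helper (`NettyJvmArgs` is not an Arrow class, and `/opt/arrow/lib` and `15.0.0` are placeholder values) that assembles the three options listed above as single tokens, for example for a `ProcessBuilder`. It is only a sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical launcher helper, not an Arrow API: builds the JVM options needed
 * by arrow-memory-netty. The patch JAR directory and version are inputs because
 * they differ between installations.
 */
public final class NettyJvmArgs {

  private NettyJvmArgs() {}

  public static List<String> forPatchJar(Path patchJarDir, String arrowVersion) {
    Path patchJar = patchJarDir.resolve(
        "arrow-memory-netty-buffer-patch-" + arrowVersion + ".jar");
    return Arrays.asList(
        // "--add-opens=X" is the single-token form of "--add-opens X".
        "--add-opens=java.base/jdk.internal.misc=io.netty.common",
        "--patch-module=io.netty.buffer=" + patchJar,
        "--add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED");
  }

  public static void main(String[] args) {
    // Placeholder values; substitute the real directory and Arrow version.
    List<String> opts = forPatchJar(Paths.get("/opt/arrow/lib"), "15.0.0");
    System.out.println(String.join(" ", opts));
  }
}
```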
> >
> > Do we want to proceed with modularizing existing memory modules? Both
> > netty and unsafe? Or wait until the new memory module from Java 21 is
> > available?
> >
> > The module-info.java files are written fairly naively. I haven’t
> > inspected thoroughly to determine what packages users will need.
> >
> > We can continue modularizing more components in a separate PR. Ideally
> > all the user breakage (class movement, new command-line argument
> > requirements) happens within one major Arrow version.
> >
> > From: James Duong <james.du...@improving.com.INVALID>
> > Date: Tuesday, November 21, 2023 at 1:16 PM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I’m following up on this topic.
> >
> > David has a PR from last year that’s done much of the heavy lifting for
> > refactoring the codebase to be package-friendly.
> > https://github.com/apache/arrow/pull/13072
> >
> > What’s changed since and what’s left:
> >
> >   *   New components have been added (Flight SQL for example) that will
> > need to be updated for modules.
> >   *   There wasn’t a clear solution for doing this without breaking
> > JDK 8 support. Compiling module-info.java files requires JDK 9, but
> > building with JDK 9 breaks the JDK 8 methods of accessing sun.misc.Unsafe.
> >      *   There is a Gradle plugin that can compile module-info.java
> > files purely syntactically, which we can adapt to Maven. It has
> > limitations (the one I see is that it can’t iterate through
> > classloaders to handle annotations), but using it might be a good
> > stopgap until JDK 8 support is deprecated.
> >   *   Some plugins need to be updated:
> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> > files.
> >      *   checkstyle 3.1.0 can’t parse module-info.java files, and our
> > existing checkstyle rules file can’t be loaded with newer versions. We
> > can exclude module-info.java for now and open a separate issue for
> > updating checkstyle itself and the rules file.
> >   *   grpc-java could not be modularized when the PR above was written.
> >      *   gRPC 1.57 can now be modularized
> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
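For readers less familiar with the JDK 8-era approach to sun.misc.Unsafe mentioned above: the usual pattern (which I believe is roughly what Arrow's memory code relies on) is to read the private static `theUnsafe` field reflectively. The sketch below is a standalone illustration, not Arrow code. What exactly breaks under a JDK 9+ build depends on the setup; for instance, an explicit named module must also declare `requires jdk.unsupported;` to see `sun.misc`, and javac warns that this is internal API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Standalone sketch of the classic reflective access to sun.misc.Unsafe. */
public class UnsafeAccess {

  static Unsafe getUnsafe() {
    try {
      // The singleton is held in a private static field named "theUnsafe".
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available", e);
    }
  }

  /** Writes a long to freshly allocated off-heap memory and reads it back. */
  static long roundTrip(long value) {
    Unsafe unsafe = getUnsafe();
    long address = unsafe.allocateMemory(Long.BYTES);
    try {
      unsafe.putLong(address, value);
      return unsafe.getLong(address);
    } finally {
      unsafe.freeMemory(address); // off-heap memory is not garbage collected
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(42L)); // prints 42
  }
}
```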
> >
> > From: David Dali Susanibar Arce <davi.sar...@gmail.com>
> > Date: Wednesday, May 25, 2022 at 5:02 AM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> > System
> > Hi All,
> >
> > The purpose of this email is to request comments on migrating Arrow Java
> > to JPMS, the Java Platform Module System
> > <https://openjdk.java.net/projects/jigsaw/spec/>, JSE 9+ (1).
> >
> > Current status:
> >
> > - Arrow Java uses the JSE 1.8 specification
> >
> > - Arrow Java works with JSE 1.8/9/11/17
> >
> > - This is possible because Java offers a “legacy mode”
> >
> > Proposal:
> >
> > Migrate to JPMS (Java Platform Module System). This draft PR
> > <https://github.com/apache/arrow/pull/13072> (2) contains an initial port
> > of the modules Format / Memory Core / Memory Netty / Memory Unsafe /
> > Vector for evaluation.
> >
> > Main Reason to migrate:
> >
> > - JPMS offers strong encapsulation, well-defined interfaces, and explicit
> > dependencies (3)(4).
> >
> > - JPMS offers reliable configuration, and security by hiding platform
> > internals.
> >
> > - JPMS offers a partial solution to the fact that code is read far more
> > often than it is written (80% read / 20% write).
> >
> > - JPMS optimizes for readability given that read/write ratio (90/10),
> > through module-info.java.
> >
> > - Consistency logs: JPMS implements consistency logs, which we could use
> > to solve the current problem.
> >
> > - The ability to build a custom runtime containing only the modules
> > needed (for example, without java.desktop) through jlink.
> >
> > - Modules have also been implemented in other languages, such as
> > JavaScript (ES2015), C++ (C++20), and .NET (NuGet/.NET Core).
> >
> > - Consider taking a look at this discussion about the pros and cons
> > <https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
> > (5).
> >
> > - Eventual migration to JPMS is a practical necessity as more projects
> > migrate.
> >
> > Effort:
> >
> > - First, we need to decide whether to move from JSE 1.8 to JSE 9+, or to
> > offer JAR components that support both JSE 1.8 and JSE 9+.
> >
> > - Migrate to JPMS bottom-up.
> >
> > - Packages need to be unique across modules (e.g. org.apache.arrow.memory /
> > io.netty.buffer). Review the draft PR for the initial proposal.
> >
> > - Dependencies also need to be modularized. If some of our current
> > dependencies can't be used as modules, this will be a blocker for our
> > modules (we could patch them, but that would be extra effort).
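As I understand the uniqueness rule above: in the usual setup, where all application modules are loaded by the same class loader, two modules cannot contain the same package. So Arrow classes placed in `io.netty.buffer` collide with Netty's own `io.netty.buffer` module, which is what the patch-module approach works around. A hypothetical check that flags such split packages could look like this. The class name and the module-to-package map are illustrative, not taken from the Arrow build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical check: reports packages that appear in more than one module. */
public class SplitPackageCheck {

  /** Returns package -> modules containing it, for packages found in two or more modules. */
  public static Map<String, List<String>> findSplitPackages(Map<String, Set<String>> packagesByModule) {
    Map<String, List<String>> owners = new TreeMap<>();
    packagesByModule.forEach((module, packages) -> {
      for (String pkg : packages) {
        owners.computeIfAbsent(pkg, k -> new ArrayList<>()).add(module);
      }
    });
    // Keep only packages owned by more than one module.
    owners.values().removeIf(modules -> modules.size() < 2);
    return owners;
  }

  public static void main(String[] args) {
    // Illustrative input: an Arrow module that also ships classes in io.netty.buffer.
    Map<String, Set<String>> example = new TreeMap<>();
    example.put("io.netty.buffer", Set.of("io.netty.buffer"));
    example.put("org.apache.arrow.memory.netty",
        Set.of("org.apache.arrow.memory.netty", "io.netty.buffer"));
    System.out.println(findSplitPackages(example));
    // prints {io.netty.buffer=[io.netty.buffer, org.apache.arrow.memory.netty]}
  }
}
```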
> >
> > Killers:
> >
> > - FIXME! I need your help identifying any killer reasons that would stop
> > us from pushing this implementation.
> >
> > Please let us know whether migrating Arrow Java to JPMS (Java Platform
> > Module System) is needed and should be implemented.
> >
> > Please use this document for any comments:
> > https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >
> > Resources used:
> >
> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >
> > (2): https://github.com/apache/arrow/pull/13072
> >
> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >
> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >
> > (5): https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > Best regards,
> >
> > --
> > David
>
