Hi Uwe,

As I mentioned in the original post, the main issue I've come across in
supporting Java 11 modules is that Lucene doesn't cleanly separate package
names across its jar libraries: the same package can appear in more than
one jar, which results in errors like:

"The package org.apache.lucene.analysis.standard is accessible from more
than one module: lucene.analyzers.common, lucene.core"

I ran some analysis to see how extensive the problem is, using the code
below (pass the directory of the unzipped Lucene 8.5.0 distribution as the
argument). The problem is actually not that big and could be fixed. If you
ignore duplicates in lucene-test-framework-8.5.0.jar,
lucene-backward-codecs-8.5.0.jar and lucene-misc-8.5.0.jar, you are left
with the following issues:

package org.apache.lucene.analysis.standard
 - lucene-core-8.5.0.jar
 - lucene-analyzers-common-8.5.0.jar

Would it be viable to rename package org.apache.lucene.analysis.standard in
lucene-analyzers-common to org.apache.lucene.analysis.classic or move the
classes into the core library?

package org.apache.lucene.search
package org.apache.lucene.document
 - lucene-core-8.5.0.jar
 - lucene-sandbox-8.5.0.jar

Would it be possible to move the search and document packages in
lucene-sandbox into a sandbox sub-package? e.g.
org.apache.lucene.sandbox.search

package org.apache.lucene.collation
package org.apache.lucene.collation.tokenattributes
 - lucene-analyzers-common-8.5.0.jar
 - lucene-analyzers-icu-8.5.0.jar

Would it be possible to move collation into a collation.icu sub-package?
e.g. org.apache.lucene.collation.icu

As I'm not deeply familiar with the Lucene code base, I don't know what
the flow-on effects of these changes would be. However, I'd be happy to
raise the JIRA issue(s) and prepare the changes/pull request.

I believe that with these changes many of the important Lucene libraries
could be cleanly brought into an application using the module system.
Another nice-to-have would be to add an Automatic-Module-Name entry to the
MANIFEST.MF of each jar, so that each module has an explicit, stable name
instead of one derived automatically from the jar file name.
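
To see whether a given jar already declares such an entry, the manifest can
be read with the standard java.util.jar API. This is only a minimal sketch
and not part of Lucene: the class name ModuleNameCheck is made up for
illustration, and the program simply reports whatever the jar's manifest
contains.

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper class, not part of Lucene.
public class ModuleNameCheck {

    // Returns the Automatic-Module-Name main attribute, or null if the
    // manifest is missing or does not declare one.
    static String automaticModuleName(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue("Automatic-Module-Name");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: ModuleNameCheck <path to jar>");
            return;
        }
        // JarFile.getManifest() returns null when the jar has no manifest.
        try (JarFile jar = new JarFile(args[0])) {
            String name = automaticModuleName(jar.getManifest());
            System.out.println(name != null
                    ? "Automatic-Module-Name: " + name
                    : "No Automatic-Module-Name; the name is derived from the file name");
        }
    }
}
```

Running it against each Lucene jar would show which ones, if any, already
declare a name.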

As you mentioned, the fallback is to create a lucene-all-8.5.0.jar that
combines all classes into a single large file. Given that this is not
currently published on Maven, if there's no interest in making the above
changes, would it be possible to change the build system to publish a
lucene-all package?

Regards,
David.

-------- Full output of duplicated packages --------

package org.apache.lucene.analysis.standard
 - lucene-core-8.5.0.jar
 - lucene-analyzers-common-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs.blockterms
 - lucene-codecs-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs.lucene80
 - lucene-core-8.5.0.jar
 - lucene-backward-codecs-8.5.0.jar

package org.apache.lucene.search.similarities
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs.lucene70
 - lucene-core-8.5.0.jar
 - lucene-backward-codecs-8.5.0.jar

package org.apache.lucene.store
 - lucene-misc-8.5.0.jar
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs.bloom
 - lucene-codecs-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs.lucene50
 - lucene-core-8.5.0.jar
 - lucene-backward-codecs-8.5.0.jar

package org.apache.lucene.search.spans
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.index
 - lucene-misc-8.5.0.jar
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.util.fst
 - lucene-misc-8.5.0.jar
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.analysis
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.collation
 - lucene-analyzers-common-8.5.0.jar
 - lucene-analyzers-icu-8.5.0.jar

package org.apache.lucene.codecs.uniformsplit.sharedterms
 - lucene-codecs-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.geo
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.search
 - lucene-misc-8.5.0.jar
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar
 - lucene-sandbox-8.5.0.jar

package org.apache.lucene.collation.tokenattributes
 - lucene-analyzers-common-8.5.0.jar
 - lucene-analyzers-icu-8.5.0.jar

package org.apache.lucene.document
 - lucene-misc-8.5.0.jar
 - lucene-core-8.5.0.jar
 - lucene-sandbox-8.5.0.jar

package org.apache.lucene.codecs.compressing
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs.uniformsplit
 - lucene-codecs-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.util
 - lucene-misc-8.5.0.jar
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar

package org.apache.lucene.codecs
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar
 - lucene-backward-codecs-8.5.0.jar

package org.apache.lucene.util.automaton
 - lucene-core-8.5.0.jar
 - lucene-test-framework-8.5.0.jar


-------

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ReadJars {

    public static void main(final String[] arguments) throws IOException {
        if (arguments.length != 1) {
            System.err.println("Usage: ReadJars <path to jar files>");
            return;
        }

        Map<String, List<String>> duplicates = new HashMap<>();
        List<File> files = new ArrayList<>();

        // Recursively collect all files under the given directory.
        addFiles(files, arguments[0]);

        // Record every package that appears in more than one jar.
        findDuplicatePackages(duplicates, files);

        // Print each duplicated package with the jars that contain it.
        for (Entry<String, List<String>> entry : duplicates.entrySet()) {
            System.out.println("\npackage " + entry.getKey());
            for (String dup : entry.getValue()) {
                System.out.println(" - " + dup);
            }
        }
    }

    private static void findDuplicatePackages(Map<String, List<String>> duplicates,
            List<File> files) throws IOException {
        // Maps each package to the first jar in which it was seen.
        Map<String, String> packageToJarMap = new HashMap<>();
        for (File f : files) {
            if (!f.getName().endsWith(".jar")) {
                continue;
            }
            // File.getName() works regardless of the platform's path separator.
            String file = f.getName();

            // try-with-resources closes the stream even if reading fails.
            try (ZipInputStream zip = new ZipInputStream(new FileInputStream(f))) {
                for (ZipEntry entry = zip.getNextEntry(); entry != null;
                        entry = zip.getNextEntry()) {
                    String name = entry.getName();
                    if (entry.isDirectory() || !name.endsWith(".class")) {
                        continue;
                    }
                    // Zip entry names always use '/' as the separator.
                    int slash = name.lastIndexOf('/');
                    String packageName = slash >= 0
                            ? name.substring(0, slash).replace('/', '.') : "";

                    // Only interested in Lucene packages.
                    if (!packageName.startsWith("org.apache.lucene")) {
                        continue;
                    }

                    String jar = packageToJarMap.get(packageName);
                    if (jar == null || jar.equals(file)) {
                        packageToJarMap.put(packageName, file);
                        continue;
                    }

                    // The package was first seen in a different jar.
                    List<String> duplicate = duplicates.get(packageName);
                    if (duplicate == null) {
                        duplicate = new ArrayList<>();
                        duplicate.add(jar);
                        duplicates.put(packageName, duplicate);
                    }
                    if (!duplicate.contains(file)) {
                        duplicate.add(file);
                    }
                }
            }
        }
    }

    public static void addFiles(List<File> files, String directoryName) {
        File directory = new File(directoryName);
        // listFiles() returns null if the path is not a readable directory.
        File[] children = directory.listFiles();
        if (children == null) {
            return;
        }
        for (File file : children) {
            if (file.isFile()) {
                files.add(file);
            } else if (file.isDirectory()) {
                addFiles(files, file.getAbsolutePath());
            }
        }
    }
}

--------
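
As a cross-check, the same question can be asked of the module system
itself through java.lang.module.ModuleFinder, which reports the name and
packages each jar would get as an automatic module. This is a rough sketch
rather than a replacement for the program above. The class name
SplitPackages is made up for illustration. It assumes the argument is a
single flat directory containing only the jars of interest, since
ModuleFinder does not descend into sub-directories, and it may fail with a
FindException if a module name cannot be derived for a jar or if two jars
map to the same name.

```java
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class, not part of Lucene.
public class SplitPackages {

    // Inverts module -> packages into package -> modules and keeps only
    // the packages that are claimed by more than one module.
    static Map<String, List<String>> findSplitPackages(
            Map<String, Set<String>> packagesByModule) {
        Map<String, List<String>> modulesByPackage = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : packagesByModule.entrySet()) {
            for (String pkg : e.getValue()) {
                modulesByPackage
                        .computeIfAbsent(pkg, k -> new ArrayList<>())
                        .add(e.getKey());
            }
        }
        modulesByPackage.values().removeIf(modules -> modules.size() < 2);
        return modulesByPackage;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: SplitPackages <flat directory of jars>");
            return;
        }
        // Ask the module system which name and packages each jar would get.
        Map<String, Set<String>> packagesByModule = new TreeMap<>();
        for (ModuleReference ref : ModuleFinder.of(Paths.get(args[0])).findAll()) {
            packagesByModule.put(ref.descriptor().name(), ref.descriptor().packages());
        }
        findSplitPackages(packagesByModule).forEach((pkg, modules) ->
                System.out.println("package " + pkg + " " + modules));
    }
}
```

Unlike ReadJars, this lists every package in the jars, not only those under
org.apache.lucene, and it reports module names rather than jar file names.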


On Tue, Mar 24, 2020 at 9:26 PM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
>
>
> This has been a known problem for many years and there are no plans to
> change this yet. The main reason for Java 11 support is not to introduce
> the module system, but instead to use the new features of Java 11 and to
> get rid of MR-JAR complications to make use of new intrinsics.
>
>
>
> Servers like Solr or Elasticsearch are shipped as an application, so the
> module system does not bring any benefit.
>
>
>
> The general recommendation is to combine all required Lucene libraries into
> a separate JAR file during the maven / gradle build (e.g. using the Maven
> Shade plugin). Keep in mind that Lucene is also not suitable for use in
> other module systems like
>
>
>
> There are currently some preparatory things to move forward with modules,
> so although you might be able to actually compile Lucene with the module
> system (by limiting to a subset of JAR files), it currently won’t work
> cross-module due to the way it handles ServiceLoader interfaces
> (codecs, postings formats, analyzers, see
> https://issues.apache.org/jira/browse/LUCENE-9281). The only way to make
> it work at runtime is to put all of Lucene into one module.
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> *From:* David Ryan <da...@livemedia.com.au>
> *Sent:* Tuesday, March 24, 2020 8:05 AM
> *To:* dev@lucene.apache.org
> *Subject:* Lucene 9.0 Java module system support
>
>
>
>
>
> Hi all,
>
>
>
> I've been investigating the use of Lucene as part of an application that
> uses the Java Module System. Initially, I used Gradle to bring in
> lucene-core, lucene-spatial and lucene-queries using version 8.5.0. This
> works with the automated module naming. However, bringing in
> lucene-queryparser (which depends on lucene-sandbox) or
> lucene-analyzers-common causes errors such as:
>
>
>
> "The package org.apache.lucene.analysis.standard is accessible from more
> than one module: lucene.analyzers.common, lucene.core"
>
>
>
> The Java module system does not allow different jar files to contain the
> same package, which occurs across multiple Maven artifacts.
>
>
>
> Looking at the LUCENE issues, I found the task of moving to a minimum
> version of Java 11, however, this does not mention the ability to be
> compatible with the module system.  I also checked the git repository and
> couldn't find the required changes to support the module system.
>
>
>
> https://issues.apache.org/jira/browse/LUCENE-8738
>
>
>
> I looked through the recent history of the dev list but could not find
> anything related. Are there any plans to support modules? Given that a
> number of other breaking changes are happening with the move to Lucene 9.0,
> would this be a good time to make those changes?
>
>
>
> Thanks,
>
> David.
>
>
>
