[DISCUSS] Folks,
The problem that Furkan discovered is a subtle bug that I had not seen before and affects our Java repositories where we fetch a resource from either the *src/main/resources* or *src/test/resources* and use the method *getClass().getClassLoader().getResource(<name>).getFile(), *which ironically does not return a file, but returns a URL. A URL is what would be required to obtain a remote resource, but that is not how we have been using it. As a URL it can contain special character sequences, such as "%20", that are substituted for *spaces *if the user's absolute system path to the resource contains them. The use of spaces in path names is system dependent although I think both unix and windows based file systems now allow it. However, having characters such as "%20" in the place of spaces in the URL string will clearly fail if the file system attempts to find the file using the URL string. And this is exactly what Furkan observed. Another irony here is that according to Oracle, spaces are not allowed anywhere in the path to obtain a resource, to wit: https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html > A resource is identified by a string consisting of a sequence of > substrings, delimited by slashes (/), followed by a resource name. *Each > substring must be a valid Java identifier*. The resource name is of the > form shortName or shortName.extension. Both shortName and extension must be > Java identifiers. > The name of a resource is independent of the Java implementation; in > particular, the path separator is always a slash (/). However, the Java > implementation controls the details of how the contents of the resource are > mapped into a file, database, or other object containing the actual > resource. > The interpretation of a resource name is relative to a class loader > instance. Methods implemented by the ClassLoader class do this > interpretation. All the methods of the ClassLoader that return path-like strings return absolute paths. Thus, if the user happens to have directory names with spaces at any level of the path hierarchy, it would fail this Oracle requirement. This issue and the confusing return of getFile() has been raised as a bug against Java some years ago, but has never been fixed. Instead of String path = getClass().getClassLoader().getResource(shortName).getFile(); > the recommended way to obtain the correct file-system usable path is String path = > getClass().getClassLoader().getResource(shortName).toURI().getPath(); This nicely fixes the problem in a system-independent and portable way. The getPath() method performs a decode operation on the URI and replaces the "%20" sequences with spaces (as well as fixes any other special characters in the string. I found over 30 places in Memory and Java core where we need to correct this. I have not yet searched the other Java repos. In studying our use cases it appears that we do two things with our resources, get the file, or get the bytes of the file as a byte array. There are at least two ways to go for a fix: 1. Do a search / replace on "getResource(...).getFile()" and change it to "getResource(...).toURI().getPath()". Even with this we have to do try/catch on a number of exceptions and remember how to efficiently convert the path to a File or to bytes, etc. This results in messy code in lots of places. 2. Create access methods that handle all the messy stuff. I have written the following three methods that we could place in the datasketches/Util.java class that perform the complete operations we use most: /** > * Gets the absolute path of the given resource file's shortName. > * > * <p>Note that the ClassLoader.getResource(shortName) returns a URL, > * which can have special characters, e.g., "%20" for spaces. This method > * obtains the URL, converts it to a URI, then does a uri.getPath(), > which > * decodes any special characters in the URI path. This is required to > make > * obtaining resources operating-system independent.</p> > * > * @param shortName the last name in the pathname's name sequence. > * @return the absolute path of the given resource file's shortName. > */ > public static String getResourcePath(final String shortFileName) { > try { > final URL url = > Resources.class.getClassLoader().getResource(shortFileName); > final URI uri = url.toURI(); > final String path = uri.getPath(); //decodes any special characters > return path; > } catch (final NullPointerException | URISyntaxException e) { > throw new IllegalArgumentException("Cannot find resource: " + > shortFileName + LS + e); > } > } > /** > * Gets the file defined by the given resource file's shortFileName. > * @param shortFileName the last name in the pathname's name sequence. > * @return the file defined by the given resource file's shortFileName. > */ > public static File getResourceFile(final String shortFileName) { > return new File(getResourcePath(shortFileName)); > } > /** > * Returns a byte array of the contents of the file defined by the given > resource file's > * shortFileName. > * @param shortFileName the last name in the pathname's name sequence. > * @return a byte array of the contents of the file defined by the given > resource file's > * shortFileName. > */ > public static byte[] getResourceBytes(final String shortFileName) { > try { > return Files.readAllBytes(Paths.get(getResourcePath(shortFileName))); > } catch (final IOException e) { > throw new IllegalArgumentException("Cannot read resource: " + > shortFileName + LS + e); > } > } With these methods our two major use-cases become: File file = getResourceFile(shortFileName); > byte[] bytes = getResourceBytes(shortFileName); Either way we have to REMEMBER to use the right method, but the 2nd approach is cleaner and makes sure we do all the right checking. The next thing we to decide is what to do about the vote. Should we cancel this vote to fix this issue or wait until the next release? In all the years we have been using this library this issue of spaces in the path name has never come up. Thank you, Furkan for uncovering this, but I could use some help in assessing how important this is to fix in this release. Folks, please respond with your comments: 1. Which fix approach? 2. What to do about the current vote. Thanks, Lee. On Thu, Sep 12, 2019 at 2:34 AM Furkan KAMACI <[email protected]> wrote: > Hi, > > Tests fail if I change the folder name which includes source code. i.e. > rename the folder name from "apache-datasketches-java-1.0.0-incubating-src" > to "a b". I'm not sure whether it is related to my environment or not. > > Kind Regards, > Furkan KAMACI > > On Tue, Sep 10, 2019 at 2:27 AM leerho <[email protected]> wrote: > > > "with a space character?" > > > > On Mon, Sep 9, 2019 at 4:08 PM leerho <[email protected]> wrote: > > > > > Furkan, > > > > > > Thank you for the vote! And for the suggestion to update README. > > > > > > Where did you find the folder name with at space character? > > > > > > Lee > > > > > > On Mon, Sep 9, 2019 at 1:55 PM Furkan KAMACI <[email protected]> > > > wrote: > > > > > >> Hi, > > >> > > >> +1 from me. > > >> > > >> I checked: > > >> - Incubating in name > > >> - DISCLAIMER exists > > >> - LICENSE and NOTICE are fine > > >> - No unexpected binary files > > >> - Checked PGP signatures > > >> - Checked Checksums > > >> - Code compiles and tests successfully run > > >> > > >> Minor things: > > >> > > >> 1) You can update README file at GitHub both for how to compile it and > > >> discussion & support. i.e.: https://github.com/apache/lucene-solr > > >> 2) Seems that project fails to run tests when folder name has space > > >> character. > > >> > > >> Kind Regards, > > >> Furkan KAMACI > > >> > > >> On Fri, Sep 6, 2019 at 10:32 PM Jon Malkin <[email protected]> > > wrote: > > >> > > >> > +1 > > >> > > > >> > Verified signature and checksum > > >> > Licenses/notices present, although I still need to learn details of > > what > > >> > they're supposed to include so I just checked for non-empty files > > >> > Successfully compiled and ran unit tests. > > >> > > > >> > jon > > >> > > > >> > On Thu, Sep 5, 2019 at 6:53 PM leerho <[email protected]> wrote: > > >> > > > >> > > ### NEW COMPONENT! ### > > >> > > > > >> > > Hello Apache DataSketches PPMC and Community, > > >> > > > > >> > > 1. This is a call for vote to release Apache DataSketches-java > > >> version: > > >> > > 1.0.0-incubating-RC2 > > >> > > > > >> > > NOTE 1: This is the core Java component of the DataSketches > > >> library > > >> > > that includes all the sketch algorithms in production-ready > > packages. > > >> > These > > >> > > sketches can be called directly from this component or used in > > >> > conjunction > > >> > > with the adaptor components such as Hadoop Pig, Hadoop Hive, or > the > > >> > > aggregator adaptors built into Apache Druid. > > >> > > > > >> > > NOTE 2: This release contains some critical performance > > >> improvements > > >> > > and bug fixes for Apache Druid. > > >> > > > > >> > > NOTE 3: Changes from RC1: The previous RC1 had a bug in the > > >> release > > >> > > script which caused the SHA512sum to fail. Some very minor updates > > to > > >> the > > >> > > LICENSE, NOTICE, and pom.xml files. Otherwise no code changes. > > >> > > > > >> > > 2. The release candidate: > > >> > > - > > >> > > > > >> > > > > >> > > > >> > > > https://dist.apache.org/repos/dist/dev/incubator/datasketches/java/1.0.0-incubating-RC2/ > > >> > > > > >> > > 3. Source repository: > > >> > > - https://github.com/apache/incubator-datasketches-java > > >> > > > > >> > > Git Tag for this release: > > >> > > - > > >> > > > > >> > > > > >> > > > >> > > > https://github.com/apache/incubator-datasketches-java/tree/1.0.0-incubating-RC2 > > >> > > on branch 1.0.X-incubating > > >> > > > > >> > > Git HashId for this release starts with: f8abbbe > > >> > > > > >> > > The artifacts have been signed with --keyid-format SHORT : > > >> 8CD4A902 > > >> > > > > >> > > 4. Repository: dist.apache.org repository: > > >> > > > > >> > > The public signing key can be found in the KEYS file: > > >> > > - > > >> https://dist.apache.org/repos/dist/dev/incubator/datasketches/KEYS > > >> > > > > >> > > 5. Repository: Maven Central (repository.apache.org): > > >> > > > > >> > > The Jar artifacts can be found at > > >> > > - > > >> > > > > >> > > > > >> > > > >> > > > https://repository.apache.org/content/groups/staging/org/apache/datasketches/datasketches-java/1.0.0-incubating/ > > >> > > > > >> > > 6. Build & Test Guide: (assuming you have Maven) > > >> > > > > >> > > The DataSketches-java component is pure Java and is structured > > as > > >> a > > >> > > Maven project. You must compile using JDK 8. There is one > > run-time > > >> > > dependency of the DataSketches-memory component (recently > released), > > >> and, > > >> > > of course, a number of test and Maven plugin dependencies, all of > > >> which > > >> > can > > >> > > be resolved by Maven. > > >> > > > > >> > > There are two types of tests: normal unit tests and tests run > by > > >> the > > >> > > strict profile. To run normal unit tests: > > >> > > > > >> > > $ mvn clean test > > >> > > > > >> > > To run the strict profile tests: > > >> > > > > >> > > $ mvn clean test -P strict > > >> > > > > >> > > To install jars in your local .m2 repository: > > >> > > > > >> > > $ mvn clean install -DskipTests=true > > >> > > > > >> > > 7. Documentation: The documentation for the DataSketches Java > > >> component > > >> > is > > >> > > part of the website. > > >> > > > > >> > > The Overview section on the website has extensive > documentation > > on > > >> > all > > >> > > the sketches: > > >> > > - https://datasketches.github.io > > >> > > > > >> > > Javadocs: > > >> > > - > > >> > https://datasketches.github.io/api/core/snapshot/apidocs/index.html > > >> > > > > >> > > 8. The vote will be performed in two stages: > > >> > > - This letter will be published on dev@ and remain open for at > > >> least > > >> > 72 > > >> > > hours and at least 3 (+1) PPMC votes or a majority of (+1) are > > >> > > acquired. All PPMC members including Mentors can vote. However, a > > >> > negative > > >> > > vote from a Mentor will cancel this voting process. > > >> > > > > >> > > - After it passes the first stage, the summary of that vote and > > the > > >> > key > > >> > > information from this letter will be published on > general@incubator > > >> and > > >> > > remain open for at least 72 hours and at least 3 (+1) IPMC votes > or > > a > > >> > > majority of (+1) are acquired. > > >> > > > > >> > > Please vote accordingly: > > >> > > > > >> > > [ ] +1 approve > > >> > > [ ] +0 no opinion > > >> > > [ ] -1 disapprove with the reason > > >> > > > > >> > > Thanks, > > >> > > Lee Rhodes > > >> > > [email protected] > > >> > > > > >> > > > >> > > > > > >
