It is worth mentioning that this affects unit tests only, as far as I understand. For instance, to test serialization compatibility with C++ code there are a few files such as src/test/resources/cpc-sparce.bin, which we deserialize in a test to see if it succeeds and the resulting sketch looks right. I don't believe this should stop the release.
On Sun, Sep 15, 2019 at 8:00 PM leerho <[email protected]> wrote: > [DISCUSS] > > Folks, > > The problem that Furkan discovered is a subtle bug that I had not seen > before and affects our Java repositories where we fetch a resource from > either the *src/main/resources* or *src/test/resources* and use the > method *getClass().getClassLoader().getResource(<name>).getFile(), > *which ironically does not return a file, but returns a URL. A URL is what > would be required to obtain a remote resource, but that is not how we have > been using it. > > As a URL it can contain special character sequences, such as "%20", that > are substituted for *spaces *if the user's absolute system path to the > resource contains them. The use of spaces in path names is system > dependent although I think both unix and windows based file systems now > allow it. However, having characters such as "%20" in the place of spaces > in the URL string will clearly fail if the file system attempts to find the > file using the URL string. And this is exactly what Furkan observed. > > Another irony here is that according to Oracle, spaces are not allowed > anywhere in the path to obtain a resource, to wit: > > https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html > > > > > A resource is identified by a string consisting of a sequence of > > substrings, delimited by slashes (/), followed by a resource name. *Each > > substring must be a valid Java identifier*. The resource name is of the > > form shortName or shortName.extension. Both shortName and extension must > be > > Java identifiers. > > > > > The name of a resource is independent of the Java implementation; in > > particular, the path separator is always a slash (/). However, the Java > > implementation controls the details of how the contents of the resource > are > > mapped into a file, database, or other object containing the actual > > resource. > > > > > The interpretation of a resource name is relative to a class loader > > instance. Methods implemented by the ClassLoader class do this > > interpretation. > > > All the methods of the ClassLoader that return path-like strings return > absolute paths. Thus, if the user happens to have directory names with > spaces at any level of the path hierarchy, it would fail this Oracle > requirement. > > This issue and the confusing return of getFile() has been raised as a bug > against Java some years ago, but has never been fixed. > > Instead of > > String path = getClass().getClassLoader().getResource(shortName).getFile(); > > > > the recommended way to obtain the correct file-system usable path is > > String path = > > getClass().getClassLoader().getResource(shortName).toURI().getPath(); > > > This nicely fixes the problem in a system-independent and portable way. The > getPath() method performs a decode operation on the URI and replaces the > "%20" sequences with spaces (as well as fixes any other special characters > in the string. > > I found over 30 places in Memory and Java core where we need to correct > this. I have not yet searched the other Java repos. > > In studying our use cases it appears that we do two things with our > resources, get the file, or get the bytes of the file as a byte array. > > There are at least two ways to go for a fix: > > 1. Do a search / replace on "getResource(...).getFile()" and change it to > "getResource(...).toURI().getPath()". Even with this we have to do > try/catch on a number of exceptions and remember how to efficiently convert > the path to a File or to bytes, etc. This results in messy code in lots of > places. > > 2. Create access methods that handle all the messy stuff. I have written > the following three methods that we could place in the > datasketches/Util.java class that perform the complete operations we use > most: > > /** > > * Gets the absolute path of the given resource file's shortName. > > * > > * <p>Note that the ClassLoader.getResource(shortName) returns a URL, > > * which can have special characters, e.g., "%20" for spaces. This > method > > * obtains the URL, converts it to a URI, then does a uri.getPath(), > > which > > * decodes any special characters in the URI path. This is required to > > make > > * obtaining resources operating-system independent.</p> > > * > > * @param shortName the last name in the pathname's name sequence. > > * @return the absolute path of the given resource file's shortName. > > */ > > public static String getResourcePath(final String shortFileName) { > > try { > > final URL url = > > Resources.class.getClassLoader().getResource(shortFileName); > > final URI uri = url.toURI(); > > final String path = uri.getPath(); //decodes any special characters > > return path; > > } catch (final NullPointerException | URISyntaxException e) { > > throw new IllegalArgumentException("Cannot find resource: " + > > shortFileName + LS + e); > > } > > } > > /** > > * Gets the file defined by the given resource file's shortFileName. > > * @param shortFileName the last name in the pathname's name sequence. > > * @return the file defined by the given resource file's shortFileName. > > */ > > public static File getResourceFile(final String shortFileName) { > > return new File(getResourcePath(shortFileName)); > > } > > /** > > * Returns a byte array of the contents of the file defined by the > given > > resource file's > > * shortFileName. > > * @param shortFileName the last name in the pathname's name sequence. > > * @return a byte array of the contents of the file defined by the > given > > resource file's > > * shortFileName. > > */ > > public static byte[] getResourceBytes(final String shortFileName) { > > try { > > return > Files.readAllBytes(Paths.get(getResourcePath(shortFileName))); > > } catch (final IOException e) { > > throw new IllegalArgumentException("Cannot read resource: " + > > shortFileName + LS + e); > > } > > } > > > With these methods our two major use-cases become: > > File file = getResourceFile(shortFileName); > > byte[] bytes = getResourceBytes(shortFileName); > > > Either way we have to REMEMBER to use the right method, but the 2nd > approach is cleaner and makes sure we do all the right checking. > > The next thing we to decide is what to do about the vote. > > Should we cancel this vote to fix this issue or wait until the next > release? > > In all the years we have been using this library this issue of spaces in > the path name has never come up. Thank you, Furkan for uncovering this, > but I could use some help in assessing how important this is to fix in this > release. > > Folks, please respond with your comments: 1. Which fix approach? 2. What > to do about the current vote. > > Thanks, > > Lee. > > > > > > On Thu, Sep 12, 2019 at 2:34 AM Furkan KAMACI <[email protected]> > wrote: > > > Hi, > > > > Tests fail if I change the folder name which includes source code. i.e. > > rename the folder name from > "apache-datasketches-java-1.0.0-incubating-src" > > to "a b". I'm not sure whether it is related to my environment or not. > > > > Kind Regards, > > Furkan KAMACI > > > > On Tue, Sep 10, 2019 at 2:27 AM leerho <[email protected]> wrote: > > > > > "with a space character?" > > > > > > On Mon, Sep 9, 2019 at 4:08 PM leerho <[email protected]> wrote: > > > > > > > Furkan, > > > > > > > > Thank you for the vote! And for the suggestion to update README. > > > > > > > > Where did you find the folder name with at space character? > > > > > > > > Lee > > > > > > > > On Mon, Sep 9, 2019 at 1:55 PM Furkan KAMACI <[email protected] > > > > > > wrote: > > > > > > > >> Hi, > > > >> > > > >> +1 from me. > > > >> > > > >> I checked: > > > >> - Incubating in name > > > >> - DISCLAIMER exists > > > >> - LICENSE and NOTICE are fine > > > >> - No unexpected binary files > > > >> - Checked PGP signatures > > > >> - Checked Checksums > > > >> - Code compiles and tests successfully run > > > >> > > > >> Minor things: > > > >> > > > >> 1) You can update README file at GitHub both for how to compile it > and > > > >> discussion & support. i.e.: https://github.com/apache/lucene-solr > > > >> 2) Seems that project fails to run tests when folder name has space > > > >> character. > > > >> > > > >> Kind Regards, > > > >> Furkan KAMACI > > > >> > > > >> On Fri, Sep 6, 2019 at 10:32 PM Jon Malkin <[email protected]> > > > wrote: > > > >> > > > >> > +1 > > > >> > > > > >> > Verified signature and checksum > > > >> > Licenses/notices present, although I still need to learn details > of > > > what > > > >> > they're supposed to include so I just checked for non-empty files > > > >> > Successfully compiled and ran unit tests. > > > >> > > > > >> > jon > > > >> > > > > >> > On Thu, Sep 5, 2019 at 6:53 PM leerho <[email protected]> wrote: > > > >> > > > > >> > > ### NEW COMPONENT! ### > > > >> > > > > > >> > > Hello Apache DataSketches PPMC and Community, > > > >> > > > > > >> > > 1. This is a call for vote to release Apache DataSketches-java > > > >> version: > > > >> > > 1.0.0-incubating-RC2 > > > >> > > > > > >> > > NOTE 1: This is the core Java component of the DataSketches > > > >> library > > > >> > > that includes all the sketch algorithms in production-ready > > > packages. > > > >> > These > > > >> > > sketches can be called directly from this component or used in > > > >> > conjunction > > > >> > > with the adaptor components such as Hadoop Pig, Hadoop Hive, or > > the > > > >> > > aggregator adaptors built into Apache Druid. > > > >> > > > > > >> > > NOTE 2: This release contains some critical performance > > > >> improvements > > > >> > > and bug fixes for Apache Druid. > > > >> > > > > > >> > > NOTE 3: Changes from RC1: The previous RC1 had a bug in the > > > >> release > > > >> > > script which caused the SHA512sum to fail. Some very minor > updates > > > to > > > >> the > > > >> > > LICENSE, NOTICE, and pom.xml files. Otherwise no code changes. > > > >> > > > > > >> > > 2. The release candidate: > > > >> > > - > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/datasketches/java/1.0.0-incubating-RC2/ > > > >> > > > > > >> > > 3. Source repository: > > > >> > > - https://github.com/apache/incubator-datasketches-java > > > >> > > > > > >> > > Git Tag for this release: > > > >> > > - > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > > https://github.com/apache/incubator-datasketches-java/tree/1.0.0-incubating-RC2 > > > >> > > on branch 1.0.X-incubating > > > >> > > > > > >> > > Git HashId for this release starts with: f8abbbe > > > >> > > > > > >> > > The artifacts have been signed with --keyid-format SHORT : > > > >> 8CD4A902 > > > >> > > > > > >> > > 4. Repository: dist.apache.org repository: > > > >> > > > > > >> > > The public signing key can be found in the KEYS file: > > > >> > > - > > > >> https://dist.apache.org/repos/dist/dev/incubator/datasketches/KEYS > > > >> > > > > > >> > > 5. Repository: Maven Central (repository.apache.org): > > > >> > > > > > >> > > The Jar artifacts can be found at > > > >> > > - > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > > https://repository.apache.org/content/groups/staging/org/apache/datasketches/datasketches-java/1.0.0-incubating/ > > > >> > > > > > >> > > 6. Build & Test Guide: (assuming you have Maven) > > > >> > > > > > >> > > The DataSketches-java component is pure Java and is > structured > > > as > > > >> a > > > >> > > Maven project. You must compile using JDK 8. There is one > > > run-time > > > >> > > dependency of the DataSketches-memory component (recently > > released), > > > >> and, > > > >> > > of course, a number of test and Maven plugin dependencies, all > of > > > >> which > > > >> > can > > > >> > > be resolved by Maven. > > > >> > > > > > >> > > There are two types of tests: normal unit tests and tests > run > > by > > > >> the > > > >> > > strict profile. To run normal unit tests: > > > >> > > > > > >> > > $ mvn clean test > > > >> > > > > > >> > > To run the strict profile tests: > > > >> > > > > > >> > > $ mvn clean test -P strict > > > >> > > > > > >> > > To install jars in your local .m2 repository: > > > >> > > > > > >> > > $ mvn clean install -DskipTests=true > > > >> > > > > > >> > > 7. Documentation: The documentation for the DataSketches Java > > > >> component > > > >> > is > > > >> > > part of the website. > > > >> > > > > > >> > > The Overview section on the website has extensive > > documentation > > > on > > > >> > all > > > >> > > the sketches: > > > >> > > - https://datasketches.github.io > > > >> > > > > > >> > > Javadocs: > > > >> > > - > > > >> > > https://datasketches.github.io/api/core/snapshot/apidocs/index.html > > > >> > > > > > >> > > 8. The vote will be performed in two stages: > > > >> > > - This letter will be published on dev@ and remain open for > at > > > >> least > > > >> > 72 > > > >> > > hours and at least 3 (+1) PPMC votes or a majority of (+1) are > > > >> > > acquired. All PPMC members including Mentors can vote. However, > a > > > >> > negative > > > >> > > vote from a Mentor will cancel this voting process. > > > >> > > > > > >> > > - After it passes the first stage, the summary of that vote > and > > > the > > > >> > key > > > >> > > information from this letter will be published on > > general@incubator > > > >> and > > > >> > > remain open for at least 72 hours and at least 3 (+1) IPMC votes > > or > > > a > > > >> > > majority of (+1) are acquired. > > > >> > > > > > >> > > Please vote accordingly: > > > >> > > > > > >> > > [ ] +1 approve > > > >> > > [ ] +0 no opinion > > > >> > > [ ] -1 disapprove with the reason > > > >> > > > > > >> > > Thanks, > > > >> > > Lee Rhodes > > > >> > > [email protected] > > > >> > > > > > >> > > > > >> > > > > > > > > > >
