It is worth mentioning that this affects unit tests only, as far as I
understand.
For instance, to test serialization compatibility with C++ code there are a
few files such as src/test/resources/cpc-sparce.bin, which we deserialize
in a test to see if it succeeds and the resulting sketch looks right.
I don't believe this should stop the release.

On Sun, Sep 15, 2019 at 8:00 PM leerho <[email protected]> wrote:

> [DISCUSS]
>
> Folks,
>
> The problem that Furkan discovered is a subtle bug that I had not seen
> before and affects our Java repositories where we fetch a resource from
> either the *src/main/resources* or *src/test/resources* and use the
> method *getClass().getClassLoader().getResource(<name>).getFile(),
> *which ironically does not return a file, but returns a URL.  A URL is what
> would be required to obtain a remote resource, but that is not how we have
> been using it.
>
> As a URL it can contain special character sequences, such as "%20", that
> are substituted for *spaces *if the user's absolute system path to the
> resource contains them.  The use of spaces in path names is system
> dependent although I think both unix and windows based file systems now
> allow it. However, having characters such as "%20" in the place of spaces
> in the URL string will clearly fail if the file system attempts to find the
> file using the URL string.  And this is exactly what Furkan observed.
>
> Another irony here is that according to Oracle, spaces are not allowed
> anywhere in the path to obtain a resource, to wit:
>
> https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html
> >
>
>
> A resource is identified by a string consisting of a sequence of
> > substrings, delimited by slashes (/), followed by a resource name. *Each
> > substring must be a valid Java identifier*. The resource name is of the
> > form shortName or shortName.extension. Both shortName and extension must
> be
> > Java identifiers.
> >
>
>
> The name of a resource is independent of the Java implementation; in
> > particular, the path separator is always a slash (/). However, the Java
> > implementation controls the details of how the contents of the resource
> are
> > mapped into a file, database, or other object containing the actual
> > resource.
> >
>
>
> The interpretation of a resource name is relative to a class loader
> > instance. Methods implemented by the ClassLoader class do this
> > interpretation.
>
>
> All the methods of the ClassLoader that return path-like strings return
> absolute paths.  Thus, if the user happens to have directory names with
> spaces at any level of the path hierarchy, it would fail this Oracle
> requirement.
>
> This issue and the confusing return of getFile() has been raised as a bug
> against Java some years ago, but has never been fixed.
>
> Instead of
>
> String path = getClass().getClassLoader().getResource(shortName).getFile();
> >
>
> the recommended way to obtain the correct file-system usable path is
>
>  String path =
> > getClass().getClassLoader().getResource(shortName).toURI().getPath();
>
>
> This nicely fixes the problem in a system-independent and portable way. The
> getPath() method performs a decode operation on the URI and replaces the
> "%20" sequences with spaces (as well as fixes any other special characters
> in the string.
>
> I found over 30 places in Memory and Java core where we need to correct
> this.  I have not yet searched the other Java repos.
>
> In studying our use cases it appears that we do two things with our
> resources, get the file, or get the bytes of the file as a byte array.
>
> There are at least two ways to go for a fix:
>
> 1. Do a search / replace on "getResource(...).getFile()" and change it to
> "getResource(...).toURI().getPath()".  Even with this we have to do
> try/catch on a number of exceptions and remember how to efficiently convert
> the path to a File or to bytes, etc.  This results in messy code in lots of
> places.
>
> 2. Create access methods that handle all the messy stuff.  I have written
> the following three methods that we could place in the
> datasketches/Util.java class that perform the complete operations we use
> most:
>
>   /**
> >    * Gets the absolute path of the given resource file's shortName.
> >    *
> >    * <p>Note that the ClassLoader.getResource(shortName) returns a URL,
> >    * which can have special characters, e.g., "%20" for spaces. This
> method
> >    * obtains the URL, converts it to a URI, then does a uri.getPath(),
> > which
> >    * decodes any special characters in the URI path. This is required to
> > make
> >    * obtaining resources operating-system independent.</p>
> >    *
> >    * @param shortName the last name in the pathname's name sequence.
> >    * @return the absolute path of the given resource file's shortName.
> >    */
> >   public static String getResourcePath(final String shortFileName) {
> >     try {
> >       final URL url =
> > Resources.class.getClassLoader().getResource(shortFileName);
> >       final URI uri = url.toURI();
> >       final String path = uri.getPath(); //decodes any special characters
> >       return path;
> >     } catch (final NullPointerException | URISyntaxException e) {
> >       throw new IllegalArgumentException("Cannot find resource: " +
> > shortFileName + LS + e);
> >     }
> >   }
> >   /**
> >    * Gets the file defined by the given resource file's shortFileName.
> >    * @param shortFileName the last name in the pathname's name sequence.
> >    * @return the file defined by the given resource file's shortFileName.
> >    */
> >   public static File getResourceFile(final String shortFileName) {
> >     return new File(getResourcePath(shortFileName));
> >   }
> >   /**
> >    * Returns a byte array of the contents of the file defined by the
> given
> > resource file's
> >    * shortFileName.
> >    * @param shortFileName the last name in the pathname's name sequence.
> >    * @return a byte array of the contents of the file defined by the
> given
> > resource file's
> >    *     shortFileName.
> >    */
> >   public static byte[] getResourceBytes(final String shortFileName) {
> >     try {
> >       return
> Files.readAllBytes(Paths.get(getResourcePath(shortFileName)));
> >     } catch (final IOException e) {
> >       throw new IllegalArgumentException("Cannot read resource: " +
> > shortFileName + LS + e);
> >     }
> >   }
>
>
> With these methods our two major use-cases become:
>
> File file = getResourceFile(shortFileName);
> > byte[] bytes = getResourceBytes(shortFileName);
>
>
> Either way we have to REMEMBER to use the right method, but the 2nd
> approach is cleaner and makes sure we do all the right checking.
>
> The next thing we to decide is what to do about the vote.
>
> Should we cancel this vote to fix this issue or wait until the next
> release?
>
> In all the years we have been using this library this issue of spaces in
> the path name has never come up.  Thank you, Furkan for uncovering this,
> but I could use some help in assessing how important this is to fix in this
> release.
>
> Folks, please respond with your comments:  1. Which fix approach?  2. What
> to do about the current vote.
>
> Thanks,
>
> Lee.
>
>
>
>
>
> On Thu, Sep 12, 2019 at 2:34 AM Furkan KAMACI <[email protected]>
> wrote:
>
> > Hi,
> >
> > Tests fail if I change the folder name which includes source code. i.e.
> > rename the folder name from
> "apache-datasketches-java-1.0.0-incubating-src"
> > to "a b". I'm not sure whether it is related to my environment or not.
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Tue, Sep 10, 2019 at 2:27 AM leerho <[email protected]> wrote:
> >
> > > "with a space character?"
> > >
> > > On Mon, Sep 9, 2019 at 4:08 PM leerho <[email protected]> wrote:
> > >
> > > > Furkan,
> > > >
> > > > Thank you for the vote!  And for the suggestion to update README.
> > > >
> > > > Where did you find the folder name with at space character?
> > > >
> > > > Lee
> > > >
> > > > On Mon, Sep 9, 2019 at 1:55 PM Furkan KAMACI <[email protected]
> >
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> +1 from me.
> > > >>
> > > >> I checked:
> > > >> - Incubating in name
> > > >> - DISCLAIMER exists
> > > >> - LICENSE and NOTICE are fine
> > > >> - No unexpected binary files
> > > >> - Checked PGP signatures
> > > >> - Checked Checksums
> > > >> - Code compiles and tests successfully run
> > > >>
> > > >> Minor things:
> > > >>
> > > >> 1) You can update README file at GitHub both for how to compile it
> and
> > > >> discussion & support. i.e.: https://github.com/apache/lucene-solr
> > > >> 2) Seems that project fails to run tests when folder name has space
> > > >> character.
> > > >>
> > > >> Kind Regards,
> > > >> Furkan KAMACI
> > > >>
> > > >> On Fri, Sep 6, 2019 at 10:32 PM Jon Malkin <[email protected]>
> > > wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > Verified signature and checksum
> > > >> > Licenses/notices present, although I still need to learn details
> of
> > > what
> > > >> > they're supposed to include so I just checked for non-empty files
> > > >> > Successfully compiled and ran unit tests.
> > > >> >
> > > >> >   jon
> > > >> >
> > > >> > On Thu, Sep 5, 2019 at 6:53 PM leerho <[email protected]> wrote:
> > > >> >
> > > >> > > ### NEW COMPONENT! ###
> > > >> > >
> > > >> > > Hello Apache DataSketches PPMC and Community,
> > > >> > >
> > > >> > > 1. This is a call for vote to release Apache DataSketches-java
> > > >> version:
> > > >> > >  1.0.0-incubating-RC2
> > > >> > >
> > > >> > >     NOTE 1: This is the core Java component of the DataSketches
> > > >> library
> > > >> > > that includes all the sketch algorithms in production-ready
> > > packages.
> > > >> > These
> > > >> > > sketches can be called directly from this component or used in
> > > >> > conjunction
> > > >> > > with the adaptor components such as Hadoop Pig, Hadoop Hive, or
> > the
> > > >> > > aggregator adaptors built into Apache Druid.
> > > >> > >
> > > >> > >     NOTE 2: This release contains some critical performance
> > > >> improvements
> > > >> > > and bug fixes for Apache Druid.
> > > >> > >
> > > >> > >     NOTE 3: Changes from RC1: The previous RC1 had a bug in the
> > > >> release
> > > >> > > script which caused the SHA512sum to fail. Some very minor
> updates
> > > to
> > > >> the
> > > >> > > LICENSE, NOTICE, and pom.xml files. Otherwise no code changes.
> > > >> > >
> > > >> > > 2. The release candidate:
> > > >> > >     -
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/datasketches/java/1.0.0-incubating-RC2/
> > > >> > >
> > > >> > > 3. Source repository:
> > > >> > >     - https://github.com/apache/incubator-datasketches-java
> > > >> > >
> > > >> > >     Git Tag for this release:
> > > >> > >     -
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/incubator-datasketches-java/tree/1.0.0-incubating-RC2
> > > >> > > on branch 1.0.X-incubating
> > > >> > >
> > > >> > >     Git HashId for this release starts with: f8abbbe
> > > >> > >
> > > >> > >     The artifacts have been signed with --keyid-format SHORT :
> > > >> 8CD4A902
> > > >> > >
> > > >> > > 4. Repository: dist.apache.org repository:
> > > >> > >
> > > >> > >     The public signing key can be found in the KEYS file:
> > > >> > >     -
> > > >> https://dist.apache.org/repos/dist/dev/incubator/datasketches/KEYS
> > > >> > >
> > > >> > > 5. Repository: Maven Central (repository.apache.org):
> > > >> > >
> > > >> > >     The Jar artifacts can be found at
> > > >> > >     -
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://repository.apache.org/content/groups/staging/org/apache/datasketches/datasketches-java/1.0.0-incubating/
> > > >> > >
> > > >> > > 6. Build & Test Guide: (assuming you have Maven)
> > > >> > >
> > > >> > >     The DataSketches-java component is pure Java and is
> structured
> > > as
> > > >> a
> > > >> > > Maven project.  You must compile using JDK 8.   There is one
> > > run-time
> > > >> > > dependency of the DataSketches-memory component (recently
> > released),
> > > >> and,
> > > >> > > of course, a number of test and Maven plugin dependencies, all
> of
> > > >> which
> > > >> > can
> > > >> > > be resolved by Maven.
> > > >> > >
> > > >> > >     There are two types of tests: normal unit tests and tests
> run
> > by
> > > >> the
> > > >> > > strict profile.  To run normal unit tests:
> > > >> > >
> > > >> > >       $ mvn clean test
> > > >> > >
> > > >> > >     To run the strict profile tests:
> > > >> > >
> > > >> > >       $ mvn clean test -P strict
> > > >> > >
> > > >> > >     To install jars in your local .m2 repository:
> > > >> > >
> > > >> > >       $ mvn clean install -DskipTests=true
> > > >> > >
> > > >> > > 7. Documentation: The documentation for the DataSketches Java
> > > >> component
> > > >> > is
> > > >> > > part of the website.
> > > >> > >
> > > >> > >     The Overview section on the website has extensive
> > documentation
> > > on
> > > >> > all
> > > >> > > the sketches:
> > > >> > >     - https://datasketches.github.io
> > > >> > >
> > > >> > >     Javadocs:
> > > >> > >     -
> > > >> >
> https://datasketches.github.io/api/core/snapshot/apidocs/index.html
> > > >> > >
> > > >> > > 8. The vote will be performed in two stages:
> > > >> > >    - This letter will be published on dev@ and remain open for
> at
> > > >> least
> > > >> > 72
> > > >> > > hours and at least 3 (+1) PPMC votes or a majority of (+1) are
> > > >> > > acquired. All PPMC members including Mentors can vote. However,
> a
> > > >> > negative
> > > >> > > vote from a Mentor will cancel this voting process.
> > > >> > >
> > > >> > >    - After it passes the first stage, the summary of that vote
> and
> > > the
> > > >> > key
> > > >> > > information from this letter will be published on
> > general@incubator
> > > >> and
> > > >> > > remain open for at least 72 hours and at least 3 (+1) IPMC votes
> > or
> > > a
> > > >> > > majority of (+1) are acquired.
> > > >> > >
> > > >> > > Please vote accordingly:
> > > >> > >
> > > >> > > [ ] +1 approve
> > > >> > > [ ] +0 no opinion
> > > >> > > [ ] -1 disapprove with the reason
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Lee Rhodes
> > > >> > > [email protected]
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Reply via email to