Re: [geometry] IO Modules

Matt Juntunen Thu, 21 Jan 2021 18:38:42 -0800

Hi Gilles,

> Really, the main point is to separate format (contents) from filename 
> (container).


This makes sense. What would you think of the approach below? This would 
separate the format name from the file extension(s) and provide an enum 
containing default format information and handlers. Usage of the enum would be 
optional since there would still be overloads that accept a simple format name 
string. For the BoundaryIOManager methods that accept a Path or URL, the format 
would still be determined by the file extension. If users want to use a 
non-standard file extension, they can open the IO stream themselves and use the 
read/write methods that accept an IO stream and format string name or Format 
instance.

    interface Format {
        String getName();
        List<String> getFileExtensions();
    }

    class BoundaryIOManager {
        void register(BoundaryFormat fmt, BoundaryReadHandler rh, 
BoundaryWriteHandler wh) {
            register(fmt.getName(), fmt.getFileExtensions(), rh, wh);
        }
        void register(String formatName, List<String> extensions, 
BoundaryReadHandler rh, BoundaryWriteHandler wh) {...}

        // ...

        void write(BoundarySource src, OutputStream out, Format fmt) {
            write(src, in, fmt.getName());
        }
        void write(BoundarySource src, OutputStream out, String formatName) 
{...}

        // similar read methods ...
    }

    enum StandardFormat3D implements Format {
        OBJ(...),
        TXT(...),
        CSV(...);

        public String getName() {...}
        public List<String> getFileExtensions() {...}
        public BoundaryReadHandler3D readHandler() { (execute a supplier 
function)... }
        public BoundaryWriteHandler3D writeHandler() { (execute a supplier 
function)... }
    }

> The "enum" is for natively supported formats to allow for simple API, while 
> "hiding" the actual implementations (as in "RandomSource" from "Commons RNG").

I'd prefer to not hide the format-specific classes, at least not completely. 
For example, the OBJ file format can contain a lot more information than just 
pure geometry, such as object names (more than one geometry can be contained in 
a single file), material information (for use in rendering), free-form curve 
definitions, etc. This information is not used to produce BoundarySource3D or 
Mesh instances but it can be accessed easily by extending AbstractOBJParser or 
PolygonOBJParser. Also, additional information such as comments and object 
names can be included in output files if the OBJWriter class is used directly, 
as opposed to IO3D or BoundaryIOManager3D. It seems like a waste to completely 
hide this functionality.

Another reason to keep these classes public is that they may need to be 
accessed in order to configure them. For example, the txt, csv, and obj formats 
use a default format pattern for writing floating point numbers as text. If 
this needs to be modified, for example to increase or decrease the number of 
minimum fraction digits, then the format-specific type will need to be 
accessed. The code below shows how to set a custom decimal format for OBJ files 
(using the current code).

    OBJBoundaryWriteHandler3D wh = new OBJBoundaryWriteHandler3D();
    wh.setDecimalFormatPattern("0.0##");

    IO3D.getDefaultManager().registerWriteHandler("obj", wh);

One additional question that I thought of while looking at your example code: 
what is our convention for class names that contain acronyms or other sequences 
of capitalized letters? In other words, should it be OBJWriter or ObjWriter?

Regards,
Matt J

________________________________
From: Gilles Sadowski <gillese...@gmail.com>
Sent: Thursday, January 21, 2021 7:40 AM
To: Commons Developers List <dev@commons.apache.org>
Subject: Re: [geometry] IO Modules

Hello.

Le mer. 20 janv. 2021 à 23:55, Matt Juntunen
<matt.juntu...@hotmail.com> a écrit :
>
> Hi Gilles,
>
> I've updated the PR with the new module/package names.
>
> > I don't see the link between "(not) extensible" and "enum": Extensibility 
> > is provided by API (which classes are public and meant to be reused, e.g. 
> > by custom IO code) while the "enum" defines the file formats that are 
> > "natively" supported by this library.
>
> I might not be picturing this correctly so perhaps a code example would help. 
> Here is what I'd like to be able to do with the IO code:
>
> // add custom handler for my own file format
> IO3D.getDefaultManager().registerReadHandler("matt", new 
> MattFileReadHandler());
>
> // read a file using that format
> BoundarySource3D result = IO3D.read(Paths.get("mygeometry.matt"));
>
> I don't see how the above could be accomplished if the supported formats are 
> in an enum.

The "enum" is for natively supported formats to allow for simple API, while
"hiding" the actual implementations (as in "RandomSource" from "Commons
RNG").
Really, the main point is to separate format (contents) from filename
(container).

// Library code.

/** Native file formats. */
enum FileFormat {
    // "ObjFileReadHandler" and "ObjFileWriteHandler" must be "internal".
    OBJ(new ObjFileReadHandler(), new ObjFileWriteHandler());

    private final FileReadHandler rh;
    private final FileReadHandler wh;

    FileFormat(FileReadHandler rh,
                      FileWriteHandle wh) {
        this.rh = rh;
        this.wh = wh;
    }

    // Methods to support "register" and "registerSuffix" (see below).
}

// Method does not exist yet.
IO3D.getDefaultManager().register(FileFormat.OBJ);

// In user code

// Choose format, and file name (arbitrary).
IO3D.write(src, filename, FileFormat.OBJ);

// Associate user-preferred suffix(es) to a natively supported file format...
IO3D.getDefaultManager().registerSuffix(FileFormat.OBJ, ".obj",
".OBJ"); // Method does not exist yet.
// .. and the library will add the default suffix (".obj").
IO3D.write(src, filenamePrefix, FileFormat.OBJ);

// User-defined handlers.
final String id = IO3D.getDefaultManager().register(new
MattFileReadHandler(), new MattFileWriteHandler()); // Method does not
exist yet.
IO3D.getDefaultManager().registerSuffix(id, "matt");

> Also, the file extension approach above is similar to that used by 
> javax.imageio.ImageIO.

IIUC, they also make the difference pointed out above:
https://docs.oracle.com/javase/7/docs/api/javax/imageio/ImageIO.html#getImageReadersByMIMEType(java.lang.String)
https://docs.oracle.com/javase/7/docs/api/javax/imageio/ImageIO.html#getImageReadersBySuffix(java.lang.String)

Sorry if I've cut some corners...

Regards,
Gilles

>
> Regards,
> Matt
>
> ________________________________
> From: Gilles Sadowski <gillese...@gmail.com>
> Sent: Wednesday, January 20, 2021 5:28 PM
> To: Commons Developers List <dev@commons.apache.org>
> Subject: Re: [geometry] IO Modules
>
> Le mer. 20 janv. 2021 à 22:42, Matt Juntunen
> <matt.juntu...@hotmail.com> a écrit :
> >
> > Hi Gilles,
> >
> > > package "io" does not belong in "core".
> >
> > Ah, yes. I'd like to keep the core and euclidean IO modules separate since 
> > I'm picturing a spherical IO module later on down the road. In that case, 
> > perhaps we could just put the "io" portion of the module/package name 
> > before everything else? That would give us
> >
> >     commons-geometry-io-core - contains package o.a.c.geometry.io.core
> >     commons-geometry-io-euclidean - contains package 
> > o.a.c.geometry.io.euclidean
>
> Yes; having "io" as the top-level package is what I was suggesting.
>
> >
> > > Then, we had also discussed that it would have been more robust to use a 
> > > parser generator...
> >
> > Yes. I looked into that and ended up deciding that a full parser-generator 
> > would be overkill for what I wanted to do and possibly more work to 
> > maintain/customize. I've placed the current low-level parsing code 
> > (consisting of the CharReadBuffer and SimpleTextParser classes) in the 
> > "internal" package o.a.c.geometry.core.io.internal. I don't intend for 
> > these classes to be part of the public API, which would allow us to switch 
> > to a parser-generator later if needed.
>
> OK!
>
> >
> > > I haven't read all the code (sorry!)
> >
> > No worries. There is a lot of it :-)
> >
> > > I'm not convinced by the usage of file "extensions" to determine how the 
> > > contents should be parsed.  At first sight, an "enum"-based approach[3] 
> > > would be more flexible and clearer from a code design POV.
> >
> > In contrast to the rest of the library, I'd like to make these IO modules 
> > as extensible as possible, the reason being that they form the main 
> > connection between the library and external systems. If we use an enum to 
> > indicate the format, we restrict usage of the API to only those formats. A 
> > string name, on the other hand, allows users to define their own formats 
> > and use them with the API.
>
> [I'm only answering based on the impression I get from your description 
> above.]
> I don't see the link between "(not) extensible" and "enum":
> Extensibility is provided by
> API (which classes are public and meant to be reused, e.g. by custom
> IO code) while
> the "enum" defines the file formats that are "natively" supported by
> this library.
> [And, in time, custom code can become part of the supported formats and get 
> its
> corresponding "enum" element.]
> The file extensions could be supported too, but I would think at a higher 
> level;
> mapping to either the "enum", or to custom code.
>
> > For example, we will soon be creating a custom geometry format for an 
> > application at my day job and I would like to be able to use these IO 
> > modules to read and write that format (using custom BoundaryReadHandler3D 
> > and BoundaryWriteHandler3D implementations).
>
> Is what I wrote above preventing that?
>
> Best,
> Gilles
>
> >
> > Thoughts?
> >
> > Regards,
> > Matt J
> >
> >
> >
> > ________________________________
> > From: Gilles Sadowski <gillese...@gmail.com>
> > Sent: Wednesday, January 20, 2021 9:17 AM
> > To: Commons Developers List <dev@commons.apache.org>
> > Subject: Re: [geometry] IO Modules
> >
> > Hi Matt.
> >
> > Le mer. 20 janv. 2021 à 05:03, Matt Juntunen
> > <matt.juntu...@hotmail.com> a écrit :
> > >
> > > Hello,
> > >
> > > I've created GEOMETRY-115 and an associated PR [1] containing new modules 
> > > for IO functionality. The new modules are
> > >
> > >   *   commons-geometry-core-io - Common space-independent interfaces and 
> > > classes
> > >   *   commons-geometry-euclidean-io - Euclidean IO classes; currently 
> > > contains support for the 3D formats TXT, CSV, and OBJ
> > >
> > > The API is based on a core BoundaryIOManager class that delegates to 
> > > BoundaryReadHandler and BoundaryWriteHandler implementations based on the 
> > > requested data format. For Euclidean 3D space, a convenience IO3D class 
> > > is provided with static methods that delegate to a default manager 
> > > instance. In addition to reading and writing the core geometric types for 
> > > the library (ConvexPolygon3D, Triangle3D), the Euclidean module also 
> > > supports reading and writing a FacetDefinition interface, which exposes 
> > > simple, unvalidated geometric data. This is intended for accessing raw 
> > > (possibly invalid) geometric data from files and writing data contained 
> > > in external data structures (for example, a custom facet class). The 
> > > example below is from the IO3D class documentation and demonstrates a 
> > > read, transform, write operation using streams.
> > >
> > >         final Path origFile = tempDir.resolve("orig.obj");
> > >         final Path scaledFile = tempDir.resolve("scaled.csv");
> > >
> > >         final DoublePrecisionContext precision = new 
> > > EpsilonDoublePrecisionContext(1e-10);
> > >         final BoundarySource3D src = Parallelepiped.unitCube(precision);
> > >
> > >         IO3D.write(src, origFile);
> > >
> > >         final AffineTransformMatrix3D transform = 
> > > AffineTransformMatrix3D.createScale(2);
> > >
> > >         try (Stream<Triangle3D> stream = IO3D.triangles(origFile, 
> > > precision)) {
> > >             IO3D.write(stream.map(t -> t.transform(transform)), 
> > > scaledFile);
> > >         }
> > >
> > > Feedback is welcome.
> >
> > The high-level API, illustrated in the example above, looks quite fine.
> > But we should enforce "separation of concerns"; in this particular case:
> > package "io" does not belong in "core".
> >
> > The Java package
> >   org.apache.commons.geometry.core
> > is defined the within the (maven) module
> >   commons-geometry-core
> > The new Java package
> >   org.apache.commons.geometry.core.io
> > is defined the within the (maven) module
> >   commons-geometry-core-io
> >
> > The (maven) module name is also often used to define the
> > Java 9+ module name for JPMS.[1]
> > IIUC, the package
> >   org.apache.commons.geometry.core
> > and its "sub-package"
> >   org.apache.commons.geometry.core.io
> > cannot belong to different (JPMS) modules, because (IIUC) a package
> > hierarchy cannot be shared across modules (since one of the purposes
> > is to organize "visibility").
> > Components should be JPMS-compatible.[2]
> >
> > Fixing that would just require to
> > 1. remove the two modules introduced in PR #130
> >   commons-geometry-core-io
> >   commons-geometry-euclidean-io
> > 2. create one (maven) module
> >   commons-geometry-io
> > owning the Java packages
> >   org.apache.commons.geometry.io
> >   org.apache.commons.geometry.io.core
> >   org.apache.commons.geometry.io.euclidean
> >
> > [If you think that it's really worth having separate modules for loading
> > "euclidean" data vs other space-types (not yet implemented IIUC),
> > point (2) would be adapted accordingly.]
> >
> > This would fix both the JPMS issue, and the design issue.
> >
> > Then, we had also discussed that it would have been more robust to
> > use a parser generator...
> > In either case, I'd suggest that we put the actual I/O classes in an
> > "internal" package, and only advertize the above high-level API (i.e. we
> > should not be expected to maintain compatibility of the parser code).
> >
> > I haven't read all the code (sorry!), but in class "IO3D", I'm not convinced
> > by the usage of file "extensions" to determine how the contents should be
> > parsed.  At first sight, an "enum"-based approach[3] would be more flexible
> > and clearer from a code design POV.
> >
> > Regards,
> > Gilles
> >
> > [1] 
> > https://www.oracle.com/corporate/features/understanding-java-9-modules.html
> > [2] See 
> > https://gitbox.apache.org/repos/asf?p=commons-rng.git;a=tree;f=commons-rng-examples/examples-jpms;hb=HEAD
> > [3] See 
> > https://gitbox.apache.org/repos/asf?p=commons-rng.git;a=blob;f=commons-rng-simple/src/main/java/org/apache/commons/rng/simple/RandomSource.java;h=65ca02fdc285fbb7ea8305008dbce21f571191d7;hb=HEAD
> >
> > >
> > > Regards,
> > > Matt J
> > >
> > > [1] https://github.com/apache/commons-geometry/pull/130
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [geometry] IO Modules

Reply via email to