Come to think of it, clone() is going to make a shallow copy anyways, so any RDFParser fields here wouldn't get copied between clones, just references.
On 9 February 2017 at 03:44, sebb <[email protected]> wrote: > I've not looked at the code itself yet, but AFAIK clone() is not > sufficient to ensure thread-safety of a mutable object. > > This is a consequence of the Java memory model, which allows items to > be cached locally. > Simply put, safe publication of values requires that the reader of a > field synchronise on the same lock as a the writer that updated it. > Otherwise the reader may never see the updated value (it may remain in > the writer's cache). > > > On 9 February 2017 at 06:37, Matt Sicker <[email protected]> wrote: > > Looking more closely, why not use CompleteableFuture instead of Future > > since this is Java 8? > > > > Also, the gist of the configuration API you're looking for sounds like a > > way to emulate closures in Java. :) > > > > You could also add a freeze method to switch between mutable and > immutable > > views of the factory internally to support a sort of deferred parameter > > setting for thread safety purposes (i.e., cloning mutable objects). For > > example: > > > > MockRDFParser parser = new MockRDFParser().foo(foo).bar(bar).freeze(); > > // at this point, any calls to a mutator will return a mutable clone > until > > frozen again > > > > parser.path(path).syntax(syntax).sink(sink).parse(); // thread 1 > > parser.path(path).syntax(syntax).sink(sink).parse(); // thread 2 > > > > Now both parsers should still work. For immutable, threadsafe objects, > you > > don't even have to clone them in the thawed clones. The naming could use > > some work, but I think you get the idea. > > > > I'm also kind of thinking the three different source types could be > unified > > into their own Source kind of class since the use of any of them are > > mutually exclusive. > > > > On 8 February 2017 at 23:18, Matt Sicker <[email protected]> wrote: > > > >> I'm not familiar with the code, but it sounds like you're in the early > >> stages of a plugin system. The good old BeanFactory. > >> > >> Another possible way to go about untying thread safe and not thread safe > >> parser factory builders would be using naming conventions like setters > for > >> mutable and withers for immutable (copy on write) APIs. > >> > >> On 8 February 2017 at 22:58, Gary Gregory <[email protected]> > wrote: > >> > >>> On Wed, Feb 8, 2017 at 7:27 PM, Matt Sicker <[email protected]> wrote: > >>> > >>> > I've always considered builders to be mutable, thread unsafe objects > >>> used > >>> > to create immutable, thread safe objects. > >>> > >>> > >>> +1 > >>> > >>> Gary > >>> > >>> > >>> > If these builders cause GC > >>> > pressure to go too high, then I'd turn to object pooling or > per-thread > >>> > reusable objects depending on how the code is used. > >>> > > >>> > On 8 February 2017 at 20:38, Stian Soiland-Reyes <[email protected]> > >>> wrote: > >>> > > >>> > > Peter Ansell raised a valid question about why in Commons RDF I > made > >>> the > >>> > > RDFParser interface as an immutable factory-like builder pattern. > >>> > > > >>> > > https://commons.apache.org/proper/commons-rdf/apidocs/ > >>> > > org/apache/commons/rdf/experimental/RDFParser.html > >>> > > > >>> > > > >>> > > Here is how it can be used today: > >>> > > > >>> > > RDF rdf = new JenaRDF(); > >>> > > Graph g1 = rdf.createGraph(); > >>> > > Future future = new JenaRDFParser() > >>> > > .source(Paths.get("/tmp/graph.ttl")) > >>> > > .contentType(RDFSyntax.TURTLE) > >>> > > .target(g1).parse(); > >>> > > future.get(30, TimeUnit.Seconds); // block > >>> > > > >>> > > > >>> > > You may notice it is not a factory as-such, as you only get a > Future > >>> > > of the parser result, rather than the actual "parser". Also it is > >>> > > currently an interface, with multiple implementations (one per RDF > >>> > > implementation). > >>> > > > >>> > > > >>> > > > >>> > > Although the javadoc only suggests for it to be immutable and > >>> > > threadsafe, the abstract class AbstractRDFParser > >>> > > https://commons.apache.org/proper/commons-rdf/apidocs/ > >>> > > org/apache/commons/rdf/simple/experimental/AbstractRDFParser.html > >>> > > https://github.com/apache/commons-rdf/blob/master/ > >>> > > simple/src/main/java/org/apache/commons/rdf/simple/experimental/ > >>> > > AbstractRDFParser.java > >>> > > is both, and thus every "option setter" method therefore return a > >>> cloned > >>> > > copy of the factory with the mutated fields set there. This simply > >>> uses > >>> > > .clone() at the moment. > >>> > > > >>> > > This abstract class and immutability pattern is used by each of the > >>> > > three RDF implementations of the interface. > >>> > > > >>> > > > >>> > > This means it's safe to parse many files like this: > >>> > > > >>> > > JenaRDFParser parser = new JenaRDFParser() > >>> > > .contentType(RDFSyntax.TURTLE) > >>> > > .target(g1); > >>> > > > >>> > > // Parse multiple files in parallell > >>> > > Future future1 = parser.source(Paths.get("/tmp/ > >>> graph1.ttl")).parse(); > >>> > > Future future2 = parser.source(Paths.get("/tmp/ > >>> graph2.ttl")).parse(); > >>> > > > >>> > > // Wait for both to finish > >>> > > future1.get(30, TimeUnit.Seconds); > >>> > > future2.get(30, TimeUnit.Seconds); > >>> > > > >>> > > > >>> > > Above the second call to parser.source(..) is not a problem (even > if > >>> > > it happens in another thread) as both .source() calls return > separate > >>> > > cloned JenaRDFParser instances and don't mutate the "parser" > instance. > >>> > > > >>> > > > >>> > > Now a valid critique of such a pattern is that it's wasteful to > make > >>> > > clones that are immediately thrown away, for instance the above > >>> > > future1/future2 example would cause 7 instantiations of > >>> JenaRDFParser() > >>> > > (.parse() does some finalization in a final clone). > >>> > > > >>> > > I have not done any benchmarking, but envision that as parsing RDF > >>> files > >>> > > generally creates many thousands of RDFTerm instances, a couple of > >>> > > extra parser instantiations shouldn't make a big dent in the bill. > >>> > > > >>> > > > >>> > > The typical builder pattern is rather to return "this" and mutate > >>> fields > >>> > > directly. This is obviously not thread-safe but generally achives > the > >>> > > same. Multiple threads would however instead have to create a brand > >>> new > >>> > > parser from start and set all the options themselves. (Or they > could > >>> a > >>> > > similar .clone() method from a carefully locked down "proto" > instance > >>> > > that everyone has to be careful not to call any mutating methods > on) > >>> > > > >>> > > > >>> > > Now multiple threads is now very common because of Java 8 streams, > and > >>> > > for instance it would not be unthinkable that a parallelStream() is > >>> > > mapped as a .source() or .target() on such a parser builder. The > >>> Futures > >>> > > fit somewhat into this picture as well. I think this would get > >>> > > complicated with having to set everything again rather than just > pass > >>> > > "parser::source" as a method reference. > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > As this was just thrown together as an experiment (and we've not > yet > >>> > > tried writing the corresponding RDFWriter interface), I think we > could > >>> > > revisit how to do this pattern, hopefully drawing on the wider > >>> > > experience of the Apache Commons developers on this list. > >>> > > > >>> > > > >>> > > > >>> > > Roughly here are my requirements: > >>> > > > >>> > > * Ability to set various options (e.g. path of file to parse, > format, > >>> > > destination) in any order > >>> > > * Avoid mega-constructors/multi-arg methods > >>> > > * Some options take multiple types, e.g a source can be given as a > >>> Path, > >>> > > InputStream or IRI (URL) -- but would override each other > >>> > > > >>> > > Nice to have: > >>> > > > >>> > > * Reusable (e.g. set options that are common, but modify > >>> > > source() to parse multiple files) > >>> > > * Thread-safe (e.g. immediate reuse of builder before a parsing > >>> session > >>> > > has started or finished) > >>> > > * Easy to serialize / keep around (should not keep references to > big > >>> > > objects in memory uneccessarily) > >>> > > * Parsing in the background, e.g. return Future. (implementation > >>> manages > >>> > > threads/queues as it pleases) > >>> > > > >>> > > Unclear: > >>> > > > >>> > > * How to select between the multiple RDFParser instances? > >>> > > I was thinking the RDF interface could have a .makeParser() > method, > >>> > > but not all parsers can take all formats, so possibly some kind > of > >>> > > registry or "what can you handle" mechanism might be needed. > >>> > > > >>> > > > >>> > > How do you think we should proceed? Mutable or immutable? How > should > >>> the > >>> > > settings be kept? Fields, map, or what? Does it make sense with an > >>> > > interface, abstract class (keeps settings) and implementations > >>> > > (processess settings), or should we have a single ParserFactory > class > >>> > > and have a new internal interface below? > >>> > > > >>> > > > >>> > > -- > >>> > > Stian Soiland-Reyes > >>> > > University of Manchester > >>> > > http://www.esciencelab.org.uk/ > >>> > > http://orcid.org/0000-0001-9842-9718 > >>> > > > >>> > > > >>> > > ------------------------------------------------------------ > --------- > >>> > > To unsubscribe, e-mail: [email protected] > >>> > > For additional commands, e-mail: [email protected] > >>> > > > >>> > > > >>> > > >>> > > >>> > -- > >>> > Matt Sicker <[email protected]> > >>> > > >>> > >>> > >>> > >>> -- > >>> E-Mail: [email protected] | [email protected] > >>> Java Persistence with Hibernate, Second Edition > >>> <https://www.amazon.com/gp/product/1617290459/ref=as_li_tl? > >>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1617290459&link > >>> Code=as2&tag=garygregory-20&linkId=cadb800f39946ec62ea2b1af9fe6a2b8> > >>> > >>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l= > >>> am2&o=1&a=1617290459> > >>> JUnit in Action, Second Edition > >>> <https://www.amazon.com/gp/product/1935182021/ref=as_li_tl? > >>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182021&link > >>> Code=as2&tag=garygregory-20&linkId=31ecd1f6b6d1eaf8886ac902a24de4 > 18%22> > >>> > >>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l= > >>> am2&o=1&a=1935182021> > >>> Spring Batch in Action > >>> <https://www.amazon.com/gp/product/1935182951/ref=as_li_tl? > >>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182951&link > >>> Code=%7B%7BlinkCode%7D%7D&tag=garygregory-20&linkId=%7B%7Bli > >>> nk_id%7D%7D%22%3ESpring+Batch+in+Action> > >>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l= > >>> am2&o=1&a=1935182951> > >>> Blog: http://garygregory.wordpress.com > >>> Home: http://garygregory.com/ > >>> Tweet! http://twitter.com/GaryGregory > >>> > >> > >> > >> > >> -- > >> Matt Sicker <[email protected]> > >> > > > > > > > > -- > > Matt Sicker <[email protected]> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > -- Matt Sicker <[email protected]>
