Come to think of it, clone() is going to make a shallow copy anyways, so
any RDFParser fields here wouldn't get copied between clones, just
references.

On 9 February 2017 at 03:44, sebb <[email protected]> wrote:

> I've not looked at the code itself yet, but AFAIK clone() is not
> sufficient to ensure thread-safety of a mutable object.
>
> This is a consequence of the Java memory model, which allows items to
> be cached locally.
> Simply put, safe publication of values requires that the reader of a
> field synchronise on the same lock as a the writer that updated it.
> Otherwise the reader may never see the updated value (it may remain in
> the writer's cache).
>
>
> On 9 February 2017 at 06:37, Matt Sicker <[email protected]> wrote:
> > Looking more closely, why not use CompleteableFuture instead of Future
> > since this is Java 8?
> >
> > Also, the gist of the configuration API you're looking for sounds like a
> > way to emulate closures in Java. :)
> >
> > You could also add a freeze method to switch between mutable and
> immutable
> > views of the factory internally to support a sort of deferred parameter
> > setting for thread safety purposes (i.e., cloning mutable objects). For
> > example:
> >
> > MockRDFParser parser = new MockRDFParser().foo(foo).bar(bar).freeze();
> > // at this point, any calls to a mutator will return a mutable clone
> until
> > frozen again
> >
> > parser.path(path).syntax(syntax).sink(sink).parse(); // thread 1
> > parser.path(path).syntax(syntax).sink(sink).parse(); // thread 2
> >
> > Now both parsers should still work. For immutable, threadsafe objects,
> you
> > don't even have to clone them in the thawed clones. The naming could use
> > some work, but I think you get the idea.
> >
> > I'm also kind of thinking the three different source types could be
> unified
> > into their own Source kind of class since the use of any of them are
> > mutually exclusive.
> >
> > On 8 February 2017 at 23:18, Matt Sicker <[email protected]> wrote:
> >
> >> I'm not familiar with the code, but it sounds like you're in the early
> >> stages of a plugin system. The good old BeanFactory.
> >>
> >> Another possible way to go about untying thread safe and not thread safe
> >> parser factory builders would be using naming conventions like setters
> for
> >> mutable and withers for immutable (copy on write) APIs.
> >>
> >> On 8 February 2017 at 22:58, Gary Gregory <[email protected]>
> wrote:
> >>
> >>> On Wed, Feb 8, 2017 at 7:27 PM, Matt Sicker <[email protected]> wrote:
> >>>
> >>> > I've always considered builders to be mutable, thread unsafe objects
> >>> used
> >>> > to create immutable, thread safe objects.
> >>>
> >>>
> >>> +1
> >>>
> >>> Gary
> >>>
> >>>
> >>> > If these builders cause GC
> >>> > pressure to go too high, then I'd turn to object pooling or
> per-thread
> >>> > reusable objects depending on how the code is used.
> >>> >
> >>> > On 8 February 2017 at 20:38, Stian Soiland-Reyes <[email protected]>
> >>> wrote:
> >>> >
> >>> > > Peter Ansell raised a valid question about why in Commons RDF I
> made
> >>> the
> >>> > > RDFParser interface as an immutable factory-like builder pattern.
> >>> > >
> >>> > > https://commons.apache.org/proper/commons-rdf/apidocs/
> >>> > > org/apache/commons/rdf/experimental/RDFParser.html
> >>> > >
> >>> > >
> >>> > > Here is how it can be used today:
> >>> > >
> >>> > >  RDF rdf = new JenaRDF();
> >>> > >  Graph g1 = rdf.createGraph();
> >>> > >  Future future = new JenaRDFParser()
> >>> > >          .source(Paths.get("/tmp/graph.ttl"))
> >>> > >          .contentType(RDFSyntax.TURTLE)
> >>> > >          .target(g1).parse();
> >>> > >  future.get(30, TimeUnit.Seconds); // block
> >>> > >
> >>> > >
> >>> > > You may notice it is not a factory as-such, as you only get a
> Future
> >>> > > of the parser result, rather than the actual "parser". Also it is
> >>> > > currently an interface, with multiple implementations (one per RDF
> >>> > > implementation).
> >>> > >
> >>> > >
> >>> > >
> >>> > > Although the javadoc only suggests for it to be immutable and
> >>> > > threadsafe, the abstract class AbstractRDFParser
> >>> > > https://commons.apache.org/proper/commons-rdf/apidocs/
> >>> > > org/apache/commons/rdf/simple/experimental/AbstractRDFParser.html
> >>> > > https://github.com/apache/commons-rdf/blob/master/
> >>> > > simple/src/main/java/org/apache/commons/rdf/simple/experimental/
> >>> > > AbstractRDFParser.java
> >>> > > is both, and thus every "option setter" method therefore return a
> >>> cloned
> >>> > > copy of the factory with the mutated fields set there. This simply
> >>> uses
> >>> > > .clone() at the moment.
> >>> > >
> >>> > > This abstract class and immutability pattern is used by each of the
> >>> > > three RDF implementations of the interface.
> >>> > >
> >>> > >
> >>> > > This means it's safe to parse many files like this:
> >>> > >
> >>> > >  JenaRDFParser parser = new JenaRDFParser()
> >>> > >          .contentType(RDFSyntax.TURTLE)
> >>> > >          .target(g1);
> >>> > >
> >>> > >  // Parse multiple files in parallell
> >>> > >  Future future1 = parser.source(Paths.get("/tmp/
> >>> graph1.ttl")).parse();
> >>> > >  Future future2 = parser.source(Paths.get("/tmp/
> >>> graph2.ttl")).parse();
> >>> > >
> >>> > >  // Wait for both to finish
> >>> > >  future1.get(30, TimeUnit.Seconds);
> >>> > >  future2.get(30, TimeUnit.Seconds);
> >>> > >
> >>> > >
> >>> > > Above the second call to parser.source(..) is not a problem (even
> if
> >>> > > it happens in another thread) as both .source() calls return
> separate
> >>> > > cloned JenaRDFParser instances and don't mutate the "parser"
> instance.
> >>> > >
> >>> > >
> >>> > > Now a valid critique of such a pattern is that it's wasteful to
> make
> >>> > > clones that are immediately thrown away, for instance the above
> >>> > > future1/future2 example would cause 7 instantiations of
> >>> JenaRDFParser()
> >>> > > (.parse() does some finalization in a final clone).
> >>> > >
> >>> > > I have not done any benchmarking, but envision that as parsing RDF
> >>> files
> >>> > > generally creates many thousands of RDFTerm instances, a couple of
> >>> > > extra parser instantiations shouldn't make a big dent in the bill.
> >>> > >
> >>> > >
> >>> > > The typical builder pattern is rather to return "this" and mutate
> >>> fields
> >>> > > directly. This is obviously not thread-safe but generally achives
> the
> >>> > > same. Multiple threads would however instead have to create a brand
> >>> new
> >>> > > parser from start and set all the options themselves.  (Or they
> could
> >>> a
> >>> > > similar .clone() method from a carefully locked down "proto"
> instance
> >>> > > that everyone has to be careful not to call any mutating methods
> on)
> >>> > >
> >>> > >
> >>> > > Now multiple threads is now very common because of Java 8 streams,
> and
> >>> > > for instance it would not be unthinkable that a parallelStream() is
> >>> > > mapped as a .source() or .target() on such a parser builder. The
> >>> Futures
> >>> > > fit somewhat into this picture as well.   I think this would get
> >>> > > complicated with having to set everything again rather than just
> pass
> >>> > > "parser::source" as a method reference.
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > As this was just thrown together as an experiment (and we've not
> yet
> >>> > > tried writing the corresponding RDFWriter interface), I think we
> could
> >>> > > revisit how to do this pattern, hopefully drawing on the wider
> >>> > > experience of the Apache Commons developers on this list.
> >>> > >
> >>> > >
> >>> > >
> >>> > > Roughly here are my requirements:
> >>> > >
> >>> > > * Ability to set various options (e.g. path of file to parse,
> format,
> >>> > >   destination) in any order
> >>> > > * Avoid mega-constructors/multi-arg methods
> >>> > > * Some options take multiple types, e.g a source can be given as a
> >>> Path,
> >>> > >   InputStream or IRI (URL) -- but would override each other
> >>> > >
> >>> > > Nice to have:
> >>> > >
> >>> > > * Reusable (e.g. set options that are common, but modify
> >>> > >   source() to parse multiple files)
> >>> > > * Thread-safe (e.g. immediate reuse of builder before a parsing
> >>> session
> >>> > >   has started or finished)
> >>> > > * Easy to serialize / keep around (should not keep references to
> big
> >>> > >   objects in memory uneccessarily)
> >>> > > * Parsing in the background, e.g. return Future. (implementation
> >>> manages
> >>> > >   threads/queues as it pleases)
> >>> > >
> >>> > > Unclear:
> >>> > >
> >>> > > * How to select between the multiple RDFParser instances?
> >>> > >   I was thinking the RDF interface could have a .makeParser()
> method,
> >>> > >   but not all parsers can take all formats, so possibly some kind
> of
> >>> > >   registry or "what can you handle" mechanism might be needed.
> >>> > >
> >>> > >
> >>> > > How do you think we should proceed? Mutable or immutable? How
> should
> >>> the
> >>> > > settings be kept? Fields, map, or what?  Does it make sense with an
> >>> > > interface, abstract class (keeps settings) and implementations
> >>> > > (processess settings), or should we have a single ParserFactory
> class
> >>> > > and have a new internal interface below?
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Stian Soiland-Reyes
> >>> > > University of Manchester
> >>> > > http://www.esciencelab.org.uk/
> >>> > > http://orcid.org/0000-0001-9842-9718
> >>> > >
> >>> > >
> >>> > > ------------------------------------------------------------
> ---------
> >>> > > To unsubscribe, e-mail: [email protected]
> >>> > > For additional commands, e-mail: [email protected]
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>> > --
> >>> > Matt Sicker <[email protected]>
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> E-Mail: [email protected] | [email protected]
> >>> Java Persistence with Hibernate, Second Edition
> >>> <https://www.amazon.com/gp/product/1617290459/ref=as_li_tl?
> >>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1617290459&link
> >>> Code=as2&tag=garygregory-20&linkId=cadb800f39946ec62ea2b1af9fe6a2b8>
> >>>
> >>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=
> >>> am2&o=1&a=1617290459>
> >>> JUnit in Action, Second Edition
> >>> <https://www.amazon.com/gp/product/1935182021/ref=as_li_tl?
> >>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182021&link
> >>> Code=as2&tag=garygregory-20&linkId=31ecd1f6b6d1eaf8886ac902a24de4
> 18%22>
> >>>
> >>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=
> >>> am2&o=1&a=1935182021>
> >>> Spring Batch in Action
> >>> <https://www.amazon.com/gp/product/1935182951/ref=as_li_tl?
> >>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182951&link
> >>> Code=%7B%7BlinkCode%7D%7D&tag=garygregory-20&linkId=%7B%7Bli
> >>> nk_id%7D%7D%22%3ESpring+Batch+in+Action>
> >>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=
> >>> am2&o=1&a=1935182951>
> >>> Blog: http://garygregory.wordpress.com
> >>> Home: http://garygregory.com/
> >>> Tweet! http://twitter.com/GaryGregory
> >>>
> >>
> >>
> >>
> >> --
> >> Matt Sicker <[email protected]>
> >>
> >
> >
> >
> > --
> > Matt Sicker <[email protected]>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


-- 
Matt Sicker <[email protected]>

Reply via email to