Re: [Math] Fluent API, inheritance and immutability

Phil Steitz Fri, 09 Aug 2013 17:43:35 -0700

On 8/8/13 10:56 AM, Evan Ward wrote:
> Gilles wrote:
>>> To address a few concerns I saw,
>>>
>>> Gilles wrote:
>>>> In this way, it does not; what I inferred from the partial code
>>>> above was
>>>> that there would only be _one_ "create" method. But:
>>>> 1. All "create" methods must be overridden at the next level of the
>>>> class
>>>> hierarchy (this is the duplication I was referring to in the first
>>>> post).
>>> True, but concrete classes should not be extended.
>> Why not?
>> [Anyways, the duplication also occurs in the intermediate (abstract)
>> levels.]
>>
>>> Adding another
>>> abstract class to the hierarchy would mean implementing the create
>>> method form the superclass and delegating to a new abstract create
>>> method that includes the new parameters. The abstract class hierarchy
>>> would have to parallel the interface hierarchy.
>> With a mutable instance, you avoid this; that's the point.
>>
>>>> 2. When a parameter is added somewhere in the hierarchy, all the
>>>> classes
>>>> below must be touched too.
>>> True, but since abstract classes are just skeletal implementations of
>>> interfaces you can't add any methods without breaking compatibility
>>> anyway.
>> Adding a (concrete) method in an abstract class will not break
>> compatibility.
>>
>>> (You would have to add the same method to the public interface
>>> too.)
>> There you break compatibility; that's why we removed a lot of the
>> interfaces in CM, because the API is not stable, and abstract classes
>> allow non-breaking changes.
>>
>>> This does make it important to decide on a well written and
>>> complete API before releasing it.
>> When the scope of the software is well circumscribed, that would be
>> possible. With the whole of [Math]ematics, much less so. :-}
>> And state-of-the-art in Java is a moving target, aimed at by changing
>> CM contributors with differing needs and tastes; this adds to the
>> unstable mix.
> That's a good point. I still prefer the interface design (though I may
> be in the minority) for two reasons. First, if a concrete class only
> publicly exposes the methods defined in an interface it encourages
> polymorphism. User code that uses one implementation can be easily
> switched to another and new implementations are less constrained.
> Second, it encourages composition over inheritance. I agree with Josh
> Bloch that composition produces more maintainable code. Adding new
> methods to an existing interface/class breaks composition.
>
> I think the interface/abstract class discussion is partially separable
> from the immutable/mutable discussion. I see the algorithm as the part
> that could really benefit from the polymorphism. Perhaps separating the
> problem definition (data) from the algorithm will improve the
> flexibility of the API. For example,
>
> PointVectorValuePair solveMyNLLSProblem(NLLSOptimizer opt){
>     //define problem to solve in an independent object
>     NLLSProblem p = new NLLSProblem(/*model functions, weights,
> convergence checker, ...*/);
>
>     //provide algorithm with the data it needs
>     //algorithm has no problem specific state
>     return opt.optimize(p);
> }
>
>>>> And, we must note, that the duplication still does not ensure "real"
>>>> immutability unless all the passed arguments are themselves immutable.
>>>> But:
>>>> 1. We do not have control over them; e.g. in the case of the optimizers
>>>> the "ConvergenceChecker" interface could possibly be implemented by a
>>>> non-thread-safe class (I gave one example of such a thing a few weeks
>>>> ago when a user wanted to track the optimizer's search path)
>>> True. Thread safety is a tricky beast. I think we agree that the only
>>> way to guarantee thread safety is to only depend on final concrete
>>> classes that are thread safe themselves.
>> I don't think so. When objects are immutable, thread-safety follows
> It is somewhat off topic, but a counter example would be Vector3D. Since
> the class is not final, a user could extend it and override all the
> methods and add some set{X,Y,Z} methods to make it mutable. Even though
> Vector3D is immutable, there is no _guarantee_ that every instanceof
> Vector3D is immutable unless it is final. This is why String is final.
>
>> (but
>> note that the current optimizers in CM were never thread-safe).
>> But thread-safety can exist even with mutable objects; it's just that
>> more
>> care must be taken to ensure it.
>>
>>> This is directly at odds with
>>> the inversion of control/dependency injection paradigm. I think a
>>> reasonable compromise is to depend on interfaces and make sure all the
>>> provided implementations are thread safe.
>> Yes, that a way, but again easier said that done.
>>
>>> A simple sequential user won't
>>> need to care about thread safety. A concurrent user will need to
>>> understand the implications of Java threading to begin with. Accurate
>>> documentation of which interfaces and methods are assumed to be thread
>>> safe goes a long way here.
>> I don't think I'm wrong if I say that most concurrent bugs are found in
>> production rather than in contrived test cases.
>> [That's why I advocated for introducing "real" applications as use-cases
>> for CM.]
>>
>>>> 2. Some users _complained_ (among other things :-) that we should not
>>>> force immutability of some input (model function and Jacobian IIRC)
>>>> because in some use-cases, it would be unnecessarily costly.
>>> I agree that copying any large matrices or arrays is prohibitively
>>> expensive. For the NLLS package we would be copying a pointer to a
>>> function that can generate a large matrix. I think adding some
>>> documentation that functions should be thread safe if you want to use
>>> them from multiple threads would be sufficient.
>> I you pass a "pointer" (i.e. a "reference" in Java), all bets are off:
>> the
>> class is not inherently thread-safe. That's why I suggested to mandate a
>> _deep_ "copy" method (with a stringent contract that should allow a
>> caller
>> to be sure that all objects owned by an instance are disconnected from
>> any
>> other objects).
> As someone who has designed a thread safe application based on deep
> copying I don't think this is route to follow. A deep copy means you
> have to be able to copy an arbitrary (possibly cyclical) reference
> graph. Without the graph copy there are many subtle bugs. (References to
> the same object are now references to different objects.) With the graph
> copy the implementation is very complex. This is the reason
> Serialization has a separate "patch up" step after object creation,
> which leads to some nasty tricks/bugs. Similarly, Cloneable only
> produces a shallow copy. Opinions may vary, but in my experience
> immutability is an easier approach to thread safety, especially when you
> have to depend on user code.
>
> Either way there can be some tricky bugs for the user. In the immutable
> approach the bugs are solved by finding non-final fields and references
> to objects that are not thread safe. In the deep copy approach it can be
> hard to track down which reference isn't being copied correctly. My
> experience is that when I used deep copy I found may bugs in production.
> Now with immutability it is easier for me to reason about thread safety
> and I find few.
>
>>>> Consequently, if (when creating a new instance) we assign a reference
>>>> passed to the fluent method, we cannot warrant thread-safety anymore;
>>>> which in turn poses the question of whether this false sense of
>>>> security
>>>> warrants the increased code duplication and complexity (compare your
>>>> code
>>>> below with the same but without the constructors and "create" methods:
>>>> even a toy example is already cut by more than half).
>>>  Agreed that thread safety can only be guaranteed with the help of the
>>> user. The immutable+fluent combination does add an additional layer of
>>> indirection. On the other hand it is much simpler to debug and analyze,
>>> especially in a concurrent environment.
>> Immutable objects can be shared between threads.
>> Thread-safety is equally obtained if mutable objects are not shared
>> between
>> threads, e.g. keep the optimizer confined in one thread (no
>> implementation
>> in CM features concurrency anyways).
>> Even if the optimizers are not immutable, it will be possible to benefit
>> from efficiency improvement brought by concurrency e.g. by ensuring
>> that the
>> objective function is thread-safe, several evaluations could be
>> performed in
>> parallel. [Moreover the optimization logic is not really concurrent in
>> most
>> optimizers. So why have unnecessarily complicated code?]
> I agree that the optimization algorithm doesn't need to be concurrent. I
> think the key is to be able to use the same optimizer (algorithm) on
> different problems in different threads. E.g.
> optimizer.optimize(problem) or optimizer.copy().optimize(problem) Both
> work and both are better than my solution with the current API (see
> below), but I prefer the immutable algorithm. In the end you are the
> maintainer. ;)
>
>>>> Then if we really want to aim at thread-safety, I think that the
>>>> approach
>>>> of mandating a "copy()" interface to all participating classes,
>>>> would be
>>>> a serious contender to just making all fields "final". Let's also
>>>> recall
>>>> that immutability is not a goal but a means (the goal being
>>>> thread-safety).
>>>>
>>>> I still think that the three "tools" mentioned in the subject line
>>>> do not
>>>> play along very well, unfortunately.
>>>> I was, and still am, a proponent of immutability but IMO this
>>>> discussion
>>>> indicates that it should not be enforced at all cost. In particular, in
>>>> small and non-layered objects, it is easy to ensure thread-safety
>>>> (through
>>>> "final") but the objects that most benefit from a fluent API are big
>>>> (with
>>>> many configuration combinations), and their primary usage does not
>>>> necessarily benefit from transparent instantiation.
>>>>
>>>> Given all these considerations, in the first steps for moving some CM
>>>> codes
>>>> to fluent APIs, I would aim for simplicity and _less_ code (if just
>>>> to be
>>>> able to make adjustments more quickly).
>>>> Then, from "simple" code, we can more easily experiment (and compare,
>>>> with
>>>> "diff") the merits and drawbacks of various routes towards
>>>> thread-safety.
>>>>
>>>> OK?
>>> Agreed on the goal of thread safety. Is the copy a shallow copy?
>> No! (cf. above.)
>>
>>> If it
>>> is, then copy is a complete punt of thread-safety to the user, forcing
>>> them to them to determine when and what they need to copy. I think the
>>> mutable+copy option would make it harder to understand client code and
>>> understand where copying is necessary.
>> No; my suggestion, if feasible, is that a user who needs his own,
>> personal,
>> unshared instance would simply call "copy()":
>>
>> public void myMethodOne(Optimizer optim) {
>>   // I don't trust that "optim" can be safe: I make a (deep) copy.
>>   final Optimizer myOptim = optim.copy();
>>
>>   // By contract, "myOptim" is assumed to contain unshared fields.
>>   // I can pass it safely to another thread.
>>   myMethodTwo(myOptim);
>> }
>>
>> private void myMethodTwo(Optimizer optim) {
>>   // Create a new thread, whatever...
>> }
>>
>> But my current point is that this could be developed at a later point,
>> after the fluent API has been adopted (and more widely tried within CM),
>> when someone has shown that e.g. the optimizer must be thread-safe to
>> achieve some actual use-case.
> In my use case I have a class that solves several related, but
> different, NLLS problems concurrently. I would like the optimization
> algorithm to be a configurable dependency (GN or LM). Currently I have a
> custom (thread safe) interface with two implementations that wrap the
> commons math optimizers, in order to provide thread safety and
> polymorphic access to all the options that both optimizers support.


Can you explain this a little more?  I am still not getting the use
case requiring concurrent access to a single optimizer instance.  It
appears that you have one.  It would be good to understand it better.

Phil
>
> Best Regards,
> Evan
>
>>> Ultimately the decision is up to
>>> the maintainers and I think both options under discussion are a big
>>> improvement over the current API. :)
>>>
>>> Thanks for the great library.
>> Thanks for the discussion,
>> Gilles
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] Fluent API, inheritance and immutability

Reply via email to