On 8/8/13 10:56 AM, Evan Ward wrote: > Gilles wrote: >>> To address a few concerns I saw, >>> >>> Gilles wrote: >>>> In this way, it does not; what I inferred from the partial code >>>> above was >>>> that there would only be _one_ "create" method. But: >>>> 1. All "create" methods must be overridden at the next level of the >>>> class >>>> hierarchy (this is the duplication I was referring to in the first >>>> post). >>> True, but concrete classes should not be extended. >> Why not? >> [Anyways, the duplication also occurs in the intermediate (abstract) >> levels.] >> >>> Adding another >>> abstract class to the hierarchy would mean implementing the create >>> method form the superclass and delegating to a new abstract create >>> method that includes the new parameters. The abstract class hierarchy >>> would have to parallel the interface hierarchy. >> With a mutable instance, you avoid this; that's the point. >> >>>> 2. When a parameter is added somewhere in the hierarchy, all the >>>> classes >>>> below must be touched too. >>> True, but since abstract classes are just skeletal implementations of >>> interfaces you can't add any methods without breaking compatibility >>> anyway. >> Adding a (concrete) method in an abstract class will not break >> compatibility. >> >>> (You would have to add the same method to the public interface >>> too.) >> There you break compatibility; that's why we removed a lot of the >> interfaces in CM, because the API is not stable, and abstract classes >> allow non-breaking changes. >> >>> This does make it important to decide on a well written and >>> complete API before releasing it. >> When the scope of the software is well circumscribed, that would be >> possible. With the whole of [Math]ematics, much less so. :-} >> And state-of-the-art in Java is a moving target, aimed at by changing >> CM contributors with differing needs and tastes; this adds to the >> unstable mix. > That's a good point. I still prefer the interface design (though I may > be in the minority) for two reasons. First, if a concrete class only > publicly exposes the methods defined in an interface it encourages > polymorphism. User code that uses one implementation can be easily > switched to another and new implementations are less constrained. > Second, it encourages composition over inheritance. I agree with Josh > Bloch that composition produces more maintainable code. Adding new > methods to an existing interface/class breaks composition. > > I think the interface/abstract class discussion is partially separable > from the immutable/mutable discussion. I see the algorithm as the part > that could really benefit from the polymorphism. Perhaps separating the > problem definition (data) from the algorithm will improve the > flexibility of the API. For example, > > PointVectorValuePair solveMyNLLSProblem(NLLSOptimizer opt){ > //define problem to solve in an independent object > NLLSProblem p = new NLLSProblem(/*model functions, weights, > convergence checker, ...*/); > > //provide algorithm with the data it needs > //algorithm has no problem specific state > return opt.optimize(p); > } > >>>> And, we must note, that the duplication still does not ensure "real" >>>> immutability unless all the passed arguments are themselves immutable. >>>> But: >>>> 1. We do not have control over them; e.g. in the case of the optimizers >>>> the "ConvergenceChecker" interface could possibly be implemented by a >>>> non-thread-safe class (I gave one example of such a thing a few weeks >>>> ago when a user wanted to track the optimizer's search path) >>> True. Thread safety is a tricky beast. I think we agree that the only >>> way to guarantee thread safety is to only depend on final concrete >>> classes that are thread safe themselves. >> I don't think so. When objects are immutable, thread-safety follows > It is somewhat off topic, but a counter example would be Vector3D. Since > the class is not final, a user could extend it and override all the > methods and add some set{X,Y,Z} methods to make it mutable. Even though > Vector3D is immutable, there is no _guarantee_ that every instanceof > Vector3D is immutable unless it is final. This is why String is final. > >> (but >> note that the current optimizers in CM were never thread-safe). >> But thread-safety can exist even with mutable objects; it's just that >> more >> care must be taken to ensure it. >> >>> This is directly at odds with >>> the inversion of control/dependency injection paradigm. I think a >>> reasonable compromise is to depend on interfaces and make sure all the >>> provided implementations are thread safe. >> Yes, that a way, but again easier said that done. >> >>> A simple sequential user won't >>> need to care about thread safety. A concurrent user will need to >>> understand the implications of Java threading to begin with. Accurate >>> documentation of which interfaces and methods are assumed to be thread >>> safe goes a long way here. >> I don't think I'm wrong if I say that most concurrent bugs are found in >> production rather than in contrived test cases. >> [That's why I advocated for introducing "real" applications as use-cases >> for CM.] >> >>>> 2. Some users _complained_ (among other things :-) that we should not >>>> force immutability of some input (model function and Jacobian IIRC) >>>> because in some use-cases, it would be unnecessarily costly. >>> I agree that copying any large matrices or arrays is prohibitively >>> expensive. For the NLLS package we would be copying a pointer to a >>> function that can generate a large matrix. I think adding some >>> documentation that functions should be thread safe if you want to use >>> them from multiple threads would be sufficient. >> I you pass a "pointer" (i.e. a "reference" in Java), all bets are off: >> the >> class is not inherently thread-safe. That's why I suggested to mandate a >> _deep_ "copy" method (with a stringent contract that should allow a >> caller >> to be sure that all objects owned by an instance are disconnected from >> any >> other objects). > As someone who has designed a thread safe application based on deep > copying I don't think this is route to follow. A deep copy means you > have to be able to copy an arbitrary (possibly cyclical) reference > graph. Without the graph copy there are many subtle bugs. (References to > the same object are now references to different objects.) With the graph > copy the implementation is very complex. This is the reason > Serialization has a separate "patch up" step after object creation, > which leads to some nasty tricks/bugs. Similarly, Cloneable only > produces a shallow copy. Opinions may vary, but in my experience > immutability is an easier approach to thread safety, especially when you > have to depend on user code. > > Either way there can be some tricky bugs for the user. In the immutable > approach the bugs are solved by finding non-final fields and references > to objects that are not thread safe. In the deep copy approach it can be > hard to track down which reference isn't being copied correctly. My > experience is that when I used deep copy I found may bugs in production. > Now with immutability it is easier for me to reason about thread safety > and I find few. > >>>> Consequently, if (when creating a new instance) we assign a reference >>>> passed to the fluent method, we cannot warrant thread-safety anymore; >>>> which in turn poses the question of whether this false sense of >>>> security >>>> warrants the increased code duplication and complexity (compare your >>>> code >>>> below with the same but without the constructors and "create" methods: >>>> even a toy example is already cut by more than half). >>> Agreed that thread safety can only be guaranteed with the help of the >>> user. The immutable+fluent combination does add an additional layer of >>> indirection. On the other hand it is much simpler to debug and analyze, >>> especially in a concurrent environment. >> Immutable objects can be shared between threads. >> Thread-safety is equally obtained if mutable objects are not shared >> between >> threads, e.g. keep the optimizer confined in one thread (no >> implementation >> in CM features concurrency anyways). >> Even if the optimizers are not immutable, it will be possible to benefit >> from efficiency improvement brought by concurrency e.g. by ensuring >> that the >> objective function is thread-safe, several evaluations could be >> performed in >> parallel. [Moreover the optimization logic is not really concurrent in >> most >> optimizers. So why have unnecessarily complicated code?] > I agree that the optimization algorithm doesn't need to be concurrent. I > think the key is to be able to use the same optimizer (algorithm) on > different problems in different threads. E.g. > optimizer.optimize(problem) or optimizer.copy().optimize(problem) Both > work and both are better than my solution with the current API (see > below), but I prefer the immutable algorithm. In the end you are the > maintainer. ;) > >>>> Then if we really want to aim at thread-safety, I think that the >>>> approach >>>> of mandating a "copy()" interface to all participating classes, >>>> would be >>>> a serious contender to just making all fields "final". Let's also >>>> recall >>>> that immutability is not a goal but a means (the goal being >>>> thread-safety). >>>> >>>> I still think that the three "tools" mentioned in the subject line >>>> do not >>>> play along very well, unfortunately. >>>> I was, and still am, a proponent of immutability but IMO this >>>> discussion >>>> indicates that it should not be enforced at all cost. In particular, in >>>> small and non-layered objects, it is easy to ensure thread-safety >>>> (through >>>> "final") but the objects that most benefit from a fluent API are big >>>> (with >>>> many configuration combinations), and their primary usage does not >>>> necessarily benefit from transparent instantiation. >>>> >>>> Given all these considerations, in the first steps for moving some CM >>>> codes >>>> to fluent APIs, I would aim for simplicity and _less_ code (if just >>>> to be >>>> able to make adjustments more quickly). >>>> Then, from "simple" code, we can more easily experiment (and compare, >>>> with >>>> "diff") the merits and drawbacks of various routes towards >>>> thread-safety. >>>> >>>> OK? >>> Agreed on the goal of thread safety. Is the copy a shallow copy? >> No! (cf. above.) >> >>> If it >>> is, then copy is a complete punt of thread-safety to the user, forcing >>> them to them to determine when and what they need to copy. I think the >>> mutable+copy option would make it harder to understand client code and >>> understand where copying is necessary. >> No; my suggestion, if feasible, is that a user who needs his own, >> personal, >> unshared instance would simply call "copy()": >> >> public void myMethodOne(Optimizer optim) { >> // I don't trust that "optim" can be safe: I make a (deep) copy. >> final Optimizer myOptim = optim.copy(); >> >> // By contract, "myOptim" is assumed to contain unshared fields. >> // I can pass it safely to another thread. >> myMethodTwo(myOptim); >> } >> >> private void myMethodTwo(Optimizer optim) { >> // Create a new thread, whatever... >> } >> >> But my current point is that this could be developed at a later point, >> after the fluent API has been adopted (and more widely tried within CM), >> when someone has shown that e.g. the optimizer must be thread-safe to >> achieve some actual use-case. > In my use case I have a class that solves several related, but > different, NLLS problems concurrently. I would like the optimization > algorithm to be a configurable dependency (GN or LM). Currently I have a > custom (thread safe) interface with two implementations that wrap the > commons math optimizers, in order to provide thread safety and > polymorphic access to all the options that both optimizers support.
Can you explain this a little more? I am still not getting the use case requiring concurrent access to a single optimizer instance. It appears that you have one. It would be good to understand it better. Phil > > Best Regards, > Evan > >>> Ultimately the decision is up to >>> the maintainers and I think both options under discussion are a big >>> improvement over the current API. :) >>> >>> Thanks for the great library. >> Thanks for the discussion, >> Gilles > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org