In the example above there is no point of setting the retry future in the exception because it will be serialized and sent to a remote node.
I see the only possible way to ensure this: make this be verified at compile time. This looks like a big change, but the draft may look like so: 1) Introduce new exception type, like CacheTopology*Checked*Exception which *must* take the minimum topology version to wait for 2) In all cases when a cache operation fails because of topology change, provide the appropriate exception 3) When CacheTopologyException is being constructed, create a corresponding local future based on wait version and throw the exception. Even though this still does not give us 100% guarantee that we did not miss anything, it should cover a lot more cases than now. 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>: > Alexey, > > For GridCacheIoManager#sendNoRetry it's not necessary because exception > will be ignored (or will cast to String). Perhaps my message was unclear. > I try to say that three methods can throw "ClusterTopologyCheckedException" > without any problem. > > But you are right, in some methods I can't set "retryFuture". We must > ensure that exception doesn't escape away. The best option is to ignore it > (or cast to String). > > But any way, look here: > > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, boolean > committed, GridCacheVersion nearTxId) > > Inside you can found next lines: > __________________________ > ClusterTopologyCheckedException *cause* = > new ClusterTopologyCheckedException("Primary node left grid."); > > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to > commit transaction " + > "(transaction has been rolled back on backup node): " + req.version(), > *cause*)); > __________________________ > > How do you unsure that *"cause"* can't come to > GridCacheUtils#retryTopologySafe(final > Callable<S> c) ? > > > Perhaps I don't know some rules which you use to maintain it now? > > > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>: > >> Alexander, >> >> Do you have a use-case in mind which results in IgniteTopologyException >> thrown from public API with retryFuture unset? >> >> retryFuture makes sense only for certain cache operations and may be set >> only in certain places in the code where we already know what to wait for. >> I do not see how you can initialize this future, for example, in >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, >> byte >> plc) >> >> --AG >> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>: >> >> > I found a lot of using "ClusterTopologyCheckedException" exception. In >> > most >> > use-case these exceptions were just ignored. It's easier to see if >> > issue-4612 will be applied (I'm waiting for code review by my team >> leader) >> > where I renamed all unused exceptions in the "ignored". >> > >> > It means that in some case "retryReadyFuture" can left unset. But there >> are >> > code where exception is being getting from "get()" method in some >> "future" >> > class and don't ignored (exception is sending to method >> > "GridFutureAdapter#onDone()" for example). >> > >> > So I decided not to do a deep global analysis of code and make some >> simple >> > rules. >> > 1. If some method "X" throws ClusterTopologyCheckedException and >> ignores >> > it >> > (or converts to String, there are some variants to lose exception like >> > object) in all methods that call the method "X", then we can don't set >> > "retryReadyFuture" for optimization. But this should be clarified in >> > javadoc. >> > 2. But if exception escapes at least once like object: we must set >> > "retryReadyFuture" in that method. >> > >> > A few times when call tree was simple, I went deeper. >> > >> > So I found only three methods which can throw >> > "ClusterTopologyCheckedException" without setting "retryReadyFuture": >> > >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, >> > IgfsCommunicationMessage msg) >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage >> msg, >> > byte plc) >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object >> topic, >> > GridCacheMessage msg, byte plc, long timeout) >> > >> > In all other methods "ClusterTopologyCheckedException" escapes away too >> > far. >> > >> > What do you think about this approach to make compromise between >> > optimization and correctness? >> > >> > >