On Jan 4, 2014, at 12:52 AM, Peter Firmstone <j...@zeus.net.au> wrote:
> On 4/01/2014 3:18 PM, Greg Trasuk wrote:
>> I’ll also point out Patricia’s recent statement that TaskManager should be
>> reasonably efficient for small task queues, but less efficient for larger
>> task queues. We don’t have solid evidence that the task queues ever get
>> large. Hence, the assertion that “TaskManager doesn’t scale” is meaningless.
>
> No, it's not about scalability, it's about the window of time when a task is
> removed from the queue in TaskManager for execution but fails and needs to be
> retried later. Task.runAfter doesn't contain the task that "should have
> executed" so dependent tasks proceed before their dependencies.
>
> This code comment from ServiceDiscoveryManager might help:
>
> /** This task class, when executed, first registers to receive
> * ServiceEvents from the given ServiceRegistrar. If the registration
> * process succeeds (no RemoteExceptions), it then executes the
> * LookupTask to query the given ServiceRegistrar for a "snapshot"
> * of its current state with respect to services that match the
> * given template.
> *
> * Note that the order of execution of the two tasks is important.
> * That is, the LookupTask must be executed only after registration
> * for events has completed. This is because when an entity registers
> * with the event mechanism of a ServiceRegistrar, the entity will
> * only receive notification of events that occur "in the future",
> * after the registration is made. The entity will not receive events
> * about changes to the state of the ServiceRegistrar that may have
> * occurred before or during the registration process.
> *
> * Thus, if the order of these tasks were reversed and the LookupTask
> * were to be executed prior to the RegisterListenerTask, then the
> * possibility exists for the occurrence of a change in the
> * ServiceRegistrar's state between the time the LookupTask retrieves
> * a snapshot of that state, and the time the event registration
> * process has completed, resulting in an incorrect view of the
> * current state of the ServiceRegistrar.
>

When do you claim that this happens? And what happens now that is
unacceptable? What is the concrete, observable problem that you’re trying to
solve, that justifies introducing failures that require further work?

>> If real usage never requires a large task queue, then scalability isn’t an
>> issue, and we don’t know whether it ever needs a large task queue.
>>
>> In any case, removing TaskManager and replacing it with hard-coded
>> ThreadPoolExecutors moves us farther away from having the capability of a
>> shared work queue.
>> So I’m not in favour of this change. I haven’t looked at the other
>> services or utility classes, but if the changes are similar, I’m also not in
>> favour.
>
> No, TaskManager instances have been replaced by ExecutorService, which is set
> via configuration. The hard coded part is how to order tasks through the
> configuration provided ExecutorService.
>

In the 2.2 branch implementation of com.sun.jini.reggie.RegistrarImpl there’s
a variable called ‘tasker’ that is not present in the qa_refactor branch’s
implementation.

> One option is to stop worrying about event order at the sender, and figure
> out a way of ordering at the recipient.
>

I don’t understand what you’re saying here.

>
>> You’re introducing changes that introduce test failures (which is why
>> you’re asking for help) without a good reason.
>
> The reason is to expose synchronization bugs so they can be observed and
> fixed.

Randomly changing code is not a rational diagnostic technique.
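
To make the ordering constraint in the ServiceDiscoveryManager comment quoted above concrete, here is a toy sketch. FakeRegistrar and buildView are hypothetical names made up for illustration; this is not the ServiceRegistrar API and not code from either branch. It only models the case where a service appears between the two steps.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

public class RegisterBeforeLookup {

    /** Hypothetical stand-in for a lookup service; NOT the ServiceRegistrar API. */
    static final class FakeRegistrar {
        private final Set<String> services = new LinkedHashSet<>();
        private final List<Consumer<String>> listeners = new ArrayList<>();

        /** Adds a service and notifies only the listeners registered so far. */
        synchronized void addService(String id) {
            services.add(id);
            for (Consumer<String> listener : listeners) {
                listener.accept(id);
            }
        }

        synchronized void register(Consumer<String> listener) {
            listeners.add(listener);
        }

        /** Returns a copy of the current state (the "snapshot" lookup). */
        synchronized Set<String> snapshot() {
            return new LinkedHashSet<>(services);
        }
    }

    /** Builds a client-side view; service "B" appears between the two steps. */
    static Set<String> buildView(boolean registerFirst) {
        FakeRegistrar registrar = new FakeRegistrar();
        registrar.addService("A");
        Set<String> view = new TreeSet<>();
        if (registerFirst) {
            registrar.register(view::add);
            registrar.addService("B");          // delivered as an event
            view.addAll(registrar.snapshot());  // also present in the snapshot
        } else {
            view.addAll(registrar.snapshot());
            registrar.addService("B");          // nobody is listening yet
            registrar.register(view::add);      // too late, "B" is never seen
        }
        return view;
    }

    public static void main(String[] args) {
        System.out.println("register first: " + buildView(true));
        System.out.println("lookup first:   " + buildView(false));
    }
}
```

With registration first the view may see a service twice (event plus snapshot) but never misses one; with the lookup first, the change made in between is lost. Whether the retry window described above actually produces the second ordering in practice is exactly the open question.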
>
>> You’re never going to ship this code unless you stop modifying it.
>
> It is a considerable undertaking, but I'm not in any hurry, it's not yet
> ready for release. I understand you're concerned about the considerable
> number of changes; there will be plenty of time for compatibility testing.
>
>> Also, when you say below,
>>> I'm developing an ExecutorService wrapper,
>>> org.apache.river.impl.thread.SerialExecutorService, that retries failed
>>> tasks. By not removing a task from its queue until it completes
>>> successfully, it prevents any dependent tasks from running. I would like
>>> to use this as a replacement for TaskManager and RetryTask.
>> …be careful! You’re getting into the same difficult area as transactional
>> semantics around messaging. Will you need to provide a “dead task” queue?
>> Do you need to set a limit on how many times a task gets retried? What
>> happens when that limit is exceeded? Do all tasks have the same limit?
>> Should a task get notified when it’s exceeded the retry limit? How long
>> should you wait between retries? Is that number the same for all tasks? Is
>> there some kind of alarm or notification when tasks end up being retried, or
>> when the dead task queue becomes full?
>
> Since most dependencies appear to be based on who the recipient is, if that
> recipient is not contactable in spite of considerable effort, we should
> abandon further attempts to do so. At present RetryTask will continue to
> attempt to make contact every 5 minutes.
>

I don’t understand what you’re saying here.

> Regards,
>
> Peter.
>
>> Sometimes it’s best not to try to abstract-away all complexity.
>>
>> Greg.
>>
>> On Jan 3, 2014, at 10:43 PM, Peter Firmstone<j...@zeus.net.au> wrote:
>>
>>> ServiceDiscoveryManager is now the only class that utilises TaskManager and
>>> RetryTask. JoinManager still uses TaskManager but not RetryTask. See
>>> River-344 for an explanation of the problem.
>>>
>>> Most instances of TaskManager in qa-refactor have been replaced with
>>> ExecutorService. RetryTask now implements RunnableFuture and can be
>>> cancelled by Future.cancel from the ExecutorService.
>>>
>>> I'm developing an ExecutorService wrapper,
>>> org.apache.river.impl.thread.SerialExecutorService, that retries failed
>>> tasks. By not removing a task from its queue until it completes
>>> successfully, it prevents any dependent tasks from running. I would like
>>> to use this as a replacement for TaskManager and RetryTask.
>>>
>>> Can anyone spare time to review, suggest alternatives, or improvements?
>>>
>>> Thanks in advance,
>>>
>>> Peter.
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_DiscardDownReDiscover.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 1 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_DiscardServiceDown.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_DiscardServiceUp.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_LookupTaskRace.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_ReRegisterBadEquals.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 4 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_ReRegisterGoodEquals.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 4 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_impl_servicediscovery_event_ServiceDiscardCacheTerminate.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 4 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_cache_CacheDiscard.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_cache_CacheLookup.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_Lookup.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupMax.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 3 discovery event(s) expected, 2 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupMaxFilter.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 3 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupMinEqualsMax.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 3 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupMinMaxNoBlockFilter.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 3 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupWait.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupWaitFilter.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 1 discovery
>>> event(s) received
>>>
>>> Failed com_sun_jini_test_spec_servicediscovery_lookup_LookupWaitNoBlock.td
>>> Test Failed: com.sun.jini.qa.harness.TestException: discovery failed --
>>> waited 30 seconds (0 minutes) -- 2 discovery event(s) expected, 0 discovery
>>> event(s) received
>>>
>>>
>>>
>>> On 4/01/2014 10:27 AM, Apache Jenkins Server wrote:
>>>> See <https://builds.apache.org/job/river-qa-refactor-win/45/>
>>>>
>>>> ------------------------------------------
>>>> [...truncated 15733 lines...]
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionTakeIfExistsTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionTakeIfExistsWaitTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionTakeNO_WAITTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionTakeReadTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionTakeTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionTakeWaitTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteLeaseANYTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteLeaseFOREVERTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteNegativeLeaseTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteTakeIfExistsNotifyTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteTakeIfExistsTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteTakeNotifyTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteTakeTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotTransactionWriteTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotWriteLeaseANYTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotWriteLeaseFOREVERTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotWriteNegativeLeaseTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/javaspace/conformance/snapshot/SnapshotWriteTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/AdminIFShutdownTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/AdminIFTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/LeaseExpireCancelTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/LeaseExpireRenewTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/LeaseMapTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/LeaseTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/MahaloCreateShutdownTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/MahaloIFTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/MahaloImplReadyStateTest.td
>>>> [java] Test Skipped: verifiers are:
>>>> com.sun.jini.test.impl.mercury.ActivatableMercuryVerifier
>>>> com.sun.jini.qa.harness.SkipConfigTestVerifier
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/NestableServerTransactionCreatedToStringTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/NestableTransactionCreatedToStringTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/PrepareAndCommitExceptionTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/PrepareAndCommitExceptionTest2.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/PrepareAndCommitExceptionTest3.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/PrepareAndCommitExceptionTest4.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/PrepareAndCommitExceptionTest5.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/RandomStressTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/ServerTransactionEqualityTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/ServerTransactionToStringTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/TransactionCreatedToStringTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/TransactionManagerCreatedToStringTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/TxnMgrImplNullActivationConfigEntries.td
>>>> [java] Test Skipped: verifiers are:
>>>> com.sun.jini.test.impl.mahalo.ActivatableMahaloVerifier
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/TxnMgrImplNullConfigEntries.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/TxnMgrImplNullRecoveredLocators.td
>>>> [java] Test Skipped: verifiers are:
>>>> com.sun.jini.test.impl.mahalo.ActivatableMahaloVerifier
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/impl/mahalo/TxnMgrProxyEqualityTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/AsynchAbortOnCommitTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/AsynchAbortOnPrepareTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/CommitExpiredTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/CommitTimeoutTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/GetStateTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/JoinIdempotentTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/JoinWhileActiveTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/ManyParticipantsTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/PrepareTimeoutTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/RollBackErrorTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/RollForwardErrorTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java] com/sun/jini/test/spec/txnmanager/TwoPhaseTest.td
>>>> [java] Test Passed: OK
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java]
>>>> [java] # of tests started = 1406
>>>> [java] # of tests completed = 1406
>>>> [java] # of tests skipped = 52
>>>> [java] # of tests passed = 1388
>>>> [java] # of tests failed = 18
>>>> [java]
>>>> [java] -----------------------------------------
>>>> [java]
>>>> [java] Date finished:
>>>> [java] Fri Jan 03 16:27:03 PST 2014
>>>> [java] Time elapsed:
>>>> [java] 59325 seconds
>>>> [java]
>>>> [java] Java Result: 1
>>>>
>>>> collect-result:
>>>> [copy] Copying 1 file
>>>> to<https://builds.apache.org/job/river-qa-refactor-win/ws/trunk\qa\result>
>>>> [copy] Copying 1 file
>>>> to<https://builds.apache.org/job/river-qa-refactor-win/ws/trunk\qa\result>
>>>> [zip] Building
>>>> zip:<https://builds.apache.org/job/river-qa-refactor-win/ws/trunk\qa\result\qaresults-amd64-Windows>
>>>> Server 2008 R2-1.7.0.zip
>>>>
>>>> BUILD FAILED
>>>> <https://builds.apache.org/job/river-qa-refactor-win/ws/trunk\build.xml>:2109:
>>>> The following error occurred while executing this line:
>>>> <https://builds.apache.org/job/river-qa-refactor-win/ws/trunk\qa\build.xml>:406:
>>>> The following error occurred while executing this line:
>>>> <https://builds.apache.org/job/river-qa-refactor-win/ws/trunk\qa\build.xml>:380:
>>>> condition satisfied
>>>>
>>>> Total time: 996 minutes 9 seconds
>>>> Build step 'Invoke Ant' marked build as failure
>>>> Archiving artifacts
>
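
As a footnote to the retry discussion earlier in the thread, the questions about retry limits and a "dead task" queue can be made concrete with a small sketch. SerialRetryQueue is a hypothetical name invented here; it is not org.apache.river.impl.thread.SerialExecutorService and makes no claim about how that class works. The sketch assumes a single thread, a fixed attempt limit shared by all tasks, and no delay between attempts, all of which are simplifications.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch: in-order execution with bounded retries. */
public class SerialRetryQueue {
    private final Deque<Callable<Boolean>> queue = new ArrayDeque<>();
    private final List<Callable<Boolean>> deadTasks = new ArrayList<>();
    private final int maxAttempts;

    public SerialRetryQueue(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** A task reports success by returning true; false or an exception means retry. */
    public void submit(Callable<Boolean> task) {
        queue.addLast(task);
    }

    /** Runs tasks in submission order on the calling thread. */
    public void drain() {
        while (!queue.isEmpty()) {
            // The head stays in the queue while it runs, so nothing behind it can start.
            Callable<Boolean> head = queue.peekFirst();
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    done = head.call();
                } catch (Exception e) {
                    done = false; // treat any failure as "retry"
                }
            }
            queue.removeFirst();
            if (!done) {
                deadTasks.add(head); // limit exceeded: park it instead of retrying forever
            }
        }
    }

    /** Tasks that were abandoned after maxAttempts failed attempts. */
    public List<Callable<Boolean>> deadTasks() {
        return deadTasks;
    }
}
```

Note what the sketch leaves undecided: once a task is moved to the dead list, the tasks queued behind it still run, which may or may not be acceptable if they depended on it. That is one of the policy questions a real replacement for TaskManager and RetryTask would have to answer.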