This is now building successfully. It is pretty slow on Apache Jenkins (11 mins), but it builds and passes all 2,700 or so tests.
I will look at getting this added back into the main build now that the
issues appear to have been resolved.

Rob

On 8/9/13 10:31 AM, "Rob Vesse" <[email protected]> wrote:

>I just had an ah-ha moment after reading this and looking over your latest
>commits.
>
>The issue is that the code makes a lot of queries, and queries create their
>own HttpClient instances because otherwise they can't apply timeouts to
>remote requests, since timeouts are a global parameter setting on an HTTP
>client. I am going to try having the query route (HttpQuery) pass the
>HttpClient it creates internally up to the QueryEngineHTTP and have that
>explicitly shut down the client when the user closes the query execution.
>
>QueryEngineHTTP already ensures that it closes the TypedInputStream when
>the query execution is closed.
>
>I will test this out and see if it resolves the issue on Jenkins.
>
>Rob
>
>
>On 8/9/13 6:29 AM, "Andy Seaborne" <[email protected]> wrote:
>
>>The code now in SVN is showing stability for heavy repeated use and for
>>the authentication test cases. However, the stability test has always
>>failed non-deterministically so it's not proof it's all working. I have
>>gone through and tracked down response handling and I hope I have
>>ensured streams are closed. If you use HttpOp to get a
>>(Typed)InputStream, then the caller must close that stream, otherwise it
>>does run out of OS resources after a while (1000's of calls).
>>
>> Andy
>>
>>On 09/08/13 09:24, Andy Seaborne wrote:
>>> The default SystemDefaultHttpClient has a per-route pool of 5 and a
>>> system maximum of 10. We do have to be careful of this lock-up
>>> possibility. Using DefaultHttpClient directly and setting how we want
>>> is probably a better style.
>>>
>>> I must look more closely at jena-jdbc - how far through its tests does
>>> it get? How many connections have been and gone?
>>>
>>> The HttpOp in the codebase, when it isn't passed a HttpClient, creates a
>>> new one each time and they don't share a pool. The
>>> SystemDefaultHttpClient is used once so no chance of a lock-up.
>>>
>>> It's not what is happening in JENA-498 - there, a single-threaded tight
>>> loop is running for a non-deterministic number of times then causing an
>>> exception (seems to be different on different OSes).
>>>
>>> There is a chance that Fuseki is not closing its end properly, or
>>> rather early enough, but when I checked the code, it's all down to
>>> Jetty and that should be pretty well tested. We run Fuseki for many
>>> months at a time.
>>>
>>> Andy
>>>
>>> On 09/08/13 00:39, Rob Vesse wrote:
>>>> The following may be the culprit in JDBC's case:
>>>>
>>>> The PoolingClientConnectionManager will allocate connections based on
>>>> its configuration. If all connections for a given route have already
>>>> been leased, a request for a connection will block until a connection
>>>> is released back to the pool. One can ensure the connection manager
>>>> does not block indefinitely in the connection request operation by
>>>> setting 'http.conn-manager.timeout' to a positive value. If the
>>>> connection request cannot be serviced within the given time period
>>>> ConnectionPoolTimeoutException will be thrown.
>>>>
>>>>
>>>> So HttpClient will block indefinitely until a connection is
>>>> available. We likely want to turn off that behaviour so that when we
>>>> hit this state things get a useful error rather than an infinite hang.
>>>>
>>>> Rob
>>>>
>>>>
>>>> On 8/8/13 4:11 PM, "Andy Seaborne" <[email protected]> wrote:
>>>>
>>>>> Maybe related to JENA-498 (many HttpOps overwhelming the system).
>>>>>
>>>>> But if HttpOp uses a shared HttpClient, I was getting lockups. It
>>>>> does appear to be HTTP error handling (failing to close the input
>>>>> stream of the response when it's 4xx or 5xx - there may be a body
>>>>> still).
>>>>>
>>>>> The other part of a shared HttpClient is the authenticator. I
>>>>> haven't checked that yet. I wonder if we need to make it so that only
>>>>> the HttpClient is passed in, with an HttpAuthenticator already set.
>>>>> The DatasetAccessorHttp could do that. I haven't checked the other
>>>>> uses yet; I doubt it's as clear cut for SPARQL Query etc.
>>>>>
>>>>> With the old code, creating a new SystemDefaultHttpClient was not
>>>>> giving connection pooling and reuse; only a fast loop caused a
>>>>> problem (20k-40k iterations).
>>>>>
>>>>> But I don't know why it works on your internal system and not
>>>>> AFS/Jenkins. Different versions of ARQ/HttpOp?
>>>>>
>>>>> Andy
>>>>>
>>>>> On 08/08/13 23:44, Rob Vesse wrote:
>>>>>> Yes, the module that hangs is the driver for remote endpoints; it
>>>>>> stands up a Fuseki server and communicates with it using HTTP,
>>>>>> which of course now all goes through HttpOp.
>>>>>>
>>>>>> Problem is that I never seem to get an actual exception; it just
>>>>>> hangs on the build server.
>>>>>>
>>>>>> This might also explain why DEBUG level logging makes the build
>>>>>> succeed, because HttpClient is very noisy at DEBUG level and all
>>>>>> that logging likely introduces delays in the right parts of the
>>>>>> code to allow resources to be freed up.
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>>
>>>>>> On 8/8/13 3:40 PM, "Andy Seaborne" <[email protected]> wrote:
>>>>>>
>>>>>>> On 08/08/13 19:42, Rob Vesse wrote:
>>>>>>>> So I am officially stumped
>>>>>>>>
>>>>>>>> Adding the delay still causes the builds to hang so I really don't
>>>>>>>> understand why the builds fail on Apache Jenkins. Note that I've
>>>>>>>> been building the JDBC module on our internal Jenkins server for
>>>>>>>> some time and never had an issue there. Plus the builds run fine
>>>>>>>> on a local machine.
>>>>>>>>
>>>>>>>> If anyone else can take a look or has any suggestions please jump
>>>>>>>> in
>>>>>>>
>>>>>>> <straw-grasping mode>
>>>>>>>
>>>>>>> Are you using HttpOp? Apache HttpClient?
>>>>>>>
>>>>>>> I'm fairly certain HttpOp can cause resource starvation by improper
>>>>>>> use of HttpClient. However, I haven't managed to find out where for
>>>>>>> certain [HTTP Exceptions are my current best guess]. (I can perturb
>>>>>>> the situation by tweaking pooling numbers.)
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/8/13 11:12 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Ok, so turning the log level back down causes the build to go
>>>>>>>>> back to failing
>>>>>>>>>
>>>>>>>>> This starts to look like some kind of timing issue manifesting on
>>>>>>>>> the build server causing the tests to get into a hung state.
>>>>>>>>> Apparently having the high log level adds sufficient delay into
>>>>>>>>> the process to avoid this.
>>>>>>>>>
>>>>>>>>> My next idea is to simply insert a delay between the tests in
>>>>>>>>> question and see if that solves things.
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/8/13 10:55 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Ok that is very very weird, after turning up the logging for
>>>>>>>>>> that module the build ran through to success (and generated a
>>>>>>>>>> ridiculously large log file at the same time).
>>>>>>>>>>
>>>>>>>>>> Next step is to try turning down the log level and see if the
>>>>>>>>>> build still succeeds.
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/8/13 10:35 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> The problem is that nothing is blowing up, the build just gets
>>>>>>>>>>> stuck and hangs until the build timeout plugin steps in and
>>>>>>>>>>> aborts the build
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The hang is in the tests for the remote endpoint driver which
>>>>>>>>>>> are standing up Fuseki instances. However if there was some
>>>>>>>>>>> contention for ports in the tests I would expect the tests to
>>>>>>>>>>> just plain fail.
>>>>>>>>>>>
>>>>>>>>>>> I suspect there may be some deadlock of some sort happening
>>>>>>>>>>> when running the tests on the server but it's hard to tell
>>>>>>>>>>> where/what the deadlock is. I am turning the log level for the
>>>>>>>>>>> tests in question to DEBUG and will re-run a build to see if
>>>>>>>>>>> that yields anything more useful.
>>>>>>>>>>>
>>>>>>>>>>> Rob
>>>>>>>>>>>
>>>>>>>>>>> On 8/8/13 6:53 AM, "Andy Seaborne" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 01/08/13 20:56, Rob Vesse wrote:
>>>>>>>>>>>>> I've removed it from the main build for now. For some reason
>>>>>>>>>>>>> it is getting stuck (but not crashing) on the Apache build
>>>>>>>>>>>>> server. This is despite it building fine locally and on our
>>>>>>>>>>>>> internal build servers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure how to proceed on this - is it worth setting up a
>>>>>>>>>>>>> separate build for JDBC on the Apache build servers to help
>>>>>>>>>>>>> try and isolate the problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rob
>>>>>>>>>>>>
>>>>>>>>>>>> What exactly is blowing up?
>>>>>>>>>>>>
>>>>>>>>>>>> The Apache build servers have all sorts of things on them and
>>>>>>>>>>>> a wide range of plugins, which itself can be a problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Andy
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/1/13 11:45 AM, "Rob Vesse" <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've moved Jena JDBC from Experimental into Trunk and added
>>>>>>>>>>>>>> it to the main build. The builds are a little noisier than
>>>>>>>>>>>>>> some of the other modules so may want some tweaking to avoid
>>>>>>>>>>>>>> spurious build output.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I haven't attempted to figure out how to add it to the
>>>>>>>>>>>>>> distro because I know nothing about Maven Assembly plugin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rob
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
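
The pool lock-up discussed in this thread can be illustrated without HttpClient at all. The sketch below is a hypothetical stand-in written against the JDK only: BoundedPoolSketch, Lease, and the 5-slot / 100 ms figures are illustrative names and values, not Jena or HttpClient API. It assumes the behaviour described in the quoted HttpClient documentation: a fixed number of connections per route, a request that waits for a free one, and a timeout that turns an indefinite wait into an exception.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a per-route connection pool; not HttpClient API. */
final class BoundedPoolSketch {
    private final Semaphore permits;
    private final long timeoutMillis;

    BoundedPoolSketch(int maxPerRoute, long timeoutMillis) {
        this.permits = new Semaphore(maxPerRoute);
        this.timeoutMillis = timeoutMillis;
    }

    /** A leased slot; close() returns it to the pool, at most once. */
    final class Lease implements AutoCloseable {
        private boolean released;

        @Override
        public synchronized void close() {
            if (!released) {
                released = true;
                permits.release();
            }
        }
    }

    /** Waits up to timeoutMillis for a free slot, then fails instead of hanging. */
    Lease acquire() throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no free connection after " + timeoutMillis + " ms");
        }
        return new Lease();
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        BoundedPoolSketch pool = new BoundedPoolSketch(5, 100);

        // Well-behaved callers: try-with-resources always releases the lease,
        // so a long loop never exhausts the five slots.
        for (int i = 0; i < 1000; i++) {
            try (Lease lease = pool.acquire()) {
                // the response stream would be read here
            }
        }
        System.out.println("free slots after closing callers: " + pool.available());

        // Leaky callers: five leases are never closed, so the sixth request
        // cannot be served and times out rather than blocking forever.
        for (int i = 0; i < 5; i++) {
            pool.acquire();
        }
        try {
            pool.acquire();
        } catch (TimeoutException e) {
            System.out.println("pool exhausted: " + e.getMessage());
        }
    }
}
```

Callers that close what they lease can loop for as long as they like on five slots. Callers that leak five leases make the sixth request fail after the timeout instead of hanging, which is the kind of useful error suggested in the thread for the real connection manager.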
