[ 
https://issues.apache.org/jira/browse/ANY23-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende updated ANY23-412:
------------------------------
    Description: 
Although our DefaultHTTPClient uses a "PoolingHttpClientConnectionManager", we 
are unable to take advantage of it through parallelism, because the 
{{getActualDocumentIRI()}}, {{getContentType()}}, and {{getContentLength()}} 
methods are defined on the http client itself rather than on a response 
object. By the time they are called, their values may have been overwritten by 
a different request made through the same client. Thus there is no way to 
execute calls in parallel using a single http client.
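
To make the difference concrete, here is a minimal sketch of the two API 
shapes. All class names below are hypothetical stand-ins and are not part of 
Any23; the "network" is faked with a string so that the example is 
self-contained. It only illustrates where the per-request metadata lives, not 
how an actual fix would look.

```java
import java.nio.charset.StandardCharsets;

// All names below are hypothetical stand-ins, not the Any23 API.

/** Problematic shape: metadata of the latest request is stored on the shared client. */
final class StatefulFetcher {
    private volatile String actualDocumentIRI;

    byte[] fetch(String iri) {
        actualDocumentIRI = iri; // overwritten by whichever request ran last
        return ("content of " + iri).getBytes(StandardCharsets.UTF_8);
    }

    String getActualDocumentIRI() {
        return actualDocumentIRI;
    }
}

/** Alternative shape: each result object carries its own metadata. */
final class FetchResponse {
    final String actualDocumentIRI;
    final String contentType;
    final long contentLength;
    final byte[] body;

    FetchResponse(String actualDocumentIRI, String contentType, byte[] body) {
        this.actualDocumentIRI = actualDocumentIRI;
        this.contentType = contentType;
        this.contentLength = body.length;
        this.body = body;
    }
}

/** Holds no per-request fields, so one instance can serve many threads. */
final class ResponseFetcher {
    FetchResponse fetch(String iri) {
        byte[] body = ("content of " + iri).getBytes(StandardCharsets.UTF_8);
        return new FetchResponse(iri, "text/html", body);
    }
}
```

With the first shape, a caller that fetches page A and then reads 
{{getActualDocumentIRI()}} can observe page B's IRI if another request ran in 
between. With the second shape, the IRI is read from the response that was 
returned for that specific request.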

Background: I ran into this problem while trying to parallelize the online 
microdata tests (cf. ANY23-67) for speed, using a single Any23 instance to 
extract from multiple pages simultaneously. Usually, the tests would pass, but 
sporadically, they would fail as a result of the document IRI *not matching the 
page the triples were extracted from*. I had to work around this by using a 
different Any23 instance (and thus a different http client) for every single 
request.
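
The workaround can be sketched roughly as follows. {{PageExtractor}} is a 
hypothetical stand-in for a stateful extractor that owns its own http client; 
it is not the Any23 class, and the pool size of 4 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a stateful extractor that owns its own client. */
final class PageExtractor {
    private String documentIRI; // per-instance state, never shared in this sketch

    String extract(String iri) {
        documentIRI = iri;
        return "triples from " + documentIRI;
    }
}

final class PerRequestInstances {
    static List<String> extractAll(List<String> iris) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary size
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String iri : iris) {
                // A fresh instance per task: its state is invisible to other threads.
                Callable<String> task = () -> new PageExtractor().extract(iri);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // collected in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every task builds its own instance, no request can overwrite another 
request's metadata. The cost is that nothing is shared between requests, which 
presumably includes the pooled connections.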

  was: Although our DefaultHTTPClient uses a 
"PoolingHttpClientConnectionManager", we are unable to take advantage of it 
through parallelism, because the {{getActualDocumentIRI()}}, 
{{getContentType()}}, and {{getContentLength()}} methods are defined on the 
http client itself rather than on a response object. By the time they are 
called, their values may have been overwritten by a different request made 
through the same client. Thus there is no way to execute calls in parallel 
using a single http client.


> HTTPClient API does not allow parallelism
> -----------------------------------------
>
>                 Key: ANY23-412
>                 URL: https://issues.apache.org/jira/browse/ANY23-412
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.3
>            Reporter: Hans Brende
>            Priority: Major
>             Fix For: 2.3
>
>
> Although our DefaultHTTPClient uses a "PoolingHttpClientConnectionManager", 
> we are unable to take advantage of it through parallelism, because the 
> {{getActualDocumentIRI()}}, {{getContentType()}}, and {{getContentLength()}} 
> methods are defined on the http client itself rather than on a response 
> object. By the time they are called, their values may have been overwritten 
> by a different request made through the same client. Thus there is no way to 
> execute calls in parallel using a single http client.
>
> Background: I ran into this problem while trying to parallelize the online 
> microdata tests (cf. ANY23-67) for speed, using a single Any23 instance to 
> extract from multiple pages simultaneously. Usually, the tests would pass, 
> but sporadically, they would fail as a result of the document IRI *not 
> matching the page the triples were extracted from*. I had to work around this 
> by using a different Any23 instance (and thus a different http client) for 
> every single request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
