[
https://issues.apache.org/jira/browse/ANY23-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hans Brende updated ANY23-412:
------------------------------
Description:
Although our DefaultHTTPClient uses a "PoolingHttpClientConnectionManager", we
are unable to use parallelism to take advantage of it, because the
{{getActualDocumentIRI()}}, {{getContentType()}}, and {{getContentLength()}}
methods are defined on the http client itself rather than on a response
object. By the time they are called, their values may already have changed as
a result of a different request made through the same client. Thus there is no
way to execute calls in parallel using a single http client.
Background: I ran into this problem while trying to parallelize the online
microdata tests (cf. ANY23-67) for speed, using a single Any23 instance to
extract from multiple pages simultaneously. Usually, the tests would pass, but
sporadically, they would fail as a result of the document IRI *not matching the
page the triples were extracted from*. I had to work around this by using a
different Any23 instance (and thus a different http client) for every single
request.
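To illustrate the direction a fix could take, here is a minimal sketch rather than actual Any23 code. It assumes a hypothetical {{Response}} value object and a hypothetical {{StatelessClient}} (neither name exists in Any23, and the stubbed {{open()}} method performs no real HTTP request). The only point is that once the document IRI, content type, and content length travel with each response, a single client can be shared safely across threads:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ResponseScopedClientSketch {

    /** Hypothetical immutable per-request result; NOT part of the Any23 API. */
    static final class Response {
        private final String actualDocumentIRI;
        private final String contentType;
        private final long contentLength;

        Response(String actualDocumentIRI, String contentType, long contentLength) {
            this.actualDocumentIRI = actualDocumentIRI;
            this.contentType = contentType;
            this.contentLength = contentLength;
        }

        String getActualDocumentIRI() { return actualDocumentIRI; }
        String getContentType() { return contentType; }
        long getContentLength() { return contentLength; }
    }

    /** Hypothetical client with no per-request fields, so sharing it is safe. */
    static final class StatelessClient {
        Response open(String iri) {
            // Stub: a real client would issue the HTTP request here and read
            // the final IRI and headers from that specific response.
            String body = "<html>" + iri + "</html>";
            return new Response(iri, "text/html", body.length());
        }
    }

    /** Fetches all IRIs in parallel through ONE shared client instance. */
    static List<String> fetchAll(List<String> iris) throws Exception {
        StatelessClient client = new StatelessClient();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Response>> futures = new ArrayList<>();
            for (String iri : iris) {
                futures.add(pool.submit(() -> client.open(iri)));
            }
            List<String> actualIris = new ArrayList<>();
            for (Future<Response> future : futures) {
                // Metadata is read from the response, never from the client,
                // so a concurrent request cannot overwrite it.
                actualIris.add(future.get().getActualDocumentIRI());
            }
            return actualIris;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> iris = Arrays.asList(
                "http://example.org/a", "http://example.org/b", "http://example.org/c");
        System.out.println(fetchAll(iris));
    }
}
```

With this shape, each task only ever sees the metadata of its own request, so the per-request Any23 instance used as a workaround would no longer be needed. How such a response object would fit into the existing HTTPClient interface is left open here.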
was: Although our DefaultHTTPClient uses a
"PoolingHttpClientConnectionManager", we are unable to use parallelism to take
advantage of it, because the {{getActualDocumentIRI()}},
{{getContentType()}}, and {{getContentLength()}} methods are defined on the
http client itself rather than on a response object. By the time they are
called, their values may already have changed as a result of a different
request made through the same client. Thus there is no way to execute calls in
parallel using a single http client.
> HTTPClient API does not allow parallelism
> -----------------------------------------
>
> Key: ANY23-412
> URL: https://issues.apache.org/jira/browse/ANY23-412
> Project: Apache Any23
> Issue Type: Bug
> Components: core
> Affects Versions: 2.3
> Reporter: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> Although our DefaultHTTPClient uses a "PoolingHttpClientConnectionManager",
> we are unable to use parallelism to take advantage of it, because the
> {{getActualDocumentIRI()}}, {{getContentType()}}, and {{getContentLength()}}
> methods are defined on the http client itself rather than on a response
> object. By the time they are called, their values may already have changed
> as a result of a different request made through the same client. Thus there
> is no way to execute calls in parallel using a single http client.
> Background: I ran into this problem while trying to parallelize the online
> microdata tests (cf. ANY23-67) for speed, using a single Any23 instance to
> extract from multiple pages simultaneously. Usually, the tests would pass,
> but sporadically, they would fail as a result of the document IRI *not
> matching the page the triples were extracted from*. I had to work around this
> by using a different Any23 instance (and thus a different http client) for
> every single request.