[
https://issues.apache.org/jira/browse/AIRFLOW-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Galak updated AIRFLOW-2010:
---------------------------
Description:
HttpHook uses the requests module to perform HTTP/HTTPS calls, but this is
hidden inside the implementation. It is therefore not possible to choose values
for the _pool_connections_ or _pool_maxsize_ parameters, which both default to
10 (see the [requests module
documentation|http://docs.python-requests.org/en/latest/api/#lower-lower-level-classes]).
_{{requests.adapters.HTTPAdapter}}_ parameters could probably be passed through
the Airflow Connection extra parameters?
As a consequence, calling a REST API concurrently (using
[ThreadPoolExecutor|https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor])
is effectively limited to 10 pooled connections. Each connection beyond that
limit is discarded with the following warning:
{quote}
{{\{connectionpool.py\} WARNING - Connection pool is full, discarding
connection: my.api.example.org}}
{quote}
See [this question on
Stack Overflow|https://stackoverflow.com/questions/23632794/in-requests-library-how-can-i-avoid-httpconnectionpool-is-full-discarding-con]
about HTTP connection pool configuration.
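For illustration, here is a minimal sketch of what such a configuration could
look like with plain requests. It is not HttpHook code: the helper names
build_session and pool_settings_from_extra, and the idea of reading
pool_connections / pool_maxsize keys from the Connection extra JSON, are
assumptions made for this example only.

```python
import json

import requests
from requests.adapters import HTTPAdapter


def pool_settings_from_extra(extra_json):
    """Pick pool settings out of a JSON string (hypothetical extra keys)."""
    extra = json.loads(extra_json) if extra_json else {}
    return {
        key: int(extra[key])
        for key in ("pool_connections", "pool_maxsize")
        if key in extra
    }


def build_session(pool_connections=10, pool_maxsize=10):
    """Return a Session whose HTTP and HTTPS adapters use the given pool sizes."""
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_connections=pool_connections,
        pool_maxsize=pool_maxsize,
    )
    # Mounting replaces the default adapters, which keep 10 connections per host.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


# Example: an extra field asking for room for 32 concurrent connections per host.
session = build_session(**pool_settings_from_extra('{"pool_maxsize": 32}'))
```

A session built this way could then be shared by the ThreadPoolExecutor
workers, with pool_maxsize chosen to be at least the number of workers.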
> Make HttpHook inner connection pool configurable
> ------------------------------------------------
>
> Key: AIRFLOW-2010
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2010
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Affects Versions: 1.8.0
> Reporter: Galak
> Priority: Major