[ https://issues.apache.org/jira/browse/AIRFLOW-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kengo Seki reassigned AIRFLOW-2382:
-----------------------------------
Assignee: Kengo Seki
> Fix wrong description for delimiter
> -----------------------------------
>
> Key: AIRFLOW-2382
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2382
> Project: Apache Airflow
> Issue Type: Bug
> Components: aws, operators
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
>
> The document for S3ListOperator says:
> {code}
> :param delimiter: The delimiter by which you want to filter the objects.
> For e.g to lists the CSV files from in a directory in S3 you would use
> delimiter='.csv'.
> {code}
> {code}
> **Example**:
> The following operator would list all the CSV files from the S3
> ``customers/2018/04/`` key in the ``data`` bucket. ::
> s3_file = S3ListOperator(
> task_id='list_3s_files',
> bucket='data',
> prefix='customers/2018/04/',
> delimiter='.csv',
> aws_conn_id='aws_customers_conn'
> )
> {code}
> but it actually behaves the opposite way: keys containing the delimiter are dropped from the result:
> {code}
> In [1]: from airflow.contrib.operators.s3_list_operator import S3ListOperator
> In [2]: S3ListOperator(task_id='t', bucket='bkt0', prefix='',
> aws_conn_id='s3').execute(None)
> [2018-04-26 10:34:27,001] {connectionpool.py:735} INFO - Starting new HTTPS
> connection (1): bkt0.s3.amazonaws.com
> [2018-04-26 10:34:27,711] {connectionpool.py:735} INFO - Starting new HTTPS
> connection (1): bkt0.s3-ap-northeast-1.amazonaws.com
> [2018-04-26 10:34:27,801] {connectionpool.py:735} INFO - Starting new HTTPS
> connection (1): bkt0.s3.ap-northeast-1.amazonaws.com
> Out[2]: ['0.csv', '1.txt', '2.jpg', '3.exe']
> In [3]: S3ListOperator(task_id='t', bucket='bkt0', prefix='',
> aws_conn_id='s3', delimiter='.csv').execute(None)
> [2018-04-26 10:34:39,722] {connectionpool.py:735} INFO - Starting new HTTPS
> connection (1): bkt0.s3.amazonaws.com
> [2018-04-26 10:34:40,483] {connectionpool.py:735} INFO - Starting new HTTPS
> connection (1): bkt0.s3-ap-northeast-1.amazonaws.com
> [2018-04-26 10:34:40,569] {connectionpool.py:735} INFO - Starting new HTTPS
> connection (1): bkt0.s3.ap-northeast-1.amazonaws.com
> Out[3]: ['1.txt', '2.jpg', '3.exe']
> {code}
> This is because the 'delimiter' parameter represents the path
> hierarchy (so '/' is typically used), not a file extension.
> S3ToGoogleCloudStorageOperator has the same problem.
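
The behaviour described above can be reproduced without touching S3. The following is only a rough pure-Python sketch of the listing semantics, assuming that keys containing the delimiter after the prefix are rolled up into a "common prefix" instead of being returned as keys. The helper names `list_keys` and `filter_by_suffix` are hypothetical and are not part of Airflow or boto3.

```python
def list_keys(keys, prefix="", delimiter=""):
    """Roughly mimic how an S3 listing splits results into keys and common prefixes.

    Hypothetical helper for illustration only; it does not call S3.
    """
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Look for the delimiter in the part of the key that follows the prefix.
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            # No delimiter found: the key is returned as an ordinary key.
            contents.append(key)
        else:
            # Delimiter found: the key is rolled up into a common prefix
            # and therefore disappears from the list of keys.
            rolled = prefix + rest[:idx + len(delimiter)]
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
    return contents, common_prefixes


def filter_by_suffix(keys, suffix):
    """Select keys by file extension, which is what the docstring implied."""
    return [k for k in keys if k.endswith(suffix)]


bucket = ['0.csv', '1.txt', '2.jpg', '3.exe']

keys, prefixes = list_keys(bucket, delimiter='.csv')
print(keys)      # ['1.txt', '2.jpg', '3.exe']
print(prefixes)  # ['0.csv']

# Filtering by extension has to happen on the client side.
print(filter_by_suffix(bucket, '.csv'))  # ['0.csv']
```

Under these assumptions, passing delimiter='.csv' hides exactly the CSV files, which matches Out[3] above. Listing only CSV files would need a separate suffix filter applied to the returned keys.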