natanweinberger commented on a change in pull request #15425: URL: https://github.com/apache/airflow/pull/15425#discussion_r633846608
##########
File path: airflow/utils/parse.py
##########
@@ -0,0 +1,168 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+"""Parse data from a file if it uses a valid format."""
+import json
+import logging
+import os
+from collections import defaultdict
+from json import JSONDecodeError
+from typing import Any, Dict, List, Tuple
+
+import airflow.utils.yaml as yaml
+from airflow.exceptions import AirflowException, AirflowFileParseException, FileSyntaxError
+from airflow.utils.file import COMMENT_PATTERN
+
+log = logging.getLogger(__name__)
+
+
+def _parse_env_file(file_path: str) -> Tuple[Dict[str, List[str]], List[FileSyntaxError]]:
+    """
+    Parse a file in the ``.env`` format.
+
+    .. code-block:: text
+
+        MY_CONN_ID=my-conn-type://my-login:my-pa%2Fssword@my-host:5432/my-schema?param1=val1&param2=val2
+
+    :param file_path: The location of the file that will be processed.
+    :type file_path: str
+    :return: Tuple with mapping of key and list of values and list of syntax errors
+    """
+    with open(file_path) as f:
+        content = f.read()
+
+    contents_dict: Dict[str, List[str]] = defaultdict(list)
+    errors: List[FileSyntaxError] = []
+    for line_no, line in enumerate(content.splitlines(), 1):
+        if not line:
+            # Ignore empty line
+            continue

Review comment:
   I've been poking around on this and am running into an issue with the existing tests.

   The tests use `unittest.mock.mock_open` to patch the call to `open` ([example](https://github.com/apache/airflow/blob/master/tests/secrets/test_local_filesystem.py#L36)). That works when the code under test reads the file with `read()` or `readlines()`, as the existing implementation does.

   However, `unittest.mock.mock_open` did not support iterating over the lines of the mocked file until Python 3.8. On earlier versions, the handle's `__iter__` method doesn't let us read the lines lazily in tests. See here: https://docs.python.org/3/library/unittest.mock.html#mock-open

   ```
   Changed in version 3.8: Added __iter__() to implementation so that iteration (such as in for loops) correctly consumes read_data.
   ```

   Given that this is a minor enhancement, I'd suggest punting on it for the time being. It would expand the scope of this PR and should be done separately.
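To make the `mock_open` point concrete, here is a minimal sketch of the two reading styles under a patched `open`. It is not code from this PR: `read_eagerly`, `read_lazily`, `run_with_mocked_open`, `ENV_DATA`, and the file name `ignored.env` are hypothetical names made up for illustration. The only assumption about the library is the documented behaviour quoted above, namely that iteration over the mocked handle consumes `read_data` from Python 3.8 onwards.

```python
from unittest import mock

# Hypothetical sample data in the .env style, including one empty line.
ENV_DATA = "CONN_A=uri-a\n\nCONN_B=uri-b\n"


def read_eagerly(file_path):
    """Read the whole file and split it, as the code under review does."""
    with open(file_path) as f:
        content = f.read()
    return [line for line in content.splitlines() if line]


def read_lazily(file_path):
    """Iterate over the file handle; this relies on the handle's __iter__."""
    with open(file_path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def run_with_mocked_open(reader):
    """Call reader while the built-in open() is replaced by mock_open."""
    with mock.patch("builtins.open", mock.mock_open(read_data=ENV_DATA)):
        return reader("ignored.env")


eager_lines = run_with_mocked_open(read_eagerly)
lazy_lines = run_with_mocked_open(read_lazily)
print(eager_lines)
print(lazy_lines)
```

On Python 3.8 or newer, both calls should return the same two non-empty lines. On older interpreters the lazy variant may not see `read_data` at all, which is why switching the parser to lazy iteration would also mean reworking the existing tests.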
