Repository: incubator-airflow
Updated Branches:
  refs/heads/master 9f2c16a0a -> 4dade6db3

[AIRFLOW-1714] Fix misspelling: s/seperate/separate/

Closes #2688 from wrp/separate


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4dade6db
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4dade6db
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4dade6db

Branch: refs/heads/master
Commit: 4dade6db3d2dc03f9f44dc080acae37ebf03a4fb
Parents: 9f2c16a
Author: William Pursell <[email protected]>
Authored: Fri Oct 13 10:46:57 2017 -0700
Committer: Chris Riccomini <[email protected]>
Committed: Fri Oct 13 10:46:57 2017 -0700

----------------------------------------------------------------------
 airflow/contrib/example_dags/example_twitter_README.md | 4 ++--
 airflow/contrib/example_dags/example_twitter_dag.py    | 2 +-
 airflow/contrib/hooks/salesforce_hook.py               | 4 ++--
 airflow/www/views.py                                   | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4dade6db/airflow/contrib/example_dags/example_twitter_README.md
----------------------------------------------------------------------
diff --git a/airflow/contrib/example_dags/example_twitter_README.md b/airflow/contrib/example_dags/example_twitter_README.md
index 03ecc66..319eac3 100644
--- a/airflow/contrib/example_dags/example_twitter_README.md
+++ b/airflow/contrib/example_dags/example_twitter_README.md
@@ -18,7 +18,7 @@
 ***Example Structure:*** In this example dag, we are collecting tweets for four users account or twitter handle. Each twitter handle has two channels, incoming tweets and outgoing tweets. Hence, in this example, by running the fetch_tweet task, we should have eight output files. For better management, each of the eight output files should be saved with the yesterday's date (we are collecting tweets from yesterday), i.e. toTwitter_A_2016-03-21.csv. We are using three kind of operators: PythonOperator, BashOperator, and HiveOperator. However, for this example only the Python scripts are stored externally. Hence this example DAG only has the following directory structure:
-The python functions here are just placeholders. In case you are interested to actually make this DAG fully functional, first start with filling out the scripts as seperate files and importing them into the DAG with absolute or relative import. My approach was to store the retrieved data in memory using Pandas dataframe first, and then use the built in method to save the CSV file on hard-disk.
+The python functions here are just placeholders. In case you are interested to actually make this DAG fully functional, first start with filling out the scripts as separate files and importing them into the DAG with absolute or relative import. My approach was to store the retrieved data in memory using Pandas dataframe first, and then use the built in method to save the CSV file on hard-disk.
 The eight different CSV files are then put into eight different folders within HDFS. Each of the newly inserted files are then loaded into eight different external hive tables. Hive tables can be external or internal. In this case, we are inserting the data right into the table, and so we are making our tables internal. Each file is inserted into the respected Hive table named after the twitter channel, i.e. toTwitter_A or fromTwitter_A.
 It is also important to note that when we created the tables, we facilitated for partitioning by date using the variable dt and declared comma as the row deliminator. The partitioning is very handy and ensures our query execution time remains constant even with growing volume of data. As most probably these folders and hive tables doesn't exist in your system, you will get an error for these tasks within the DAG. If you rebuild a function DAG from this example, make sure those folders and hive tables exists. When you create the table, keep the consideration of table partitioning and declaring comma as the row deliminator in your mind. Furthermore, you may also need to skip headers on each read and ensure that the user under which you have Airflow running has the right permission access. Below is a sample HQL snippet on creating such table:
 ```
@@ -29,7 +29,7 @@ CREATE TABLE toTwitter_A(id BIGINT, id_str STRING
 STORED AS TEXTFILE;
 alter table toTwitter_A SET serdeproperties ('skip.header.line.count' = '1');
 ```
-When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming incoming and outgoing tweets, and hence they are kept seperated in this example.
+When you review the code for the DAG, you will notice that these tasks are generated using for loop. These two for loops could be combined into one loop. However, in most cases, you will be running different analysis on your incoming incoming and outgoing tweets, and hence they are kept separated in this example.
 Final step is a running the broker script, brokerapi.py, which will run queries in Hive and store the summarized data to MySQL in our case. To connect to Hive, pyhs2 library is extremely useful and easy to use. To insert data into MySQL from Python, sqlalchemy is also a good one to use.
 I hope you find this tutorial useful. If you have question feel free to ask me on [Twitter](https://twitter.com/EkhtiarSyed) or via the live Airflow chatroom room in [Gitter](https://gitter.im/airbnb/airflow).<p>
 -Ekhtiar Syed


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4dade6db/airflow/contrib/example_dags/example_twitter_dag.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/example_dags/example_twitter_dag.py b/airflow/contrib/example_dags/example_twitter_dag.py
index d8f89d3..2205b9a 100644
--- a/airflow/contrib/example_dags/example_twitter_dag.py
+++ b/airflow/contrib/example_dags/example_twitter_dag.py
@@ -127,7 +127,7 @@ hive_to_mysql = PythonOperator(
 # csv files to HDFS. The second task loads these files from HDFS to respected Hive
 # tables. These two for loops could be combined into one loop. However, in most cases,
 # you will be running different analysis on your incoming incoming and outgoing tweets,
-# and hence they are kept seperated in this example.
+# and hence they are kept separated in this example.
 # --------------------------------------------------------------------------------
 
 from_channels = ['fromTwitter_A', 'fromTwitter_B', 'fromTwitter_C', 'fromTwitter_D']


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4dade6db/airflow/contrib/hooks/salesforce_hook.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/hooks/salesforce_hook.py b/airflow/contrib/hooks/salesforce_hook.py
index 0d0a104..b82e4ca 100644
--- a/airflow/contrib/hooks/salesforce_hook.py
+++ b/airflow/contrib/hooks/salesforce_hook.py
@@ -130,7 +130,7 @@ class SalesforceHook(BaseHook, LoggingMixin):
         return [f['name'] for f in desc['fields']]
 
     def _build_field_list(self, fields):
-        # join all of the fields in a comma seperated list
+        # join all of the fields in a comma separated list
         return ",".join(fields)
 
     def get_object_from_salesforce(self, obj, fields):
@@ -204,7 +204,7 @@ class SalesforceHook(BaseHook, LoggingMixin):
         Acceptable formats are:
             - csv:
-                comma-seperated-values file. This is the default format.
+                comma-separated-values file. This is the default format.
             - json:
                 JSON array. Each element in the array is a different row.
             - ndjson:


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4dade6db/airflow/www/views.py
----------------------------------------------------------------------
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 81ee61f..9ae93f6 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -2612,7 +2612,7 @@ class ConnectionModelView(wwwutils.SuperUserMixin, AirflowModelView):
         'extra__google_cloud_platform__project': StringField('Project Id'),
         'extra__google_cloud_platform__key_path': StringField('Keyfile Path'),
         'extra__google_cloud_platform__keyfile_dict': PasswordField('Keyfile JSON'),
-        'extra__google_cloud_platform__scope': StringField('Scopes (comma seperated)'),
+        'extra__google_cloud_platform__scope': StringField('Scopes (comma separated)'),
     }
     form_choices = {
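
For readers of the README text corrected above: it describes generating the per-channel HDFS-put and Hive-load tasks in for loops. Below is a minimal, hypothetical sketch of that pattern. It is not the code from this commit; only `from_channels` appears in the diff, while `to_channels`, the DAG id, task ids, file paths, and commands are illustrative assumptions, using Airflow 1.x-era operator imports.

```python
# Illustrative sketch only: per-channel task generation in a loop.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.hive_operator import HiveOperator

dag = DAG('example_twitter_sketch',
          start_date=datetime(2016, 3, 21),
          schedule_interval='@daily')

# `from_channels` mirrors the list visible in the diff; `to_channels` is assumed.
from_channels = ['fromTwitter_A', 'fromTwitter_B', 'fromTwitter_C', 'fromTwitter_D']
to_channels = ['toTwitter_A', 'toTwitter_B', 'toTwitter_C', 'toTwitter_D']

# The example DAG keeps two separate loops so incoming and outgoing tweets can get
# different downstream analysis; a single combined loop is shown here for brevity.
for channel in from_channels + to_channels:
    # Put the CSV for the execution date (yesterday's tweets) into the channel's HDFS folder.
    put_to_hdfs = BashOperator(
        task_id='put_{}_to_hdfs'.format(channel),
        bash_command=('hdfs dfs -put -f /tmp/{ch}_{{{{ ds }}}}.csv '
                      '/user/hive/{ch}/'.format(ch=channel)),
        dag=dag)

    # Load the file into the Hive table named after the channel, partitioned by dt.
    load_to_hive = HiveOperator(
        task_id='load_{}_to_hive'.format(channel),
        hql=("LOAD DATA INPATH '/user/hive/{ch}/{ch}_{{{{ ds }}}}.csv' "
             "INTO TABLE {ch} PARTITION (dt='{{{{ ds }}}}')".format(ch=channel)),
        dag=dag)

    put_to_hdfs >> load_to_hive
```

Keeping the incoming and outgoing loops separate, as the example DAG does, makes it straightforward to attach different downstream analysis to each direction.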
