Ottomata has submitted this change and it was merged.
Change subject: Put refined datasets definition in separate file
......................................................................
Put refined datasets definition in separate file
Adding the variable ${webrequest_refined_data_directory} to datasets.xml caused
existing Oozie jobs to fail, because their properties did not set it and there
is no default value for this directory.
Once the refined dataset is considered stable, I'd like to refactor this a bit.
In particular, the 'refined' datasets should be the main ones, and the raw ones
should be referred to as 'raw'.
Change-Id: I9337490755355a97873509ff53e222a0f0db80c2
---
M oozie/webrequest/datasets.xml
A oozie/webrequest/datasets_refined.xml
2 files changed, 37 insertions(+), 25 deletions(-)
Approvals:
Ottomata: Verified; Looks good to me, approved
Nuria: Looks good to me, but someone else must approve
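With the refined definitions split out, only coordinators that actually consume refined data need to include datasets_refined.xml and define ${webrequest_refined_data_directory}; jobs that include only datasets.xml keep working with their existing properties. A minimal sketch of such a consumer follows. It assumes the Oozie coordinator `<include>` element for dataset files and coordinator schema 0.4. The app name, the `datasets_refined_file`, `stop_time` and `workflow_file` properties, and the `refined_partition_path` workflow parameter are hypothetical and not part of this change.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Hypothetical coordinator sketch: it includes only datasets_refined.xml,
  so its job configuration must define ${start_time} and
  ${webrequest_refined_data_directory}, but not ${webrequest_data_directory}.
-->
<coordinator-app xmlns="uri:oozie:coordinator:0.4"
    name="example_refined_mobile-coord"
    frequency="${coord:hours(1)}"
    start="${start_time}"
    end="${stop_time}"
    timezone="Universal">

  <datasets>
    <!-- Hypothetical property pointing at oozie/webrequest/datasets_refined.xml. -->
    <include>${datasets_refined_file}</include>
  </datasets>

  <input-events>
    <!-- Wait for the refined mobile partition of the current hour. -->
    <data-in name="refined_mobile_input" dataset="webrequest_mobile_refined">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>

  <action>
    <workflow>
      <app-path>${workflow_file}</app-path>
      <configuration>
        <!-- Pass the resolved partition directory to the workflow. -->
        <property>
          <name>refined_partition_path</name>
          <value>${coord:dataIn('refined_mobile_input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```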
diff --git a/oozie/webrequest/datasets.xml b/oozie/webrequest/datasets.xml
index 5d12aff..7056ac5 100644
--- a/oozie/webrequest/datasets.xml
+++ b/oozie/webrequest/datasets.xml
@@ -7,8 +7,6 @@
Example: 2014-04-01T00:00Z
${webrequest_data_directory} - Path to directory where data is time bucketed.
Example: /wmf/data/raw/webrequest
- ${webrequest_refined_data_directory} - Path to directory where refined data is time bucketed.
- Example: /wmf/data/wmf/webrequest
-->
<datasets>
@@ -81,29 +79,6 @@
timezone="Universal">
<uri-template>${webrequest_data_directory}/webrequest_upload/hourly/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
<done-flag>_SUCCESS</done-flag>
- </dataset>
-
- <!--
- The webrequest_*_refined datasets contain the same data as the
- above two 'raw' datasets, except that they use a more efficient
- storage format, and contain extra information.
-
- This dataset does not yet include upload or bits.
-
- TODO: I would like to eventually name this data set 'webrequest_mobile',
- etc. and rename the above dataset to webrequest_mobile_raw, etc.
- -->
- <dataset name="webrequest_mobile_refined"
- frequency="${coord:hours(1)}"
- initial-instance="${start_time}"
- timezone="Universal">
- <uri-template>${webrequest_refined_data_directory}/webrequest_source=mobile/year=${YEAR}/month=${MONTH}/day=${DAY}/hour=${HOUR}</uri-template>
- </dataset>
- <dataset name="webrequest_text_refined"
- frequency="${coord:hours(1)}"
- initial-instance="${start_time}"
- timezone="Universal">
- <uri-template>${webrequest_refined_data_directory}/webrequest_source=upload/year=${YEAR}/month=${MONTH}/day=${DAY}/hour=${HOUR}</uri-template>
</dataset>
</datasets>
diff --git a/oozie/webrequest/datasets_refined.xml b/oozie/webrequest/datasets_refined.xml
new file mode 100644
index 0000000..df9b42a
--- /dev/null
+++ b/oozie/webrequest/datasets_refined.xml
@@ -0,0 +1,37 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Defines reusable datasets for refined webrequest data.
+Use this dataset in your coordinator.xml files by setting:
+
+ ${start_time} - the initial instance of your data.
+ Example: 2014-04-01T00:00Z
+ ${webrequest_refined_data_directory} - Path to directory where refined data is time bucketed.
+ Example: /wmf/data/wmf/webrequest
+-->
+
+<datasets>
+
+ <!--
+ The webrequest_*_refined datasets contain the same data as the
+ above two 'raw' datasets, except that they use a more efficient
+ storage format, and contain extra information.
+
+ This dataset does not yet include upload or bits.
+
+ TODO: I would like to eventually name this data set 'webrequest_mobile',
+ etc. and rename the datasets.xml datasets to webrequest_mobile_raw, etc.
+ -->
+ <dataset name="webrequest_mobile_refined"
+ frequency="${coord:hours(1)}"
+ initial-instance="${start_time}"
+ timezone="Universal">
+ <uri-template>${webrequest_refined_data_directory}/webrequest_source=mobile/year=${YEAR}/month=${MONTH}/day=${DAY}/hour=${HOUR}</uri-template>
+ </dataset>
+ <dataset name="webrequest_text_refined"
+ frequency="${coord:hours(1)}"
+ initial-instance="${start_time}"
+ timezone="Universal">
+ <uri-template>${webrequest_refined_data_directory}/webrequest_source=upload/year=${YEAR}/month=${MONTH}/day=${DAY}/hour=${HOUR}</uri-template>
+ </dataset>
+
+</datasets>
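Any job whose coordinator includes datasets_refined.xml has to supply the two variables listed in that file's header. Below is a minimal sketch of such a job configuration in Hadoop XML configuration format, which `oozie job -config` accepts alongside .properties files. The two dataset variables use the example values from the header. The `oozie.coord.application.path` value is a hypothetical placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Sketch of a job configuration, e.g. submitted with
  oozie job -config job.xml -run
-->
<configuration>
  <!-- Initial instance of the refined datasets (example from the file header). -->
  <property>
    <name>start_time</name>
    <value>2014-04-01T00:00Z</value>
  </property>
  <!-- Required by every dataset in datasets_refined.xml; it has no default value. -->
  <property>
    <name>webrequest_refined_data_directory</name>
    <value>/wmf/data/wmf/webrequest</value>
  </property>
  <!-- Hypothetical location of a coordinator that includes datasets_refined.xml. -->
  <property>
    <name>oozie.coord.application.path</name>
    <value>hdfs:///path/to/coordinator.xml</value>
  </property>
</configuration>
```

When checking the resulting paths, note that in this change `webrequest_text_refined` builds its URI from `webrequest_source=upload` rather than `webrequest_source=text`.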
--
To view, visit https://gerrit.wikimedia.org/r/183988
Gerrit-MessageType: merged
Gerrit-Change-Id: I9337490755355a97873509ff53e222a0f0db80c2
Gerrit-PatchSet: 1
Gerrit-Project: analytics/refinery
Gerrit-Branch: master
Gerrit-Owner: Ottomata
Gerrit-Reviewer: Nuria
Gerrit-Reviewer: Ottomata