http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/da748b4d/docs/sql_reference/src/asciidoc/_chapters/sql_clauses.adoc ---------------------------------------------------------------------- diff --git a/docs/sql_reference/src/asciidoc/_chapters/sql_clauses.adoc b/docs/sql_reference/src/asciidoc/_chapters/sql_clauses.adoc index dbe39a3..450dd9f 100644 --- a/docs/sql_reference/src/asciidoc/_chapters/sql_clauses.adoc +++ b/docs/sql_reference/src/asciidoc/_chapters/sql_clauses.adoc @@ -1,1432 +1,1432 @@ -//// -/** -* @@@ START COPYRIGHT @@@ -* -* Licensed to the Apache Software Foundation (ASF) under one -* or more contributor license agreements. See the NOTICE file -* distributed with this work for additional information -* regarding copyright ownership. The ASF licenses this file -* to you under the Apache License, Version 2.0 (the -* "License"); you may not use this file except in compliance -* with the License. You may obtain a copy of the License at -* -* http://www.apache.org/licenses/LICENSE-2.0 -* -* Unless required by applicable law or agreed to in writing, -* software distributed under the License is distributed on an -* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -* KIND, either express or implied. See the License for the -* specific language governing permissions and limitations -* under the License. -* -* @@@ END COPYRIGHT @@@ -*/ -//// - -[[sql_clauses]] -= SQL Clauses - -Clauses are used by {project-name} SQL statements to specify default values, -ways to sample or sort data, how to store physical data, and other -details. - -This section describes: - -* <<default_clause,DEFAULT Clause>> specifies a default value for a column being created. -* <<format_clause,FORMAT Clause>> specifies the format to use. -* <<sample_clause,SAMPLE Clause>> specifies the sampling method used to select a subset of the intermediate result table of a SELECT statement. -* <<sequence_by_clause,SEQUENCE BY Clause>> specifies the order in which to sort rows of the intermediate result table for calculating sequence functions. -* <<transpose_clause,TRANSPOSE Clause>> generates, for each row of the SELECT source table, a row for each item in the transpose item list. - -[[default_clause]] -== DEFAULT Clause - -The DEFAULT option of the CREATE TABLE or ALTER TABLE _table-name_ ADD -COLUMN statement specifies a default value for a column being created. - -The default value is used when a row is inserted in the table without a value for the column. - -``` -DEFAULT default | NO DEFAULT - -default is: - literal -| NULL -| CURRENTDATE -| CURRENTTIME -| CURRENTTIMESTAMP -``` - -* `NO DEFAULT` -+ -specifies the column has no default value. You cannot specify NO DEFAULT -in an ALTER TABLE statement. See <<alter_table_statement,ALTER TABLE Statement>>. - -[[syntax_for_default_clause]] -=== Syntax for Default Clause - -* `DEFAULT _literal_` -+ -is a literal of a data type compatible with the data type of the -associated column. -+ -For a character column, _literal_ must be a string literal of no more -than 240 characters or the length of the column, whichever is less. The -maximum length of a default value for a character column is 240 bytes -(minus control characters) or the length of the column, whichever is -less. Control characters consist of character set prefixes and single -quote delimiter found in the text itself. -+ -For a numeric column, _literal_ must be a numeric literal that does not -exceed the defined length of the column. The number of digits to the -right of the decimal point must not exceed the scale of the column, and -the number of digits to the left of the decimal point must not exceed -the number in the length (or length minus scale, if you specified scale -for the column). -+ -For a datetime column, _literal_ must be a datetime literal with a -precision that matches the precision of the column. -+ -For an INTERVAL column, _literal_ must be an INTERVAL literal that has -the range of INTERVAL fields defined for the column. - -* `DEFAULT NULL` -+ -specifies NULL as the default. This default can occur only with a column -that allows null. - -* `DEFAULT CURRENT_DATE` -+ -specifies the default value for the column as the value returned by the -CURRENT_DATE function at the time of the operation that assigns a value -to the column. This default can occur only with a column whose data type -is DATE. - -* `DEFAULT CURRENT_TIME` -+ -specifies the default value for the column as the value returned by the -CURRENT_TIME function at the time of the operation that assigns a value -to the column. This default can occur only with a column whose data type -is TIME. - -* `DEFAULT CURRENT_TIMESTAMP` -+ -specifies the default value for the column as the value returned by the -CURRENT_TIMESTAMP function at the time of the operation that assigns a -value to the column. This default can occur only with a column whose -data type is TIMESTAMP. - -[[examples_of_default]] -=== Examples of DEFAULT - -* This example uses DEFAULT clauses on CREATE TABLE to specify default column values: -+ -``` -CREATE TABLE items -( item_id CHAR(12) NO DEFAULT -, description CHAR(50) DEFAULT NULL -, num_on_hand INTEGER DEFAULT 0 NOT NULL -) ; -``` - -* This example uses DEFAULT clauses on CREATE TABLE to specify default column values: -+ -``` -CREATE TABLE persnl.project -( projcode NUMERIC (4) UNSIGNED NO DEFAULT NOT NULL -, empnum NUMERIC (4) UNSIGNED NO DEFAULT NOT NULL -, projdesc VARCHAR (18) DEFAULT NULL -, start_date DATE DEFAULT CURRENT_DATE -, ship_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP -, est_complete INTERVAL DAY DEFAULT INTERVAL '30' DAY -, PRIMARY KEY (projcode) -) ; -``` - -<<< -[[format_clause]] -== FORMAT Clause - -The FORMAT clause specifies the output format for DATE values. It can -also be used to specify the length of character output or to specify -separating the digits of integer output with colons. - -* Date Formats: -+ -``` -(FORMAT 'format-string') | - -(DATE, FORMAT 'format-string') - -format-string for Date Formats is: - YYYY-MM-DD - MM/DD/YYYY - YY/MM/DD - YYYY/MM/DD - YYYYMMDD - DD.MM.YYYY - DD-MM-YYYY - DD-MMM-YYYY -``` - -* Other Formats: -+ -``` -(FORMAT 'format-string') - -format-string for other formats is: - XXX - 99:99:99:99 - -99:99:99:99 -``` - -* `YYYY-MM-DD` -+ -specifies that the FORMAT clause output format is _year-month-day_. - -* `MM/DD/YYYY` -+ -specifies that the FORMAT clause output format is _month/day/year_ - -* `YY/MM/DD` -+ -specifies that the FORMAT clause output format is _year/month/day_. - -* `YYYY/MM/DD` -+ -specifies that the FORMAT clause output format is _year/month/day_. - -* `YYYYMMDD` -+ -specifies that the FORMAT clause output format is _yearmonthday_. - -* `DD.MM.YYYY` -+ -specifies that the FORMAT clause output format is _day.month.year_. - -* `DD-MM-YYYY` -+ -specifies that the FORMAT clause output format is _day-month-year_. - -* `DD-MMM-YYYY` -+ -specifies that the FORMAT clause output format is _day-month-year_. - -* `XXX` -+ -specifies that the FORMAT clause output format is a string format. The -input must be a numeric or string value. - -* `99:99:99:99` -+ -specifies that the FORMAT clause output format is a timestamp. The input -must be a numeric value. - -* `-99:99:99:99` -+ -specifies that the FORMAT clause output format is a timestamp. The input -must be a numeric value. - -[[considerations_for_date_formats]] -=== Considerations for Date Formats - -The expression preceding the (FORMAT â_format-string_') clause must be -a DATE value. - -The expression preceding the (DATE, FORMAT _'format-string_') clause -must be a quoted string in the USA, EUROPEAN, or DEFAULT date format. - -[[considerations_for_other_formats]] -==== Considerations for Other Formats - -For XXX, the expression preceding the (FORMAT _'format-string_') -clause must be a numeric value or a string value. - -For 99:99:99:99 and -99:99:99:99, the expression preceding the (FORMAT -_'format-string_') clause must be a numeric value. - -[[examples_of_format]] -=== Examples of FORMAT - -* The format string 'XXX' in this example will yield a sample result of abc: -+ -``` -SELECT 'abcde' (FORMAT 'XXX') FROM (VALUES(1)) t; -``` - -* The format string 'YYYY-MM_DD' in this example will yield a sample result of 2008-07-17. -+ -``` -SELECT CAST('2008-07-17' AS DATE) (FORMAT 'YYYY-MM-DD') FROM (VALUES(1)) t; -``` - -* The format string 'MM/DD/YYYY' in this example will yield a sample result of 07/17/2008. -+ -``` -SELECT '2008-07-17' (DATE, FORMAT 'MM/DD/YYYY') FROM (VALUES(1)) t; -``` - -* The format string 'YY/MM/DD' in this example will yield a sample result of 08/07/17. -+ -``` -SELECT '2008-07-17'(DATE, FORMAT 'YY/MM/DD') FROM (VALUES(1)) t; -``` - -* The format string 'YYYY/MM/DD' in this example will yield a sample result of 2008/07/17. -+ -``` -SELECT '2008-07-17' (DATE, FORMAT 'YYYY/MM/DD') FROM (VALUES(1)) t; -``` - -* The format string 'YYYYMMDD' in this example will yield a sample result`of 20080717. -+ -``` -SELECT '2008-07-17' (DATE, FORMAT 'YYYYMMDD') FROM (VALUES(1)) t; -``` - -* The format string 'DD.MM.YYYY' in this example will yield a sample result of 17.07.2008. -+ -``` -SELECT '2008-07-17' (DATE, FORMAT 'DD.MM.YYYY') FROM (VALUES(1)) t; -``` - -* The format string 'DD-MMM-YYYY' in this example will yield a sample result of 17âJUL-2008. -+ -``` -SELECT '2008-07-17' (DATE, FORMAT 'DD-MMM-YYYY') FROM (VALUES(1)) t; -``` - -* The format string '99:99:99:99' in this example will yield a sample result of 12:34:56:78. -+ -``` -SELECT 12345678 (FORMAT '99:99:99:99') FROM (VALUES(1)) t; -``` - -* The format string '-99:99:99:99' in this example will yield a sample result of -12:34:56:78. -+ -``` -SELECT (-12345678) (FORMAT '-99:99:99:99') FROM (VALUES(1)) t; -``` - -<<< -[[sample_clause]] -== SAMPLE Clause - -The SAMPLE clause of the SELECT statement specifies the sampling method -used to select a subset of the intermediate result table of a SELECT -statement. The intermediate result table consists of the rows returned -by a WHERE clause or, if no WHERE clause exists, the FROM clause. See -<<select_statement,SELECT Statement>>. - -SAMPLE is a {project-name} SQL extension. - -``` -SAMPLE sampling-methodis: - RANDOM percent-size -| FIRST rows-size - [SORT BY colname [ASC[ENDING]|DESC[ENDING]] - [,colname [ASC[ENDING] | DESC[ENDING]]]...] -| PERIODIC rows-size EVERY number-rows ROWS - [SORT BY colname [ASC[ENDING] | DESC[ENDING]] - [,colname [ASC[ENDING] | DESC[ENDING]]]...] - -percent-size is: - percent-result PERCENT [ROWS] -| BALANCE WHEN condition - THEN percent-result PERCENT [ROWS] - [WHEN condition THEN percent-result PERCENT [ROWS]]... - [ELSE percent-result PERCENT [ROWS]] END - -rows-size is: - number-rows ROWS -| BALANCE WHEN condition THEN number-rows ROWS - [WHEN condition THEN number-rows ROWS]... - [ELSE number-rows ROWS] END -``` - -* `RANDOM _percent-size_` -+ -directs {project-name} SQL to choose rows randomly (each row having an -unbiased probability of being chosen) without replacement from the -result table. The sampling size is determined by the _percent-size_, -defined as: - -* `_percent-result_ PERCENT [ROWS] | BALANCE WHEN _condition_ THEN -_percent-result_ PERCENT [ROWS] [WHEN _condition_ THEN _percent-result_ -PERCENT [ROWS]]… [ELSE _percent-result_ PERCENT [ROWS]] END` -+ -specifies the value of the size for RANDOM sampling by using a percent -of the result table. The value _percent-result_ must be a numeric -literal. -+ -You can determine the actual size of the sample. Suppose that _N_ rows -exist in the intermediate result table. Each row is picked with a -probability of _r_%, where _r_ is the sample size in PERCENT. -Therefore, the actual size of the resulting sample is approximately _r_% of _N_. -The number of rows picked follows a binomial distribution with -mean equal to _r_ *c_N_/100. -+ -If you specify a sample size greater than 100 PERCENT, {project-name} SQL -returns all the rows in the result table plus duplicate rows. The -duplicate rows are picked from the result table according to the -specified sampling method. This technique is called oversampling. - -** `ROWS` -+ -specifies row sampling. Row sampling is the default. - -** `BALANCE` -+ -If you specify a BALANCE expression, {project-name} SQL performs stratified -sampling. The intermediate result table is divided into disjoint strata -based on the WHEN conditions. -+ -Each stratum is sampled independently by using the sampling size. For a -given row, the stratum to which it belongs is determined by the first -WHEN condition that is true for that rowâif a true condition exists. If -no true condition exists, the row belongs to the ELSE stratum. - -* `FIRST _rows-size_ [SORT BY _colname_ [ASC[ENDING] | DESC[ENDING]] -[,_colname_ [ASC[ENDING] | DESC[ENDING]]]…]` -+ -directs {project-name} SQL to choose the first rows from the result table. -You can specify the order of the rows to sample. Otherwise, {project-name} -SQL chooses an arbitrary order. The sampling size is determined by the -_rows-size_, defined as: - -* `_number-rows_ ROWS | BALANCE WHEN _condition_ THEN _number-rows_ ROWS -[WHEN _condition_ THEN _number-rows_ ROWS]… [ELSE _number-rows_ ROWS] END` -+ -specifies the value of the size for FIRST sampling by using the number -of rows intended in the sample. The value _number-rows_ must be an -integer literal. -+ -You can determine the actual size of the sample. Suppose that _N_ rows -exist in the intermediate result table. If the size _s_ of the sample is -specified as a number of rows, the actual size of the resulting sample -is the minimum of _s_ and _N_. - -* `PERIODIC _rows-size_ EVERY _number-rows_ ROWS [SORT BY _colname_ -[ASC[ENDING] | DESC[ENDING]] [,_colname_ [ASC[ENDING] | -DESC[ENDING]]]…]` -+ -directs {project-name} SQL to choose the first rows from each block (or -period) of contiguous rows. This sampling method is equivalent to a -separate FIRST sampling for each period, and the _rows-size_ is defined -as in FIRST sampling. -+ -The size of the period is specified as a number of rows. You can specify -the order of the rows to sample. Otherwise, {project-name} SQL chooses an -arbitrary order. -+ -<<< -+ -You can determine the actual size of the sample. Suppose that _N_ rows -exist in the intermediate result table. If the size _s_ of the sample is -specified as a number of rows and the size _p_ of the period is -specified as a number of rows, the actual size of the resulting sample -is calculated as: -+ -``` -FLOOR (N/p) * s + _minimum_ (MOD (N, p), s) -``` -+ -_minimum_ in this expression is used simply as the mathematical -minimum of two values. - -[[considerations_for_sample]] -=== Considerations for SAMPLE - -[[sample_rows]] -==== Sample Rows - -In general, when you use the SAMPLE clause, the same query returns -different sets of rows for each execution. The same set of rows is -returned only when you use the FIRST and PERIODIC sampling methods with -the SORT BY option, where no duplicates exist in the specified column -combination for the sort. - -[[examples_of_sample]] -=== Examples of SAMPLE - -* Suppose that the data-mining tables SALESPER, SALES, and DEPT have been -created as: -+ -``` -CREATE TABLE trafodion.mining.salesper -( empid NUMERIC (4) UNSIGNED NOT NULL -, dnum NUMERIC (4) UNSIGNED NOT NULL -, salary NUMERIC (8,2) UNSIGNED -, age INTEGER -, sex CHAR (6) -, PRIMARY KEY (empid) ); - -CREATE TABLE trafodion.mining.sales -( empid NUMERIC (4) UNSIGNED NOT NULL -, product VARCHAR (20) -, region CHAR (4) -, amount NUMERIC (9,2) UNSIGNED -, PRIMARY KEY (empid) ); - -CREATE TABLE trafodion.mining.dept -( dnum NUMERIC (4) UNSIGNED NOT NULL -, name VARCHAR (20) -, PRIMARY KEY (dnum) ); -``` -+ -Suppose, too, that sample data is inserted into this database. - - -* Return the SALARY of the youngest 50 sales people: -+ -``` -SELECT salary -FROM salesperson -SAMPLE FIRST 50 ROWS -SORT BY age; - -SALARY ------------ - 90000.00 - 90000.00 - 28000.00 - 27000.12 - 136000.00 - 37000.40 -... - ---- 50 row(s) selected. -``` - -* Return the SALARY of 50 sales people. In this case, the table is -clustered on EMPID. If the optimizer chooses a plan to access rows using -the primary access path, the result consists of salaries of the 50 sales -people with the smallest employee identifiers. -+ -``` -SELECT salary -FROM salesperson -SAMPLE FIRST 50 ROWS; - -SALARY ------------ - 175500.00 - 137000.10 - 136000.00 - 138000.40 - 75000.00 - 90000.00 -... - ---- 50 row(s) selected. -``` - -<<< -* Return the SALARY of the youngest five sales people, skip the next 15 -rows, and repeat this process until no more rows exist in the -intermediate result table. You cannot specify periodic sampling with the -sample size larger than the period. -+ -``` -SELECT salary -FROM salesperson -SAMPLE PERIODIC 5 ROWS -EVERY 20 ROWS -SORT BY age; - -SALARY ------------ - 90000.00 - 90000.00 - 28000.00 - 27000.12 - 136000.00 - 36000.00 -... - ---- 17 row(s) selected. -``` -+ -In this example, 62 rows exist in the SALESPERSON table. For each set of -20 rows, the first five rows are selected. The last set consists of two -rows, both of which are selected. - -* Compute the average salary of a random 10 percent of the sales people. -You will get a different result each time you run this query because it -is based on a random sample. -+ -``` -SELECT AVG(salary) -FROM salesperson -SAMPLE RANDOM 10 PERCENT; - -(EXPR) --------------------- - 61928.57 - ---- 1 row(s) selected. -``` - -<<< -* This query illustrates sampling after execution of the WHERE clause -has chosen the qualifying rows. The query computes the average salary of -a random 10 percent of the sales people over 35 years of age. You will -get a different result each time you run this query because it -is based on a random sample. -+ -``` -SELECT AVG(salary) -FROM salesperson -WHERE age > 35 -SAMPLE RANDOM 10 PERCENT; - -(EXPR) --------------------- - 58000.00 - ---- 1 row(s) selected. -``` - -* Compute the average salary of a random 10 percent of sales people -belonging to the CORPORATE department. The sample is taken from the join -of the SALESPERSON and DEPARTMENT tables. You will get a different -result each time you run this query because it is based on a random -sample. -+ -``` -SELECT AVG(salary) -FROM salesperson S, department D -WHERE S.DNUM = D.DNUM AND D.NAME = 'CORPORATE' -SAMPLE RANDOM 10 PERCENT; - -(EXPR) ---------------------- - 106250.000 - ---- 1 row(s) selected. -``` - -<<< -* In this example, the SALESPERSON table is first sampled and then -joined with the DEPARTMENT table. This query computes the average salary -of all the sales people belonging to the CORPORATE department in a -random sample of 10 percent of the sales employees. -+ -``` -SELECT AVG(salary) -FROM - ( SELECT salary, dnum FROM salesperson SAMPLE RANDOM 10 PERCENT ) AS S - , department D -WHERE S.DNUM = D.DNUM - AND D.NAME = 'CORPORATE'; - -(EXPR) --------------------- - -37000.000 - ---- 1 row(s) selected. -``` -+ -The results of this query and some of the results of previous queries -might return null: -+ -``` -SELECT AVG(salary) -FROM - ( SELECT salary, dnum FROM salesperson SAMPLE RANDOM 10 PERCENT ) AS S - , department D -WHERE S.DNUM = D.DNUM AND D.NAME = 'CORPORATE'; - -(EXPR) --------------------- - -? - ---- 1 row(s) selected. -``` -+ -For this query execution, the number of rows returned by the embedded -query is limited by the total number of rows in the SALESPERSON table. -Therefore, it is possible that no rows satisfy the search condition in -the WHERE clause. - - -<<< -* In this example, both the tables are sampled first and then joined. -This query computes the average salary and the average sale amount -generated from a random 10 percent of all the sales people and 20 -percent of all the sales transactions. -+ -``` -SELECT AVG(salary), AVG(amount) -FROM ( SELECT salary, empid - FROM salesperson - SAMPLE RANDOM 10 PERCENT ) AS S, - ( SELECT amount, empid FROM sales - SAMPLE RANDOM 20 PERCENT ) AS T -WHERE S.empid = T.empid; - -(EXPR) (EXPR) ---------- --------- - 45000.00 31000.00 - ---- 1 row(s) selected. -``` - -* This example illustrates oversampling. This query retrieves 150 -percent of the sales transactions where the amount exceeds $1000. The -result contains every row at least once, and 50 percent of the rows, -picked randomly, occur twice. -+ -``` -SELECT * -FROM sales -WHERE amount > 1000 -SAMPLE RANDOM 150 PERCENT; - -EMPID PRODUCT REGION AMOUNT ------ -------------------- ------ ----------- - 1 PCGOLD, 30MB E 30000.00 - 23 PCDIAMOND, 60MB W 40000.00 - 23 PCDIAMOND, 60MB W 40000.00 - 29 GRAPHICPRINTER, M1 N 11000.00 - 32 GRAPHICPRINTER, M2 S 15000.00 - 32 GRAPHICPRINTER, M2 S 15000.00 - ... ... ... ... - ---- 88 row(s) selected. -``` - -<<< -* The BALANCE option enables stratified sampling. Retrieve the age and -salary of 1000 sales people such that 50 percent of the result are male -and 50 percent female. -+ -``` -SELECT age, sex, salary -FROM salesperson -SAMPLE FIRST -BALANCE - WHEN sex = 'male' THEN 15 ROWS - WHEN sex = 'female' THEN 15 ROWS - END -ORDER BY age; -+ -AGE SEX SALARY ------------ ------ ----------- - 22 male 28000.00 - 22 male 90000.00 - 22 female 136000.00 - 22 male 37000.40 - ... ... ... - ---- 30 row(s) selected. -``` - -* Retrieve all sales records with the amount exceeding $10000 and a -random sample of 10 percent of the remaining records: -+ -``` -SELECT * -FROM sales SAMPLE RANDOM -BALANCE - WHEN amount > 10000 - THEN 100 PERCENT - ELSE 10 PERCENT -END; - -PRODUCT REGION AMOUNT --------------------- ------ ----------- -PCGOLD, 30MB E 30000.00 -PCDIAMOND, 60MB W 40000.00 -GRAPHICPRINTER, M1 N 11000.00 -GRAPHICPRINTER, M2 S 15000.00 -... ... ... -MONITORCOLOR, M2 N 10500.00 -... ... ... - ---- 32 row(s) selected. -``` - -<<< -* This query shows an example of stratified sampling where the -conditions are not mutually exclusive: -+ -``` -SELECT * -FROM sales SAMPLE RANDOM -BALANCE - WHEN amount > 10000 THEN 100 PERCENT - WHEN product = 'PCGOLD, 30MB' THEN 25 PERCENT - WHEN region = 'W' THEN 40 PERCENT - ELSE 10 PERCENT END; - -PRODUCT REGION AMOUNT --------------------- ------ ----------- -PCGOLD, 30MB E 30000.00 -PCDIAMOND, 60MB W 40000.00 -GRAPHICPRINTER, M1 N 11000.00 -GRAPHICPRINTER, M2 S 15000.00 -GRAPHICPRINTER, M3 S 20000.00 -LASERPRINTER, X1 W 42000.00 -... ... ... - ---- 30 row(s) selected. -``` - -<<< -[[sequence_by_clause]] -== SEQUENCE BY Clause - -The SEQUENCE BY clause of the SELECT statement specifies the order in -which to sort the rows - -of the intermediate result table for calculating sequence functions. -This option is used for processing time-sequenced rows in data mining -applications. See <<select_statement>>. - -Sequence by is a {project-name} SQL extension. - -``` -SEQUENCE BY colname[ASC[ENDING]|DESC[ENDING]] - [,colname [ASC[ENDING] | DESC[ENDING]]]... -``` - -* `_colname_` -_ -names a column in _select-list_ or a column in a table reference in the -FROM clause of the SELECT statement. _colname_ is optionally qualified -by a table, view, or correlation name; for example, CUSTOMER.CITY. - -* `ASC | DESC` -+ -specifies the sort order. ASC is the default. For ordering an -intermediate result table on a column that can contain null, nulls are -considered equal to one another but greater than all other non-null -values. -+ -You must include a SEQUENCE BY clause if you include a sequence function -in the select list of the SELECT statement. Otherwise, {project-name} SQL -returns an error. Further, you cannot include a SEQUENCE BY clause if no -sequence function exists in the select list. See -<<sequence_functions,Sequence Functions>> . - -[[considerations_for_sequence_by]] -=== Considerations for SEQUENCE BY - -* Sequence functions behave differently from set (or aggregate) -functions and mathematical (or scalar) functions. -* If you include both SEQUENCE BY and GROUP BY clauses in the same -SELECT statement, the values of the sequence functions must be evaluated -first and then become input for aggregate functions in the statement. -** For a SELECT statement that contains both SEQUENCE BY and GROUP BY -clauses, you can nest the sequence function in the aggregate function: -+ -``` -SELECT - ordernum -, MAX(MOVINGSUM(qty_ordered, 3)) AS maxmovsum_qty -, AVG(unit_price) AS avg_price -FROM odetail -SEQUENCE BY partnum -GROUP BY ordernum; -``` - -* To use a sequence function as a grouping column, you must use a -derived table for the SEQUENCE BY query and use the derived column in -the GROUP BY clause: -+ -``` -SELECT - ordernum -, movsum_qty -, AVG(unit_price) -FROM - ( SELECT ordernum, MOVINGSUM(qty_ordered, 3), unit_price - FROM odetail SEQUENCE BY partnum ) - AS tab2 (ordernum, movsum_qty, unit_price) -GROUP BY ordernum, movsum_qty; -``` - -* To use an aggregate function as the argument to a sequence function, -you must also use a derived table: -+ -``` -SELECT MOVINGSUM(avg_price,2) -FROM - ( SELECT ordernum, AVG(unit_price) FROM odetail - GROUP BY ordernum) -AS tab2 (ordernum, avg_price) -SEQUENCE BY ordernum; -``` - -* Like aggregate functions, sequence functions generate an intermediate -result. If the query has a WHERE clause, its search condition is applied -during the generation of the intermediate result. Therefore, you cannot -use sequence functions in the WHERE clause of a SELECT statement. - -** This query returns an error: -+ -``` -SELECT ordernum, partnum, RUNNINGAVG(unit_price) -FROM odetail -WHERE ordernum > 800000 AND RUNNINGAVG(unit_price) > 350 -SEQUENCE BY qty_ordered; -``` - -** Apply a search condition to the result of a sequence function, use a -derived table for the SEQUENCE BY query, and use the derived column in -the WHERE clause: -+ -``` -SELECT ordernum, partnum, runavg_price -FROM - ( SELECT ordernum, partnum, RUNNINGAVG(unit_price) - FROM odetail SEQUENCE BY qty_ordered) -AS tab2 (ordernum, partnum, runavg_price) -WHERE ordernum > 800000 AND -runavg_price > 350; -``` - -[[examples_of_sequence_by]] -=== Examples of SEQUENCE BY - -* Sequentially number each row for the entire result and also number the -rows for each part number: -+ -``` -SELECT - RUNNINGCOUNT(*) AS RCOUNT -, MOVINGCOUNT(*,ROWS SINCE (d.partnum<>THIS(d.partnum))) AS MCOUNT -, d.partnum -FROM orders o, odetail d -WHERE o.ordernum=d.ordernum -SEQUENCE BY d.partnum, o.order_date, o.ordernum -ORDER BY d.partnum, o.order_date, o.ordernum; - -RCOUNT MCOUNT Part/Num --------------------- --------------------- -------- - 1 1 212 - 2 2 212 - 3 1 244 - 4 2 244 - 5 3 244 - ... ... ... - 67 1 7301 - 68 2 7301 - 69 3 7301 - 70 4 7301 - ---- 70 row(s) selected. -``` - -<<< -* Show the orders for each date, the amount for each order item and the -moving total for each order, and the running total of all the orders. -The query sequences orders by date, order number, and part number. (The -CAST function is used for readability only.) -+ -``` -SELECT - o.ordernum -, CAST (MOVINGCOUNT(*,ROWS SINCE(THIS(o.ordernum) <> o.ordernum)) AS INT) AS MCOUNT -, d.partnum -, o.order_date -, (d.unit_price * d.qty_ordered) AS AMOUNT -, MOVINGSUM (d.unit_price * d.qty_ordered, SEQUENCE BY Clause 269 ROWS SINCE(THIS(o.ordernum)<>o.ordernum) ) AS ORDER_TOTAL -, RUNNINGSUM (d.unit_price * d.qty_ordered) AS TOTAL_SALES -FROM orders o, odetail d -WHERE o.ordernum=d.ordernum -SEQUENCE BY o.order_date, o.ordernum, d.partnum -ORDER BY o.order_date, o.ordernum, d.partnum; - -Order/Num MCOUNT Part/Num Order/Date AMOUNT ORDER_TOTAL TOTAL_SALES ----------- ----------- -------- ---------- ---------- -------------- -------------- - 100250 1 244 2008-01-23 14000.00 14000.00 14000.00 - 100250 2 5103 2008-01-23 4000.00 18000.00 18000.00 - 100250 3 6500 2008-01-23 950.00 18950.00 18950.00 - 200300 1 244 2008-02-06 28000.00 28000.00 46950.00 - 200300 2 2001 2008-02-06 10000.00 38000.00 56950.00 - 200300 3 2002 2008-02-06 14000.00 52000.00 70950.00 - ... ... ... ... ... ... ... - 800660 18 7102 2008-10-09 1650.00 187360.00 113295.00 - 800660 19 7301 2008-10-09 5100.00 192460.00 1118395.00 - ---- 69 row(s) selected. -``` -+ -For example, for order number 200300, the ORDER_TOTAL is a moving sum -within the order date 2008-02-06, and the TOTAL_SALES is a running sum -for all orders. The current window for the moving sum is defined as ROWS -SINCE (THIS(o.ordernum)<>o.ordernum), which restricts the ORDER_TOTAL to -the current order number. - -<<< -* Show the amount of time between orders by calculating the interval between two dates: -+ -``` -SELECT RUNNINGCOUNT(*),o.order_date,DIFF1(o.order_date) -FROM orders o -SEQUENCE BY o.order_date, o.ordernum -ORDER BY o.order_date, o.ordernum ; - - -(EXPR) Order/Date (EXPR) --------------------- ---------- ------------- - 1 2008-01-23 ? - 2 2008-02-06 14 - 3 2008-02-17 11 - 4 2008-03-03 14 - 5 2008-03-19 16 - 6 2008-03-19 0 - 7 2008-03-27 8 - 8 2008-04-10 14 - 9 2008-04-20 10 - 10 2008-05-12 22 - 11 2008-06-01 20 - 12 2008-07-21 50 - 13 2008-10-09 80 - ---- 13 row(s) selected. -``` - -<<< -[[transpose_clause]] -== TRANSPOSE Clause - -The TRANSPOSE clause of the SELECT statement generates for each row of -the SELECT source table a row for each item in the transpose item list. -The result table of the TRANSPOSE clause has all the columns of the -source table plus, for each transpose item list, a value column or -columns and an optional key column. - -TRANSPOSE is a {project-name} SQL extension. - -``` -TRANSPOSE transpose-set [transpose-set]... - [KEY BY key-colname] - -transpose-set is: - transpose-item-list AS transpose-col-list - -transpose-item-list is: - expression-list -| (expression-list) [,(expression-list)]... - -expression-list is: - expression [,expression]... - -transpose-col-list is: - colname | (colname-list) - -colname-list is: - colname [,colname]... -``` - -* `_transpose-item-list_ AS _transpose-col-list_` -+ -specifies a _transpose-set_, which correlates a _transpose-item-list_ -with a _transpose-col-list_. The _transpose-item-list_ can be a list -of expressions or a list of expression lists enclosed in parentheses. -The _transpose-col-list_ can be a single column name or a list of column -names enclosed in parentheses. -+ -For example, in the _transpose-set_ TRANSPOSE (A,X),(B,Y),(C,Z) AS -(V1,V2), the items in the _transpose-item-list_ are (A,X),(B,Y), and -(C,Z), and the _transpose-col-list_ is (V1,V2). The number of -expressions in each item must be the same as the number of value columns -in the column list. -+ -In the example TRANSPOSE A,B,C AS V, the items are A,B, and C, and the -value column is V. This form can be thought of as a shorter way of writing TRANSPOSE -(A),(B),(C) AS (V). - -* `_transpose-item-list_` -+ -specifies a list of items. An item is a value expression or a list of -value expressions enclosed in parentheses. - -** `_expression-list_` -+ -specifies a list of SQL value expressions, separated by commas. The -expressions must have compatible data types. -+ -For example, in the transpose set TRANSPOSE A,B,C AS V, the expressions -A,B, and C have compatible data types. - -** `(_expression-list_) [,(_expression-list_)]&8230;` -+ -specifies a list of expressions enclosed in parentheses, followed by -another list of expressions enclosed in parentheses, and so on. The -number of expressions within parentheses must be equal for each list. -The expressions in the same ordinal position within the parentheses must -have compatible data types. -+ -For example, in the transpose set TRANSPOSE (A,X),(B,Y),(C,Z) AS -(V1,V2), the expressions A,B, and C have compatible data types, and the -expressions X,Y, and Z have compatible data types. - -* `_transpose-col-list_` -+ -specifies the columns that consist of the evaluation of expressions in -the item list as the expressions are applied to rows of the source -table. - -** `_colname_` -+ -is an SQL identifier that specifies a column name. It identifies the -column consisting of the values in _expression-list_. -+ -For example, in the transpose set TRANSPOSE A,B,C AS V, the column V -corresponds to the values of the expressions A,B, and C. - -** `(_colname-list_)` -+ -specifies a list of column names enclosed in parentheses. Each column -consists of the values of the expressions in the same ordinal position -within the parentheses in the transpose item list. -+ -For example, in the transpose set TRANSPOSE (A,X),(B,Y),(C,Z) AS -(V1,V2), the column V1 corresponds to the expressions A,B, and C, and -the column V2 corresponds to the expressions X,Y, and Z. - -* `KEY BY _key-colname_` -+ -optionally specifies which expression (the value in the transpose column -list corresponds to) by its position in the item list. _key-colname_ is -an SQL identifier. The data type of the key column is exact numeric, and -the value is NOT NULL. - -[[considerations_for_transpose]] -=== Considerations for TRANSPOSE - -[[multiple_transpose_clauses_and_sets]] -==== Multiple TRANSPOSE Clauses and Sets - -* Multiple TRANSPOSE clauses can be used in the same query. For example: -+ -``` -SELECT keycol1, valcol1, keycol2, valcol2 -FROM mytable -TRANSPOSE a, b, c AS valcol1 KEY BY keycol1 -TRANSPOSE d, e, f AS valcol2 KEY BY keycol2 -``` - -* A TRANSPOSE clause can contain multiple transpose sets. For example: -+ -``` -SELECT keycol, valcol1, valcol2 -FROM mytable -TRANSPOSE a, b, c AS valcol1 - d, e, f AS valcol2 -KEY BY keycol -``` - -[[degree_and_column_order_of_the_transpose_result]] -==== Degree and Column Order of the TRANSPOSE Result - -The degree of the TRANSPOSE result is the degree of the source table -(the result table derived from the table reference or references in the -FROM clause and a WHERE clause if specified), plus one if the key column -is specified, plus the cardinalities of all the transpose column lists. - -The columns of the TRANSPOSE result are ordered beginning with the -columns of the source table, followed by the key column if specified, -and then followed by the list of column names in the order in which they -are specified. - -[[data_type_of_the_transpose_result]] -==== Data Type of the TRANSPOSE Result - -The data type of each of the value columns is the union compatible data -type of the corresponding expressions in the _transpose-item-list_. -You cannot have expressions with data types that are not compatible in a -_transpose-item-list_. - -For example, in TRANSPOSE (A,X),(B,Y),(C,Z) AS (V1,V2), the data type of -V1 is the union compatible type for A, B, and C, and the data type of V2 -is the union compatible type for X, Y, and Z. - -See <<comparable_and_compatible_data_types,Comparable and Compatible Data Types>>. - -[[cardinality_of_the_transpose_result]] -==== Cardinality of the TRANSPOSE Result - -The items in each _transpose-item-list_ are enumerated from 1 to N, -where N is the total number of items in all the item lists in the -transpose sets. - -In this example with a single transpose set, the value of N is 3: - -``` -TRANSPOSE (a,x),(b,y),(c,z) AS (v1,v2) -``` - -In this example with two transpose sets, the value of N is 5: - -``` -TRANSPOSE (a,x),(b,y),(c,z) AS (v1,v2) l,m AS v3 -``` - -The values 1 to N are the key values _k_i. The items in each -_transpose-item-list_ are the expression values _v_i. - -The cardinality of the result of the TRANSPOSE clause is the cardinality -of the source table times N, the total number of items in all the -transpose item lists. - -For each row of the source table and for each value in the key values -_k_i, the TRANSPOSE result contains a row with all the attributes of -the source table, the key value _k_i in the key column, the expression -values vi in the value columns of the corresponding transpose set, and -NULL in the value columns of other transpose sets. - -For example, consider this TRANSPOSE clause: - -``` -TRANSPOSE (a,x),(b,y),(c,z) AS (v1,v2) - l,m AS v3 -KEY BY k -``` - -The value of N is 5. One row of the SELECT source table produces this -TRANSPOSE result: - -[cols="5*",options="header"] -|=== -| _columns-of-source_ | K | V1 | V2 | V3 -| _source-row_ | 1 | _value-of-A_ | _value-of-X_ | NULL -| _source-row_ | 2 | _value-of-B_ | _value-of-Y_ | NULL -| _source-row_ | 3 | _value-of-C_ | _value-of-Z_ | NULL -| _source-row_ | 4 | NULL | NULL | _value-of-L_ -| _source-row_ | 5 | NULL | NULL | _value-of-M_ -|=== - -<<< -[[examples_of_transpose]] -=== Examples of TRANSPOSE - -* Suppose that MYTABLE has been created as: -+ -``` -CREATE TABLE mining.mytable -( A INTEGER, B INTEGER, C INTEGER, D CHAR(2), E CHAR(2), F CHAR(2) ); -``` -+ -The table MYTABLE has columns A, B, C, D, E, and F with related data. -The columns A, B, and C are type INTEGER, and columns D, E, and F are -type CHAR. -+ -[cols="6*",options="header"] -|==== -| A | B | C | D | E | F -| 1 | 10 | 100 | d1 | e1 | f1 -| 2 | 20 | 200 | d2 | e2 | f2 -|==== - -* Suppose that MYTABLE has only the first three columns: A, B, and C. -The result of the TRANSPOSE clause has three times as many rows (because -three items exist in the transpose item list) as rows exist in MYTABLE: -+ -``` -SELECT * FROM mytable -TRANSPOSE a, b, c AS valcol KEY BY keycol; -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols="8*",options="header"] -|=== -| A | B | C | D | E | F | KEYCOL | VALCOL -| 1 | 10 | 100 | d1 | e1 | f1 | 1 | 1 -| 1 | 10 | 100 | d1 | e1 | f1 | 2 | 10 -| 1 | 10 | 100 | d1 | e1 | f1 | 3 | 100 -| 2 | 20 | 200 | d2 | e2 | f2 | 1 | 2 -| 2 | 20 | 200 | d2 | e2 | f2 | 2 | 20 -| 2 | 20 | 200 | d2 | e2 | f2 | 3 | 200 -|=== - -<<< -* This query shows that the items in the transpose item list can be any -valid scalar expressions: -+ -``` -SELECT keycol, valcol, a, b, c FROM mytable -TRANSPOSE a + b, c + 3, 6 AS valcol KEY BY keycol; -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols="5*",options="header"] -|===== -| KEYCOL | VALCOL | A | B | C -| 1 | 1 | 1 | 10 | 100 -| 2 | 103 | 1 | 10 | 100 -| 3 | 6 | 1 | 10 | 100 -| 1 | 22 | 2 | 20 | 200 -| 2 | 203 | 2 | 20 | 200 -| 3 | 6 | 2 | 20 | 200 -|===== - -* This query shows how the TRANSPOSE clause can be used with a GROUP BY -clause. This query is typical of queries used to obtain cross-table -information, where A, B, and C are the independent variables, and D is -the dependent variable. -+ -``` -SELECT keycol, valcol, d, COUNT(*) -FROM mytable -TRANSPOSE a, b, c AS valcol -KEY BY keycol -GROUP BY keycol, valcol, d; -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols="4*",options="header"] -|=== -| KEYCOL | VALCOL | D | COUNT(*) -| 1 | 1 | d1 | 1 -| 2 | 10 | d1 | 1 -| 3 | 100 | d1 | 1 -| 1 | 2 | d2 | 1 -| 2 | 20 | d2 | 1 -| 3 | 200 | d2 | 1 -|=== - -<<< -* This query shows how to use COUNT applied to VALCOL. The result table -of the TRANSPOSE query shows the number of distinct values in VALCOL. -+ -``` -SELECT COUNT(DISTINCT valcol) FROM mytable -TRANSPOSE a, b, c AS valcol KEY BY keycol -GROUP BY keycol; - -(EXPR) --------------------- - 2 - 2 - 2 - ---- 3 row(s) selected. -``` - -* This query shows how multiple TRANSPOSE clauses can be used in the -same query. The result table from this query has nine times as many rows -as rows exist in MYTABLE: -+ -``` -SELECT keycol1, valcol1, keycol2, valcol2 FROM mytable -TRANSPOSE a, b, c AS valcol1 KEY BY keycol1 -TRANSPOSE d, e, f AS valcol2 KEY BY keycol2; -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols=",,,",options="header"] -|=== -| KEYCOL1 | VALCOL1 | KEYCOL2 | VALCOL2 -| 1 | 1 | 1 | d1 -| 1 | 1 | 2 | e1 -| 1 | 1 | 3 | f1 -| 2 | 10 | 1 | d1 -| 2 | 10 | 2 | e1 -| 2 | 10 | 3 | f1 -| 3 | 100 | 1 | d1 -| 3 | 100 | 2 | e1 -| 3 | 100 | 3 | f1 -| 1 | 2 | 1 | d2 -| 1 | 2 | 2 | e2 -| 1 | 2 | 3 | f2 -| 2 | 20 | 1 | d2 -| 2 | 20 | 2 | e2 -| 2 | 20 | 3 | f2 -| 3 | 200 | 1 | d2 -| 3 | 200 | 2 | e2 -| 3 | 200 | 3 | f2 -|=== - -* This query shows how a TRANSPOSE clause can contain multiple transpose -setsâthat is, multiple _transpose-item-list_ AS _transpose-col-list_. -The expressions A, B, and C are of type integer, and expressions D, E, -and F are of type character. -+ -``` -SELECT keycol, valcol1, valcol2 -FROM mytable -TRANSPOSE a, b, c AS valcol1 - d, e, f AS valcol2 -KEY BY keycol; -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols="3*",options="header"] -|=== -| KEYCOL | VALCOL1 | VALCOL2 -| 1 | 1 | ? -| 2 | 10 | ? -| 3 | 100 | ? -| 4 | ? | d1 -| 5 | ? | e1 -| 6 | ? | f1 -| 1 | 2 | ? -| 2 | 20 | ? -| 3 | 200 | ? -| 4 | ? | d2 -| 5 | ? | e2 -| 6 | ? | f2 -|=== -+ -A question mark (?) in a value column indicates no value for the given KEYCOL. - -* This query shows how the preceding query can include a GROUP BY clause: -+ -``` -SELECT keycol, valcol1, valcol2, COUNT(*) -FROM mytable -TRANSPOSE a, b, c AS valcol1 - d, e, f AS valcol2 -KEY BY keycol -GROUP BY keycol, valcol1, valcol2; -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols="4*",options="header"] -|=== -| KEYCOL | VALCOL1 | VALCOL2 | (EXPR) -| 1 | 1 | ? | 1 -| 2 | 10 | ? | 1 -| 3 | 100 | ? | 1 -| 1 | 2 | ? | 1 -| 2 | 20 | ? | 1 -| 3 | 200 | ? | 1 -| 4 | ? | d2 | 1 -| 5 | ? | e2 | 1 -| 6 | ? | f2 | 1 -| 4 | ? | d1 | 1 -| 5 | ? | e1 | 1 -| 6 | ? | f1 | 1 -|=== - -* This query shows how an item in the transpose item list can contain a -list of expressions and that the KEY BY clause is optional: -+ -``` -SELECT * FROM mytable -TRANSPOSE (1, A, 'abc'), (2, B, 'xyz') AS (VALCOL1, VALCOL2, VALCOL3); -``` -+ -The result table of the TRANSPOSE query is: -+ -[cols="9*",options="header"] -|=== -| A | B | C | D | E | F | VALCOL1 | VALCOL2 | VALCOL3 -| 1 | 10 | 100 | d1 | e1 | f1 | 1 | 1 | abc -| 1 | 10 | 100 | d1 | e1 | f1 | 2 | 10 | xyz -| 2 | 20 | 200 | d2 | e2 | f2 | 1 | 2 | abc -| 2 | 20 | 200 | d2 | e2 | f2 | 2 | 20 | xyz -|=== +//// +/** +* @@@ START COPYRIGHT @@@ +* +* Licensed to the Apache Software Foundation (ASF) under one +* or more contributor license agreements. See the NOTICE file +* distributed with this work for additional information +* regarding copyright ownership. The ASF licenses this file +* to you under the Apache License, Version 2.0 (the +* "License"); you may not use this file except in compliance +* with the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, +* software distributed under the License is distributed on an +* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +* KIND, either express or implied. See the License for the +* specific language governing permissions and limitations +* under the License. +* +* @@@ END COPYRIGHT @@@ +*/ +//// + +[[sql_clauses]] += SQL Clauses + +Clauses are used by {project-name} SQL statements to specify default values, +ways to sample or sort data, how to store physical data, and other +details. + +This section describes: + +* <<default_clause,DEFAULT Clause>> specifies a default value for a column being created. +* <<format_clause,FORMAT Clause>> specifies the format to use. +* <<sample_clause,SAMPLE Clause>> specifies the sampling method used to select a subset of the intermediate result table of a SELECT statement. +* <<sequence_by_clause,SEQUENCE BY Clause>> specifies the order in which to sort rows of the intermediate result table for calculating sequence functions. +* <<transpose_clause,TRANSPOSE Clause>> generates, for each row of the SELECT source table, a row for each item in the transpose item list. + +[[default_clause]] +== DEFAULT Clause + +The DEFAULT option of the CREATE TABLE or ALTER TABLE _table-name_ ADD +COLUMN statement specifies a default value for a column being created. + +The default value is used when a row is inserted in the table without a value for the column. + +``` +DEFAULT default | NO DEFAULT + +default is: + literal +| NULL +| CURRENTDATE +| CURRENTTIME +| CURRENTTIMESTAMP +``` + +* `NO DEFAULT` ++ +specifies the column has no default value. You cannot specify NO DEFAULT +in an ALTER TABLE statement. See <<alter_table_statement,ALTER TABLE Statement>>. + +[[syntax_for_default_clause]] +=== Syntax for Default Clause + +* `DEFAULT _literal_` ++ +is a literal of a data type compatible with the data type of the +associated column. ++ +For a character column, _literal_ must be a string literal of no more +than 240 characters or the length of the column, whichever is less. The +maximum length of a default value for a character column is 240 bytes +(minus control characters) or the length of the column, whichever is +less. Control characters consist of character set prefixes and single +quote delimiter found in the text itself. ++ +For a numeric column, _literal_ must be a numeric literal that does not +exceed the defined length of the column. The number of digits to the +right of the decimal point must not exceed the scale of the column, and +the number of digits to the left of the decimal point must not exceed +the number in the length (or length minus scale, if you specified scale +for the column). ++ +For a datetime column, _literal_ must be a datetime literal with a +precision that matches the precision of the column. ++ +For an INTERVAL column, _literal_ must be an INTERVAL literal that has +the range of INTERVAL fields defined for the column. + +* `DEFAULT NULL` ++ +specifies NULL as the default. This default can occur only with a column +that allows null. + +* `DEFAULT CURRENT_DATE` ++ +specifies the default value for the column as the value returned by the +CURRENT_DATE function at the time of the operation that assigns a value +to the column. This default can occur only with a column whose data type +is DATE. + +* `DEFAULT CURRENT_TIME` ++ +specifies the default value for the column as the value returned by the +CURRENT_TIME function at the time of the operation that assigns a value +to the column. This default can occur only with a column whose data type +is TIME. + +* `DEFAULT CURRENT_TIMESTAMP` ++ +specifies the default value for the column as the value returned by the +CURRENT_TIMESTAMP function at the time of the operation that assigns a +value to the column. This default can occur only with a column whose +data type is TIMESTAMP. + +[[examples_of_default]] +=== Examples of DEFAULT + +* This example uses DEFAULT clauses on CREATE TABLE to specify default column values: ++ +``` +CREATE TABLE items +( item_id CHAR(12) NO DEFAULT +, description CHAR(50) DEFAULT NULL +, num_on_hand INTEGER DEFAULT 0 NOT NULL +) ; +``` + +* This example uses DEFAULT clauses on CREATE TABLE to specify default column values: ++ +``` +CREATE TABLE persnl.project +( projcode NUMERIC (4) UNSIGNED NO DEFAULT NOT NULL +, empnum NUMERIC (4) UNSIGNED NO DEFAULT NOT NULL +, projdesc VARCHAR (18) DEFAULT NULL +, start_date DATE DEFAULT CURRENT_DATE +, ship_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP +, est_complete INTERVAL DAY DEFAULT INTERVAL '30' DAY +, PRIMARY KEY (projcode) +) ; +``` + +<<< +[[format_clause]] +== FORMAT Clause + +The FORMAT clause specifies the output format for DATE values. It can +also be used to specify the length of character output or to specify +separating the digits of integer output with colons. + +* Date Formats: ++ +``` +(FORMAT 'format-string') | + +(DATE, FORMAT 'format-string') + +format-string for Date Formats is: + YYYY-MM-DD + MM/DD/YYYY + YY/MM/DD + YYYY/MM/DD + YYYYMMDD + DD.MM.YYYY + DD-MM-YYYY + DD-MMM-YYYY +``` + +* Other Formats: ++ +``` +(FORMAT 'format-string') + +format-string for other formats is: + XXX + 99:99:99:99 + -99:99:99:99 +``` + +* `YYYY-MM-DD` ++ +specifies that the FORMAT clause output format is _year-month-day_. + +* `MM/DD/YYYY` ++ +specifies that the FORMAT clause output format is _month/day/year_ + +* `YY/MM/DD` ++ +specifies that the FORMAT clause output format is _year/month/day_. + +* `YYYY/MM/DD` ++ +specifies that the FORMAT clause output format is _year/month/day_. + +* `YYYYMMDD` ++ +specifies that the FORMAT clause output format is _yearmonthday_. + +* `DD.MM.YYYY` ++ +specifies that the FORMAT clause output format is _day.month.year_. + +* `DD-MM-YYYY` ++ +specifies that the FORMAT clause output format is _day-month-year_. + +* `DD-MMM-YYYY` ++ +specifies that the FORMAT clause output format is _day-month-year_. + +* `XXX` ++ +specifies that the FORMAT clause output format is a string format. The +input must be a numeric or string value. + +* `99:99:99:99` ++ +specifies that the FORMAT clause output format is a timestamp. The input +must be a numeric value. + +* `-99:99:99:99` ++ +specifies that the FORMAT clause output format is a timestamp. The input +must be a numeric value. + +[[considerations_for_date_formats]] +=== Considerations for Date Formats + +The expression preceding the (FORMAT â_format-string_') clause must be +a DATE value. + +The expression preceding the (DATE, FORMAT _'format-string_') clause +must be a quoted string in the USA, EUROPEAN, or DEFAULT date format. + +[[considerations_for_other_formats]] +==== Considerations for Other Formats + +For XXX, the expression preceding the (FORMAT _'format-string_') +clause must be a numeric value or a string value. + +For 99:99:99:99 and -99:99:99:99, the expression preceding the (FORMAT +_'format-string_') clause must be a numeric value. + +[[examples_of_format]] +=== Examples of FORMAT + +* The format string 'XXX' in this example will yield a sample result of abc: ++ +``` +SELECT 'abcde' (FORMAT 'XXX') FROM (VALUES(1)) t; +``` + +* The format string 'YYYY-MM_DD' in this example will yield a sample result of 2008-07-17. ++ +``` +SELECT CAST('2008-07-17' AS DATE) (FORMAT 'YYYY-MM-DD') FROM (VALUES(1)) t; +``` + +* The format string 'MM/DD/YYYY' in this example will yield a sample result of 07/17/2008. ++ +``` +SELECT '2008-07-17' (DATE, FORMAT 'MM/DD/YYYY') FROM (VALUES(1)) t; +``` + +* The format string 'YY/MM/DD' in this example will yield a sample result of 08/07/17. ++ +``` +SELECT '2008-07-17'(DATE, FORMAT 'YY/MM/DD') FROM (VALUES(1)) t; +``` + +* The format string 'YYYY/MM/DD' in this example will yield a sample result of 2008/07/17. ++ +``` +SELECT '2008-07-17' (DATE, FORMAT 'YYYY/MM/DD') FROM (VALUES(1)) t; +``` + +* The format string 'YYYYMMDD' in this example will yield a sample result`of 20080717. ++ +``` +SELECT '2008-07-17' (DATE, FORMAT 'YYYYMMDD') FROM (VALUES(1)) t; +``` + +* The format string 'DD.MM.YYYY' in this example will yield a sample result of 17.07.2008. ++ +``` +SELECT '2008-07-17' (DATE, FORMAT 'DD.MM.YYYY') FROM (VALUES(1)) t; +``` + +* The format string 'DD-MMM-YYYY' in this example will yield a sample result of 17âJUL-2008. ++ +``` +SELECT '2008-07-17' (DATE, FORMAT 'DD-MMM-YYYY') FROM (VALUES(1)) t; +``` + +* The format string '99:99:99:99' in this example will yield a sample result of 12:34:56:78. ++ +``` +SELECT 12345678 (FORMAT '99:99:99:99') FROM (VALUES(1)) t; +``` + +* The format string '-99:99:99:99' in this example will yield a sample result of -12:34:56:78. ++ +``` +SELECT (-12345678) (FORMAT '-99:99:99:99') FROM (VALUES(1)) t; +``` + +<<< +[[sample_clause]] +== SAMPLE Clause + +The SAMPLE clause of the SELECT statement specifies the sampling method +used to select a subset of the intermediate result table of a SELECT +statement. The intermediate result table consists of the rows returned +by a WHERE clause or, if no WHERE clause exists, the FROM clause. See +<<select_statement,SELECT Statement>>. + +SAMPLE is a {project-name} SQL extension. + +``` +SAMPLE sampling-methodis: + RANDOM percent-size +| FIRST rows-size + [SORT BY colname [ASC[ENDING]|DESC[ENDING]] + [,colname [ASC[ENDING] | DESC[ENDING]]]...] +| PERIODIC rows-size EVERY number-rows ROWS + [SORT BY colname [ASC[ENDING] | DESC[ENDING]] + [,colname [ASC[ENDING] | DESC[ENDING]]]...] + +percent-size is: + percent-result PERCENT [ROWS] +| BALANCE WHEN condition + THEN percent-result PERCENT [ROWS] + [WHEN condition THEN percent-result PERCENT [ROWS]]... + [ELSE percent-result PERCENT [ROWS]] END + +rows-size is: + number-rows ROWS +| BALANCE WHEN condition THEN number-rows ROWS + [WHEN condition THEN number-rows ROWS]... + [ELSE number-rows ROWS] END +``` + +* `RANDOM _percent-size_` ++ +directs {project-name} SQL to choose rows randomly (each row having an +unbiased probability of being chosen) without replacement from the +result table. The sampling size is determined by the _percent-size_, +defined as: + +* `_percent-result_ PERCENT [ROWS] | BALANCE WHEN _condition_ THEN +_percent-result_ PERCENT [ROWS] [WHEN _condition_ THEN _percent-result_ +PERCENT [ROWS]]… [ELSE _percent-result_ PERCENT [ROWS]] END` ++ +specifies the value of the size for RANDOM sampling by using a percent +of the result table. The value _percent-result_ must be a numeric +literal. ++ +You can determine the actual size of the sample. Suppose that _N_ rows +exist in the intermediate result table. Each row is picked with a +probability of _r_%, where _r_ is the sample size in PERCENT. +Therefore, the actual size of the resulting sample is approximately _r_% of _N_. +The number of rows picked follows a binomial distribution with +mean equal to _r_ *c_N_/100. ++ +If you specify a sample size greater than 100 PERCENT, {project-name} SQL +returns all the rows in the result table plus duplicate rows. The +duplicate rows are picked from the result table according to the +specified sampling method. This technique is called oversampling. + +** `ROWS` ++ +specifies row sampling. Row sampling is the default. + +** `BALANCE` ++ +If you specify a BALANCE expression, {project-name} SQL performs stratified +sampling. The intermediate result table is divided into disjoint strata +based on the WHEN conditions. ++ +Each stratum is sampled independently by using the sampling size. For a +given row, the stratum to which it belongs is determined by the first +WHEN condition that is true for that rowâif a true condition exists. If +no true condition exists, the row belongs to the ELSE stratum. + +* `FIRST _rows-size_ [SORT BY _colname_ [ASC[ENDING] | DESC[ENDING]] +[,_colname_ [ASC[ENDING] | DESC[ENDING]]]…]` ++ +directs {project-name} SQL to choose the first rows from the result table. +You can specify the order of the rows to sample. Otherwise, {project-name} +SQL chooses an arbitrary order. The sampling size is determined by the +_rows-size_, defined as: + +* `_number-rows_ ROWS | BALANCE WHEN _condition_ THEN _number-rows_ ROWS +[WHEN _condition_ THEN _number-rows_ ROWS]… [ELSE _number-rows_ ROWS] END` ++ +specifies the value of the size for FIRST sampling by using the number +of rows intended in the sample. The value _number-rows_ must be an +integer literal. ++ +You can determine the actual size of the sample. Suppose that _N_ rows +exist in the intermediate result table. If the size _s_ of the sample is +specified as a number of rows, the actual size of the resulting sample +is the minimum of _s_ and _N_. + +* `PERIODIC _rows-size_ EVERY _number-rows_ ROWS [SORT BY _colname_ +[ASC[ENDING] | DESC[ENDING]] [,_colname_ [ASC[ENDING] | +DESC[ENDING]]]…]` ++ +directs {project-name} SQL to choose the first rows from each block (or +period) of contiguous rows. This sampling method is equivalent to a +separate FIRST sampling for each period, and the _rows-size_ is defined +as in FIRST sampling. ++ +The size of the period is specified as a number of rows. You can specify +the order of the rows to sample. Otherwise, {project-name} SQL chooses an +arbitrary order. ++ +<<< ++ +You can determine the actual size of the sample. Suppose that _N_ rows +exist in the intermediate result table. If the size _s_ of the sample is +specified as a number of rows and the size _p_ of the period is +specified as a number of rows, the actual size of the resulting sample +is calculated as: ++ +``` +FLOOR (N/p) * s + _minimum_ (MOD (N, p), s) +``` ++ +_minimum_ in this expression is used simply as the mathematical +minimum of two values. + +[[considerations_for_sample]] +=== Considerations for SAMPLE + +[[sample_rows]] +==== Sample Rows + +In general, when you use the SAMPLE clause, the same query returns +different sets of rows for each execution. The same set of rows is +returned only when you use the FIRST and PERIODIC sampling methods with +the SORT BY option, where no duplicates exist in the specified column +combination for the sort. + +[[examples_of_sample]] +=== Examples of SAMPLE + +* Suppose that the data-mining tables SALESPER, SALES, and DEPT have been +created as: ++ +``` +CREATE TABLE trafodion.mining.salesper +( empid NUMERIC (4) UNSIGNED NOT NULL +, dnum NUMERIC (4) UNSIGNED NOT NULL +, salary NUMERIC (8,2) UNSIGNED +, age INTEGER +, sex CHAR (6) +, PRIMARY KEY (empid) ); + +CREATE TABLE trafodion.mining.sales +( empid NUMERIC (4) UNSIGNED NOT NULL +, product VARCHAR (20) +, region CHAR (4) +, amount NUMERIC (9,2) UNSIGNED +, PRIMARY KEY (empid) ); + +CREATE TABLE trafodion.mining.dept +( dnum NUMERIC (4) UNSIGNED NOT NULL +, name VARCHAR (20) +, PRIMARY KEY (dnum) ); +``` ++ +Suppose, too, that sample data is inserted into this database. + + +* Return the SALARY of the youngest 50 sales people: ++ +``` +SELECT salary +FROM salesperson +SAMPLE FIRST 50 ROWS +SORT BY age; + +SALARY +----------- + 90000.00 + 90000.00 + 28000.00 + 27000.12 + 136000.00 + 37000.40 +... + +--- 50 row(s) selected. +``` + +* Return the SALARY of 50 sales people. In this case, the table is +clustered on EMPID. If the optimizer chooses a plan to access rows using +the primary access path, the result consists of salaries of the 50 sales +people with the smallest employee identifiers. ++ +``` +SELECT salary +FROM salesperson +SAMPLE FIRST 50 ROWS; + +SALARY +----------- + 175500.00 + 137000.10 + 136000.00 + 138000.40 + 75000.00 + 90000.00 +... + +--- 50 row(s) selected. +``` + +<<< +* Return the SALARY of the youngest five sales people, skip the next 15 +rows, and repeat this process until no more rows exist in the +intermediate result table. You cannot specify periodic sampling with the +sample size larger than the period. ++ +``` +SELECT salary +FROM salesperson +SAMPLE PERIODIC 5 ROWS +EVERY 20 ROWS +SORT BY age; + +SALARY +----------- + 90000.00 + 90000.00 + 28000.00 + 27000.12 + 136000.00 + 36000.00 +... + +--- 17 row(s) selected. +``` ++ +In this example, 62 rows exist in the SALESPERSON table. For each set of +20 rows, the first five rows are selected. The last set consists of two +rows, both of which are selected. + +* Compute the average salary of a random 10 percent of the sales people. +You will get a different result each time you run this query because it +is based on a random sample. ++ +``` +SELECT AVG(salary) +FROM salesperson +SAMPLE RANDOM 10 PERCENT; + +(EXPR) +-------------------- + 61928.57 + +--- 1 row(s) selected. +``` + +<<< +* This query illustrates sampling after execution of the WHERE clause +has chosen the qualifying rows. The query computes the average salary of +a random 10 percent of the sales people over 35 years of age. You will +get a different result each time you run this query because it +is based on a random sample. ++ +``` +SELECT AVG(salary) +FROM salesperson +WHERE age > 35 +SAMPLE RANDOM 10 PERCENT; + +(EXPR) +-------------------- + 58000.00 + +--- 1 row(s) selected. +``` + +* Compute the average salary of a random 10 percent of sales people +belonging to the CORPORATE department. The sample is taken from the join +of the SALESPERSON and DEPARTMENT tables. You will get a different +result each time you run this query because it is based on a random +sample. ++ +``` +SELECT AVG(salary) +FROM salesperson S, department D +WHERE S.DNUM = D.DNUM AND D.NAME = 'CORPORATE' +SAMPLE RANDOM 10 PERCENT; + +(EXPR) +--------------------- + 106250.000 + +--- 1 row(s) selected. +``` + +<<< +* In this example, the SALESPERSON table is first sampled and then +joined with the DEPARTMENT table. This query computes the average salary +of all the sales people belonging to the CORPORATE department in a +random sample of 10 percent of the sales employees. ++ +``` +SELECT AVG(salary) +FROM + ( SELECT salary, dnum FROM salesperson SAMPLE RANDOM 10 PERCENT ) AS S + , department D +WHERE S.DNUM = D.DNUM + AND D.NAME = 'CORPORATE'; + +(EXPR) +-------------------- + +37000.000 + +--- 1 row(s) selected. +``` ++ +The results of this query and some of the results of previous queries +might return null: ++ +``` +SELECT AVG(salary) +FROM + ( SELECT salary, dnum FROM salesperson SAMPLE RANDOM 10 PERCENT ) AS S + , department D +WHERE S.DNUM = D.DNUM AND D.NAME = 'CORPORATE'; + +(EXPR) +-------------------- + +? + +--- 1 row(s) selected. +``` ++ +For this query execution, the number of rows returned by the embedded +query is limited by the total number of rows in the SALESPERSON table. +Therefore, it is possible that no rows satisfy the search condition in +the WHERE clause. + + +<<< +* In this example, both the tables are sampled first and then joined. +This query computes the average salary and the average sale amount +generated from a random 10 percent of all the sales people and 20 +percent of all the sales transactions. ++ +``` +SELECT AVG(salary), AVG(amount) +FROM ( SELECT salary, empid + FROM salesperson + SAMPLE RANDOM 10 PERCENT ) AS S, + ( SELECT amount, empid FROM sales + SAMPLE RANDOM 20 PERCENT ) AS T +WHERE S.empid = T.empid; + +(EXPR) (EXPR) +--------- --------- + 45000.00 31000.00 + +--- 1 row(s) selected. +``` + +* This example illustrates oversampling. This query retrieves 150 +percent of the sales transactions where the amount exceeds $1000. The +result contains every row at least once, and 50 percent of the rows, +picked randomly, occur twice. ++ +``` +SELECT * +FROM sales +WHERE amount > 1000 +SAMPLE RANDOM 150 PERCENT; + +EMPID PRODUCT REGION AMOUNT +----- -------------------- ------ ----------- + 1 PCGOLD, 30MB E 30000.00 + 23 PCDIAMOND, 60MB W 40000.00 + 23 PCDIAMOND, 60MB W 40000.00 + 29 GRAPHICPRINTER, M1 N 11000.00 + 32 GRAPHICPRINTER, M2 S 15000.00 + 32 GRAPHICPRINTER, M2 S 15000.00 + ... ... ... ... + +--- 88 row(s) selected. +``` + +<<< +* The BALANCE option enables stratified sampling. Retrieve the age and +salary of 1000 sales people such that 50 percent of the result are male +and 50 percent female. ++ +``` +SELECT age, sex, salary +FROM salesperson +SAMPLE FIRST +BALANCE + WHEN sex = 'male' THEN 15 ROWS + WHEN sex = 'female' THEN 15 ROWS + END +ORDER BY age; + +AGE SEX SALARY +----------- ------ ----------- + 22 male 28000.00 + 22 male 90000.00 + 22 female 136000.00 + 22 male 37000.40 + ... ... ... + +--- 30 row(s) selected. +``` + +* Retrieve all sales records with the amount exceeding $10000 and a +random sample of 10 percent of the remaining records: ++ +``` +SELECT * +FROM sales SAMPLE RANDOM +BALANCE + WHEN amount > 10000 + THEN 100 PERCENT + ELSE 10 PERCENT +END; + +PRODUCT REGION AMOUNT +-------------------- ------ ----------- +PCGOLD, 30MB E 30000.00 +PCDIAMOND, 60MB W 40000.00 +GRAPHICPRINTER, M1 N 11000.00 +GRAPHICPRINTER, M2 S 15000.00 +... ... ... +MONITORCOLOR, M2 N 10500.00 +... ... ... + +--- 32 row(s) selected. +``` + +<<< +* This query shows an example of stratified sampling where the +conditions are not mutually exclusive: ++ +``` +SELECT * +FROM sales SAMPLE RANDOM +BALANCE + WHEN amount > 10000 THEN 100 PERCENT + WHEN product = 'PCGOLD, 30MB' THEN 25 PERCENT + WHEN region = 'W' THEN 40 PERCENT + ELSE 10 PERCENT END; + +PRODUCT REGION AMOUNT +-------------------- ------ ----------- +PCGOLD, 30MB E 30000.00 +PCDIAMOND, 60MB W 40000.00 +GRAPHICPRINTER, M1 N 11000.00 +GRAPHICPRINTER, M2 S 15000.00 +GRAPHICPRINTER, M3 S 20000.00 +LASERPRINTER, X1 W 42000.00 +... ... ... + +--- 30 row(s) selected. +``` + +<<< +[[sequence_by_clause]] +== SEQUENCE BY Clause + +The SEQUENCE BY clause of the SELECT statement specifies the order in +which to sort the rows + +of the intermediate result table for calculating sequence functions. +This option is used for processing time-sequenced rows in data mining +applications. See <<select_statement>>. + +Sequence by is a {project-name} SQL extension. + +``` +SEQUENCE BY colname[ASC[ENDING]|DESC[ENDING]] + [,colname [ASC[ENDING] | DESC[ENDING]]]... +``` + +* `_colname_` +_ +names a column in _select-list_ or a column in a table reference in the +FROM clause of the SELECT statement. _colname_ is optionally qualified +by a table, view, or correlation name; for example, CUSTOMER.CITY. + +* `ASC | DESC` ++ +specifies the sort order. ASC is the default. For ordering an +intermediate result table on a column that can contain null, nulls are +considered equal to one another but greater than all other non-null +values. ++ +You must include a SEQUENCE BY clause if you include a sequence function +in the select list of the SELECT statement. Otherwise, {project-name} SQL +returns an error. Further, you cannot include a SEQUENCE BY clause if no +sequence function exists in the select list. See +<<sequence_functions,Sequence Functions>> . + +[[considerations_for_sequence_by]] +=== Considerations for SEQUENCE BY + +* Sequence functions behave differently from set (or aggregate) +functions and mathematical (or scalar) functions. +* If you include both SEQUENCE BY and GROUP BY clauses in the same +SELECT statement, the values of the sequence functions must be evaluated +first and then become input for aggregate functions in the statement. +** For a SELECT statement that contains both SEQUENCE BY and GROUP BY +clauses, you can nest the sequence function in the aggregate function: ++ +``` +SELECT + ordernum +, MAX(MOVINGSUM(qty_ordered, 3)) AS maxmovsum_qty +, AVG(unit_price) AS avg_price +FROM odetail +SEQUENCE BY partnum +GROUP BY ordernum; +``` + +* To use a sequence function as a grouping column, you must use a +derived table for the SEQUENCE BY query and use the derived column in +the GROUP BY clause: ++ +``` +SELECT + ordernum +, movsum_qty +, AVG(unit_price) +FROM + ( SELECT ordernum, MOVINGSUM(qty_ordered, 3), unit_price + FROM odetail SEQUENCE BY partnum ) + AS tab2 (ordernum, movsum_qty, unit_price) +GROUP BY ordernum, movsum_qty; +``` + +* To use an aggregate function as the argument to a sequence function, +you must also use a derived table: ++ +``` +SELECT MOVINGSUM(avg_price,2) +FROM + ( SELECT ordernum, AVG(unit_price) FROM odetail + GROUP BY ordernum) +AS tab2 (ordernum, avg_price) +SEQUENCE BY ordernum; +``` + +* Like aggregate functions, sequence functions generate an intermediate +result. If the query has a WHERE clause, its search condition is applied +during the generation of the intermediate result. Therefore, you cannot +use sequence functions in the WHERE clause of a SELECT statement. + +** This query returns an error: ++ +``` +SELECT ordernum, partnum, RUNNINGAVG(unit_price) +FROM odetail +WHERE ordernum > 800000 AND RUNNINGAVG(unit_price) > 350 +SEQUENCE BY qty_ordered; +``` + +** Apply a search condition to the result of a sequence function, use a +derived table for the SEQUENCE BY query, and use the derived column in +the WHERE clause: ++ +``` +SELECT ordernum, partnum, runavg_price +FROM + ( SELECT ordernum, partnum, RUNNINGAVG(unit_price) + FROM odetail SEQUENCE BY qty_ordered) +AS tab2 (ordernum, partnum, runavg_price) +WHERE ordernum > 800000 AND +runavg_price > 350; +``` + +[[examples_of_sequence_by]] +=== Examples of SEQUENCE BY + +* Sequentially number each row for the entire result and also number the +rows for each part number: ++ +``` +SELECT + RUNNINGCOUNT(*) AS RCOUNT +, MOVINGCOUNT(*,ROWS SINCE (d.partnum<>THIS(d.partnum))) AS MCOUNT +, d.partnum +FROM orders o, odetail d +WHERE o.ordernum=d.ordernum +SEQUENCE BY d.partnum, o.order_date, o.ordernum +ORDER BY d.partnum, o.order_date, o.ordernum; + +RCOUNT MCOUNT Part/Num +-------------------- --------------------- -------- + 1 1 212 + 2 2 212 + 3 1 244 + 4 2 244 + 5 3 244 + ... ... ... + 67 1 7301 + 68 2 7301 + 69 3 7301 + 70 4 7301 + +--- 70 row(s) selected. +``` + +<<< +* Show the orders for each date, the amount for each order item and the +moving total for each order, and the running total of all the orders. +The query sequences orders by date, order number, and part number. (The +CAST function is used for readability only.) ++ +``` +SELECT + o.ordernum +, CAST (MOVINGCOUNT(*,ROWS SINCE(THIS(o.ordernum) <> o.ordernum)) AS INT) AS MCOUNT +, d.partnum +, o.order_date +, (d.unit_price * d.qty_ordered) AS AMOUNT +, MOVINGSUM (d.unit_price * d.qty_ordered, SEQUENCE BY Clause 269 ROWS SINCE(THIS(o.ordernum)<>o.ordernum) ) AS ORDER_TOTAL +, RUNNINGSUM (d.unit_price * d.qty_ordered) AS TOTAL_SALES +FROM orders o, odetail d +WHERE o.ordernum=d.ordernum +SEQUENCE BY o.order_date, o.ordernum, d.partnum +ORDER BY o.order_date, o.ordernum, d.partnum; + +Order/Num MCOUNT Part/Num Order/Date AMOUNT ORDER_TOTAL TOTAL_SALES +---------- ----------- -------- ---------- ---------- -------------- -------------- + 100250 1 244 2008-01-23 14000.00 14000.00 14000.00 + 100250 2 5103 2008-01-23 4000.00 18000.00 18000.00 + 100250 3 6500 2008-01-23 950.00 18950.00 18950.00 + 200300 1 244 2008-02-06 28000.00 28000.00 46950.00 + 200300 2 2001 2008-02-06 10000.00 38000.00 56950.00 + 200300 3 2002 2008-02-06 14000.00 52000.00 70950.00 + ... ... ... ... ... ... ... + 800660 18 7102 2008-10-09 1650.00 187360.00 113295.00 + 800660 19 7301 2008-10-09 5100.00 192460.00 1118395.00 + +--- 69 row(s) selected. +``` ++ +For example, for order number 200300, the ORDER_TOTAL is a moving sum +within the order date 2008-02-06, and the TOTAL_SALES is a running sum +for all orders. The current window for the moving sum is defined as ROWS +SINCE (THIS(o.ordernum)<>o.ordernum), which restricts the ORDER_TOTAL to +the current order number. + +<<< +* Show the amount of time between orders by calculating the interval between two dates: ++ +``` +SELECT RUNNINGCOUNT(*),o.order_date,DIFF1(o.order_date) +FROM orders o +SEQUENCE BY o.order_date, o.ordernum +ORDER BY o.order_date, o.ordernum ; + + +(EXPR) Order/Date (EXPR) +-------------------- ---------- ------------- + 1 2008-01-23 ? + 2 2008-02-06 14 + 3 2008-02-17 11 + 4 2008-03-03 14 + 5 2008-03-19 16 + 6 2008-03-19 0 + 7 2008-03-27 8 + 8 2008-04-10 14 + 9 2008-04-20 10 + 10 2008-05-12 22 + 11 2008-06-01 20 + 12 2008-07-21 50 + 13 2008-10-09 80 + +--- 13 row(s) selected. +``` + +<<< +[[transpose_clause]] +== TRANSPOSE Clause + +The TRANSPOSE clause of the SELECT statement generates for each row of +the SELECT source table a row for each item in the transpose item list. +The result table of the TRANSPOSE clause has all the columns of the +source table plus, for each transpose item list, a value column or +columns and an optional key column. + +TRANSPOSE is a {project-name} SQL extension. + +``` +TRANSPOSE transpose-set [transpose-set]... + [KEY BY key-colname] + +transpose-set is: + transpose-item-list AS transpose-col-list + +transpose-item-list is: + expression-list +| (expression-list) [,(expression-list)]... + +expression-list is: + expression [,expression]... + +transpose-col-list is: + colname | (colname-list) + +colname-list is: + colname [,colname]... +``` + +* `_transpose-item-list_ AS _transpose-col-list_` ++ +specifies a _transpose-set_, which correlates a _transpose-item-list_ +with a _transpose-col-list_. The _transpose-item-list_ can be a list +of expressions or a list of expression lists enclosed in parentheses. +The _transpose-col-list_ can be a single column name or a list of column +names enclosed in parentheses. ++ +For example, in the _transpose-set_ TRANSPOSE (A,X),(B,Y),(C,Z) AS +(V1,V2), the items in the _transpose-item-list_ are (A,X),(B,Y), and +(C,Z), and the _transpose-col-list_ is (V1,V2). The number of +expressions in each item must be the same as the number of value columns +in the column list. ++ +In the example TRANSPOSE A,B,C AS V, the items are A,B, and C, and the +value column is V. This form can be thought of as a shorter way of writing TRANSPOSE +(A),(B),(C) AS (V). + +* `_transpose-item-list_` ++ +specifies a list of items. An item is a value expression or a list of +value expressions enclosed in parentheses. + +** `_expression-list_` ++ +specifies a list of SQL value expressions, separated by commas. The +expressions must have compatible data types. ++ +For example, in the transpose set TRANSPOSE A,B,C AS V, the expressions +A,B, and C have compatible data types. + +** `(_expression-list_) [,(_expression-list_)]&8230;` ++ +specifies a list of expressions enclosed in parentheses, followed by +another list of expressions enclosed in parentheses, and so on. The +number of expressions within parentheses must be equal for each list. +The expressions in the same ordinal position within the parentheses must +have compatible data types. ++ +For example, in the transpose set TRANSPOSE (A,X),(B,Y),(C,Z) AS +(V1,V2), the expressions A,B, and C have compatible data types, and the +expressions X,Y, and Z have compatible data types. + +* `_transpose-col-list_` ++ +specifies the columns that consist of the evaluation of expressions in +the item list as the expressions are applied to rows of the source +table. + +** `_colname_` ++ +is an SQL identifier that specifies a column name. It identifies the +column consisting of the values in _expression-list_. ++ +For example, in the transpose set TRANSPOSE A,B,C AS V, the column V +corresponds to the values of the expressions A,B, and C. + +** `(_colname-list_)` ++ +specifies a list of column names enclosed in parentheses. Each column +consists of the values of the expressions in the same ordinal position +within the parentheses in the transpose item list. ++ +For example, in the transpose set TRANSPOSE (A,X),(B,Y),(C,Z) AS +(V1,V2), the column V1 corresponds to the expressions A,B, and C, and +the column V2 corresponds to the expressions X,Y, and Z. + +* `KEY BY _key-colname_` ++ +optionally specifies which expression (the value in the transpose column +list corresponds to) by its position in the item list. _key-colname_ is +an SQL identifier. The data type of the key column is exact numeric, and +the value is NOT NULL. + +[[considerations_for_transpose]] +=== Considerations for TRANSPOSE + +[[multiple_transpose_clauses_and_sets]] +==== Multiple TRANSPOSE Clauses and Sets + +* Multiple TRANSPOSE clauses can be used in the same query. For example: ++ +``` +SELECT keycol1, valcol1, keycol2, valcol2 +FROM mytable +TRANSPOSE a, b, c AS valcol1 KEY BY keycol1 +TRANSPOSE d, e, f AS valcol2 KEY BY keycol2 +``` + +* A TRANSPOSE clause can contain multiple transpose sets. For example: ++ +``` +SELECT keycol, valcol1, valcol2 +FROM mytable +TRANSPOSE a, b, c AS valcol1 + d, e, f AS valcol2 +KEY BY keycol +``` + +[[degree_and_column_order_of_the_transpose_result]] +==== Degree and Column Order of the TRANSPOSE Result + +The degree of the TRANSPOSE result is the degree of the source table +(the result table derived from the table reference or references in the +FROM clause and a WHERE clause if specified), plus one if the key column +is specified, plus the cardinalities of all the transpose column lists. + +The columns of the TRANSPOSE result are ordered beginning with the +columns of the source table, followed by the key column if specified, +and then followed by the list of column names in the order in which they +are specified. + +[[data_type_of_the_transpose_result]] +==== Data Type of the TRANSPOSE Result + +The data type of each of the value columns is the union compatible data +type of the corresponding expressions in the _transpose-item-list_. +You cannot have expressions with data types that are not compatible in a +_transpose-item-list_. + +For example, in TRANSPOSE (A,X),(B,Y),(C,Z) AS (V1,V2), the data type of +V1 is the union compatible type for A, B, and C, and the data type of V2 +is the union compatible type for X, Y, and Z. + +See <<comparable_and_compatible_data_types,Comparable and Compatible Data Types>>. + +[[cardinality_of_the_transpose_result]] +==== Cardinality of the TRANSPOSE Result + +The items in each _transpose-item-list_ are enumerated from 1 to N, +where N is the total number of items in all the item lists in the +transpose sets. + +In this example with a single transpose set, the value of N is 3: + +``` +TRANSPOSE (a,x),(b,y),(c,z) AS (v1,v2) +``` + +In this example with two transpose sets, the value of N is 5: + +``` +TRANSPOSE (a,x),(b,y),(c,z) AS (v1,v2) l,m AS v3 +``` + +The values 1 to N are the key values _k_i. The items in each +_transpose-item-list_ are the expression values _v_i. + +The cardinality of the result of the TRANSPOSE clause is the cardinality +of the source table times N, the total number of items in all the +transpose item lists. + +For each row of the source table and for each value in the key values +_k_i, the TRANSPOSE result contains a row with all the attributes of +the source table, the key value _k_i in the key column, the expression +values vi in the value columns of the corresponding transpose set, and +NULL in the value columns of other transpose sets. + +For example, consider this TRANSPOSE clause: + +``` +TRANSPOSE (a,x),(b,y),(c,z) AS (v1,v2) + l,m AS v3 +KEY BY k +``` + +The value of N is 5. One row of the SELECT source table produces this +TRANSPOSE result: + +[cols="5*",options="header"] +|=== +| _columns-of-source_ | K | V1 | V2 | V3 +| _source-row_ | 1 | _value-of-A_ | _value-of-X_ | NULL +| _source-row_ | 2 | _value-of-B_ | _value-of-Y_ | NULL +| _source-row_ | 3 | _value-of-C_ | _value-of-Z_ | NULL +| _source-row_ | 4 | NULL | NULL | _value-of-L_ +| _source-row_ | 5 | NULL | NULL | _value-of-M_ +|=== + +<<< +[[examples_of_transpose]] +=== Examples of TRANSPOSE + +* Suppose that MYTABLE has been created as: ++ +``` +CREATE TABLE mining.mytable +( A INTEGER, B INTEGER, C INTEGER, D CHAR(2), E CHAR(2), F CHAR(2) ); +``` ++ +The table MYTABLE has columns A, B, C, D, E, and F with related data. +The columns A, B, and C are type INTEGER, and columns D, E, and F are +type CHAR. ++ +[cols="6*",options="header"] +|==== +| A | B | C | D | E | F +| 1 | 10 | 100 | d1 | e1 | f1 +| 2 | 20 | 200 | d2 | e2 | f2 +|==== + +* Suppose that MYTABLE has only the first three columns: A, B, and C. +The result of the TRANSPOSE clause has three times as many rows (because +three items exist in the transpose item list) as rows exist in MYTABLE: ++ +``` +SELECT * FROM mytable +TRANSPOSE a, b, c AS valcol KEY BY keycol; +``` ++ +The result table of the TRANSPOSE query is: ++ +[cols="8*",options="header"] +|=== +| A | B | C | D | E | F | KEYCOL | VALCOL +| 1 | 10 | 100 | d1 | e1 | f1 | 1 | 1 +| 1 | 10 | 100 | d1 | e1 | f1 | 2 | 10 +| 1 | 10 | 100 | d1 | e1 | f1 | 3 | 100 +| 2 | 20 | 200 | d2 | e2 | f2 | 1 | 2 +| 2 | 20 | 200 | d2 | e2 | f2 | 2 | 20 +| 2 | 20 | 200 | d2 | e2 | f2 | 3 | 200 +|=== + +<<< +* This query shows that the items in the transpose item list can be any +valid scalar expressions: ++ +``` +SELECT keycol, valcol, a, b, c FROM mytable +TRANSPOSE a + b, c + 3, 6 AS valcol KEY BY keycol; +``` ++ +The result table of the TRANSPOSE query is: ++ +[cols="5*",options="header"] +|===== +| KEYCOL | VALCOL | A | B | C +| 1 | 1 | 1 | 10 | 100 +| 2 | 103 | 1 | 10 | 100 +| 3 | 6 | 1 | 10 | 100 +| 1 | 22 | 2 | 20 | 200 +| 2 | 203 | 2 | 20 | 200 +| 3 | 6 | 2 | 20 | 200 +|===== + +* This query shows how the TRANSPOSE clause can be used with a GROUP BY +clause. This query is typical of queries used to obtain cross-table +information, where A, B, and C are the independent variables, and D is +the dependent variable. ++ +``` +SELECT keycol, valcol, d, COUNT(*) +FROM mytable +TRANSPOSE a, b, c AS valcol +KEY BY keycol +GROUP BY keycol, valcol, d; +``` ++ +The result table of the TRANSPOSE query is: ++ +[cols="4*",options="header"] +|=== +| KEYCOL | VALCOL | D | COUNT(*) +| 1 | 1 | d1 | 1 +| 2 | 10 | d1 | 1 +| 3 | 100 | d1 | 1 +| 1 | 2 | d2 | 1 +| 2 | 20 | d2 | 1 +| 3 | 200 | d2 | 1 +|=== + +<<< +* This query shows how to use COUNT applied to VALCOL. The result table +of the TRANSPOSE query shows the number of distinct values in VALCOL. ++ +``` +SELECT COUNT(DISTINCT valcol) FROM mytable +TRANSPOSE a, b, c AS valcol KEY BY keycol +GROUP BY keycol; + +(EXPR) +-------------------- + 2 + 2 + 2 + +--- 3 row(s) selected. +``` + +* This query shows how multiple TRANSPOSE clauses can be used in the +same query. The result table from this query has nine times as many rows +as rows exist in MYTABLE: ++ +``` +SELECT keycol1, valcol1, keycol2, valcol2 FROM mytable +TRANSPOSE a, b, c AS valcol1 KEY BY keycol1 +TRANSPOSE d, e, f AS valcol2 KEY BY keycol2; +``` ++ +The result table of the TRANSPOSE query is: ++ +[cols=",,,",options="header"] +|=== +| KEYCOL1 | VALCOL1 | KEYCOL2 | VALCOL2 +| 1 | 1 | 1 | d1 +| 1 | 1 | 2 | e1 +| 1 | 1 | 3 | f1 +| 2 | 10 | 1 | d1 +| 2 | 10 | 2 | e1 +| 2 | 10 | 3 | f1 +| 3 | 100 | 1 | d1 +| 3 | 100 | 2 | e1 +| 3 | 100 | 3 | f1 +| 1 | 2 | 1 | d2 +| 1 | 2 | 2 | e2 +| 1 | 2 | 3 | f2 +| 2 | 20 | 1 | d2 +| 2 | 20 | 2 | e2 +| 2 | 20 | 3 | f2 +| 3 | 200 | 1 | d2 +| 3 | 200 | 2 | e2 +| 3 | 200 | 3 | f2 +|=== + +* This query shows how a TRANSPOSE clause can contain multiple transpose +setsâthat is, multiple _transpose-item-list_ AS _transpose-col-list_. +The expressions A, B, and C are of type integer, and expressions D, E, +and F are of type character. ++ +``` +SELECT keycol, valcol1, valcol2 +FROM mytable +TRANSPOSE a, b, c AS valcol1 + d, e, f AS valcol2 +KEY BY keycol; +``` ++ +The result table of the TRANSPOSE query is: ++ +[cols="3*",options="header"] +|=== +| KEYCOL | VALCOL1 | VALCOL2 +| 1 | 1 | ? +| 2 | 10 | ? +| 3 | 100 | ? +| 4 | ? | d1 +| 5 | ? | e1 +| 6 | ? | f1 +| 1 | 2 | ? +| 2 | 20 | ? +| 3 | 200 | ? +| 4 | ? | d2 +| 5 | ? | e2 +| 6 | ? | f2 +|=== ++ +A question mark (?) in a value column indicates no value for the given KEYCOL. + +* This query shows how the preceding query can include a GROUP BY clause: ++ +``` +SELECT keycol, valcol1, valcol2, COUNT(*) +FROM mytable +TRANSPOSE a, b, c AS valcol1 + d, e, f AS valcol2 +KEY BY keycol +GROUP BY keycol, valcol1, valcol2; +``` ++ +The result table of the TRANSPOSE query is: ++ +[cols="4*",options="header"] +|=== +| KEYCOL | VALCOL1 | VALCOL2 | (EXPR) +| 1 | 1 | ? | 1 +| 2 | 10 | ? | 1 +| 3 | 100 | ? | 1 +| 1 | 2 | ? | 1 +| 2 | 20 | ? | 1 +| 3 | 200 | ? | 1 +| 4 | ? | d2 | 1 +| 5 | ? | e2 | 1 +| 6 | ? | f2 | 1 +| 4 | ? | d1 | 1 +| 5 | ? | e1 | 1 +| 6 | ? | f1 | 1 +|=== + +* This query shows how an item in the transpose item list can con
<TRUNCATED>
