Sounds like a good idea to me. A quick scan of the current TPC-H spec at http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.18.0.pdf suggests to me that such a change is conformant to the spec. Specifically:
In 1.3 Data Types: "Date is a value whose external representation can be expressed as YYYY-MM-DD, where all characters are numeric. A date must be able to express any day within at least 14 consecutive years. There is no requirement specific to the internal representation of a date." In 2.1.3 Substitution Parameters and Output Data: "Comment 1: When dates are part of the substitution parameters, they must be expressed in a format that includes the year, month and day in integer form, in that order (e.g., YYYY-MM-DD). The delimiter between the year, month and day is not specified. Other date representations, for example the number of days since 1970-01-01, are specifically not allowed." In 2.2.3 Minor Query Modifications: "Date expressions - For queries that include an expression involving manipulation of dates (e.g., adding/subtracting days/months/years, or extracting years from dates), vendor-specific syntax may be used TPC BenchmarkTM H Standard Specification Revision 2.18.0 Page 26 instead of the specified SQL-92 syntax. Replacement syntax must have equivalent semantic behavior. Examples of acceptable implementations include "YEAR(<column>)" to extract the year from a date column or "DATE(<date>) + 3 MONTHS" to add 3 months to a date." In 4.2 DBGEN and Database Population 4.2.2 Definition Of Terms: "4.2.2.8 The term date represents a string of numeric characters separated by hyphens and comprised of a 4 digit year, 2 digit month and 2 digit day of the month." I believe that our DATE implementation conforms to all these requirements, so we should be OK. Thanks, - Laszlo On Tue, Jan 28, 2020 at 1:59 PM Gabor Kaszab <gaborkas...@apache.org> wrote: > Hey, > > Recently I have been running some perf tests using the TPCH database and I > observed that the date columns are still stored as string even though we > have the date type implemented recently. I'm wondering if there are any > objections against changing the type of these columns to date. My > assumption is that none of the queries have to be changed just the SQL that > creates the table. > > Cheers, > Gabor >