waitingkuo commented on code in PR #4107:
URL: https://github.com/apache/arrow-datafusion/pull/4107#discussion_r1014616309
##########
datafusion/sql/src/planner.rs:
##########
@@ -2671,6 +2673,113 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             Ok(lit(ScalarValue::new_list(Some(values), data_type)))
         }
     }
+
+    fn convert_data_type(&self, sql_type: &SQLDataType) -> Result<DataType> {
+        match sql_type {
+            SQLDataType::Array(inner_sql_type) => {
+                let data_type = self.convert_simple_data_type(inner_sql_type)?;
+
+                Ok(DataType::List(Box::new(Field::new(
+                    "field", data_type, true,
+                ))))
+            }
+            other => self.convert_simple_data_type(other),
+        }
+    }
+    fn convert_simple_data_type(&self, sql_type: &SQLDataType) -> Result<DataType> {
+        match sql_type {
+            SQLDataType::Boolean => Ok(DataType::Boolean),
+            SQLDataType::TinyInt(_) => Ok(DataType::Int8),
+            SQLDataType::SmallInt(_) => Ok(DataType::Int16),
+            SQLDataType::Int(_) | SQLDataType::Integer(_) => Ok(DataType::Int32),
+            SQLDataType::BigInt(_) => Ok(DataType::Int64),
+            SQLDataType::UnsignedTinyInt(_) => Ok(DataType::UInt8),
+            SQLDataType::UnsignedSmallInt(_) => Ok(DataType::UInt16),
+            SQLDataType::UnsignedInt(_) | SQLDataType::UnsignedInteger(_) => {
+                Ok(DataType::UInt32)
+            }
+            SQLDataType::UnsignedBigInt(_) => Ok(DataType::UInt64),
+            SQLDataType::Float(_) => Ok(DataType::Float32),
+            SQLDataType::Real => Ok(DataType::Float32),
+            SQLDataType::Double | SQLDataType::DoublePrecision => Ok(DataType::Float64),
+            SQLDataType::Char(_)
+            | SQLDataType::Varchar(_)
+            | SQLDataType::Text
+            | SQLDataType::String => Ok(DataType::Utf8),
+            SQLDataType::Timestamp(tz_info) => {
+                let tz = if matches!(tz_info, TimezoneInfo::Tz)
+                    || matches!(tz_info, TimezoneInfo::WithTimeZone)
+                {
+                    match self.schema_provider.get_config_option("datafusion.execution.time_zone") {
+                        Some(ScalarValue::Utf8(s)) => {
+                            s
+                        }
+                        Some(_) => {
+                            None
+                        }
+                        None => None
+                    }
+                    //Some("+00:00".to_string())
+
+                } else {
+                    None
+                };
+                Ok(DataType::Timestamp(TimeUnit::Nanosecond, tz))

Review Comment:
SQL's `TimestampTz` determines its timezone during execution (perhaps right before the query is executed). Arrow doesn't support `TimestampTz` natively; what Arrow has here is `Timestamp<TimeUnit, Some(tz)>`. I've been thinking about how to deal with this for a long time :vomiting_face:. I have two approaches:

1. Add something like `Timestamp<TimeUnit, Some("session")>`. All time-related functions/kernels would then need to resolve this "session" marker to the actual timezone themselves. For example, `fn week()` would accept `Timestamp<TimeUnit, Some("session")>` as input, use the timezone in `config_options` to convert the underlying timestamp, and then do the week extraction. This would behave more like `TimestampTz` in SQL.
2. Determine the timezone in Arrow's `Timestamp<TimeUnit, Some("some-timezone")>` when the SQL type is translated to Arrow's data type. This is what this PR does.

SQL's `TimestampTz` isn't deterministic, but making Arrow's data type deterministic makes more sense to me, so I picked the second approach for now. I'm happy to try the first approach if anyone is interested.
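The following sketch makes the difference between the two approaches concrete. It is a minimal, self-contained Rust example that deliberately avoids the real arrow, sqlparser, and DataFusion APIs. `TimezoneInfo`, `DataType::TimestampNanos`, `SessionConfig`, `effective_tz`, and the `"session"` sentinel are all hypothetical stand-ins. Only the config key `datafusion.execution.time_zone` comes from the diff. The planning-time function also shows a more compact form of the timestamp branch, using a single `matches!` with an or-pattern and `Option::map` in place of the nested match.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for sqlparser's `TimezoneInfo` (variant names assumed).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimezoneInfo {
    None,
    WithTimeZone,
    WithoutTimeZone,
    Tz,
}

// Simplified stand-in for Arrow's `DataType::Timestamp(TimeUnit::Nanosecond, tz)`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    TimestampNanos(Option<String>),
}

// Hypothetical session configuration: a flat key/value map.
struct SessionConfig {
    options: HashMap<String, String>,
}

impl SessionConfig {
    fn get(&self, key: &str) -> Option<&str> {
        self.options.get(key).map(String::as_str)
    }
}

const TIME_ZONE_KEY: &str = "datafusion.execution.time_zone";
// Hypothetical sentinel meaning "use the session time zone at execution time".
const SESSION_TZ: &str = "session";

fn has_time_zone(tz_info: TimezoneInfo) -> bool {
    matches!(tz_info, TimezoneInfo::Tz | TimezoneInfo::WithTimeZone)
}

// Approach 2 (what the PR does): copy the session zone into the type while planning.
fn plan_time_resolved(tz_info: TimezoneInfo, config: &SessionConfig) -> DataType {
    let tz = if has_time_zone(tz_info) {
        config.get(TIME_ZONE_KEY).map(str::to_owned)
    } else {
        None
    };
    DataType::TimestampNanos(tz)
}

// Approach 1: store only a sentinel in the type; the zone is looked up later.
fn session_sentinel(tz_info: TimezoneInfo) -> DataType {
    let tz = has_time_zone(tz_info).then(|| SESSION_TZ.to_owned());
    DataType::TimestampNanos(tz)
}

// What a kernel (e.g. a hypothetical `week()`) would call to find the zone to apply.
fn effective_tz<'a>(data_type: &'a DataType, config: &'a SessionConfig) -> Option<&'a str> {
    match data_type {
        DataType::TimestampNanos(Some(tz)) if tz == SESSION_TZ => config.get(TIME_ZONE_KEY),
        DataType::TimestampNanos(tz) => tz.as_deref(),
    }
}

fn main() {
    let mut config = SessionConfig { options: HashMap::new() };
    config.options.insert(TIME_ZONE_KEY.to_owned(), "+00:00".to_owned());

    let planned = plan_time_resolved(TimezoneInfo::Tz, &config);
    let deferred = session_sentinel(TimezoneInfo::Tz);

    // The session time zone changes after planning but before execution.
    config.options.insert(TIME_ZONE_KEY.to_owned(), "+08:00".to_owned());

    // Approach 2: the type keeps the zone captured at planning time.
    assert_eq!(effective_tz(&planned, &config), Some("+00:00"));
    // Approach 1: the kernel sees the current session zone.
    assert_eq!(effective_tz(&deferred, &config), Some("+08:00"));
    println!("planned = {:?}, deferred = {:?}", planned, deferred);
}
```

In this sketch, approach 2 yields a type that is fully determined once planning is done, which is the property the comment prefers. Approach 1 tracks later changes to the session zone, but every time-related kernel has to recognize and resolve the sentinel itself.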