> > I didn’t find any page/documentation on how to do RFC in Arrow protocol, > so can anyone point me to it or PR with email will be enough?
That is enough to start discussion. Before formal acceptance and merging of the PR there needs to be a Java and C++ implementations for the type that pass integration tests. At the time this guideline was instituted Java and C++ were considered the "reference" implementations (I think they still have the most complete integration test coverage). My understanding is that the current modelling of intervals mimics SQL standards (e.g. SQL Server [1]). So it would also be good to step back and understand what problem DF is trying to solve and how it differs from other SQL implementations. I'd be hesitant to accept COMPLEX as a new type without a much deeper analysis into calendar representations within Arrow and how they relate to other existing systems (e.g. Hive and some assortment of existing SQL databases). For instance the current modelling of timestamps does not lend itself to constructing a COMPLEX interval type particularly well. (Duration was introduced for this reason). I think both Wes's suggestion of FixedSizeBinary and Andrew's of composing the with a struct are good stop-gaps. These obviously have different trade-offs. Ultimately, it would be good to define common extension types that can represent this use-case if there really is demand for it (if it doesn't become a top level type). [1] https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types?view=sql-server-ver15 -Micah On Wed, Feb 17, 2021 at 2:05 PM Andrew Lamb <al...@influxdata.com> wrote: > That is a great suggestion Wes, thank you. > > I wonder if we could get away with a 128 bit representation that is the > concatenation of the two existing interval types (YearMonth)(DayTime). Or > maybe even define a `struct` type with those fields that is used by > DataFusion. > > Basically, given our reading of the Arrow spec[1], it is currently not > possible to precisely represent an interval that has both monthly and > sub-montly granularity. > > As Dmtry says, if you have an interval seemingly simple like 1 month, 1 > day > > Using IntervalUnit(YEAR_MONTH) can't represent the 1 day > Using IntervalUnit(DAY_TIME) can't represent the month as different months > have different numbers of days > > [1] > https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260 > > > On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney <wesmck...@gmail.com> wrote: > > > On Wed, Feb 17, 2021 at 3:46 PM <t...@dmtry.me> wrote: > > > > > > > It's unclear to me that this needs to be introduced into the > top-level > > > > > > Similar thing to columnar format, How to store interval like 1 month 1 > > day 1 hour? It’s not possible to do it without converting 1 month to 30 > > days, which is a bad way. > > > > > > > Presumably you can represent a complex interval in a fixed number of > > bytes, and then embed the data in a FixedSizeBinary type. You can > > adorn this type with extension type metadata so that DataFusion can > > then apply Interval semantics to it. This could also serve as an > > interim strategy for you to proceed with implementation while > > proposing a top-level type to the Arrow format (which may or may not > > be accepting) so you aren't blocked on acceptance of changes into > > Schema.fbs. > > > > > > On 17 Feb 2021, at 21:02, Wes McKinney <wesmck...@gmail.com> wrote: > > > > > > > > It's unclear to me that this needs to be introduced into the > top-level > > > > columnar format without more analysis — have you considered > > > > implementing this for DataFusion as an extension type for the time > > > > being? > > > > > > > > On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me <mailto:t...@dmtry.me > > > > <t...@dmtry.me <mailto:t...@dmtry.me>> wrote: > > > >> > > > >> Hi, > > > >> > > > >> For now, There are only two types of IntervalUnit inside Arrow: > > > >> > > > >> - YearMonth - month stored as int32 > > > >> - DayTime - days as int32 and time in milliseconds as in32. Total > > (64 bites) > > > >> > > > >> Since DF is using Arrow, It’s not possible to store “Complex” > > intervals such 1 MONTH 1 DAY 1 HOUR. > > > >> I think, the best way to understand the problem will be to read a > > comment from DF codebase: > > > https://github.com/apache/arrow/blob/bca7d2fe84ccd8fc1129cb4d85448eb0779c52c3/rust/datafusion/src/sql/planner.rs#L1148 > > > >> > > > >> // Interval is tricky thing > > > >> // 1 day is not 24 hours because timezones, 1 year != > 365/364! > > 30 days != 1 month > > > >> // The true way to store and calculate intervals is to store > > it as it defined > > > >> // Due the fact that Arrow supports only two types YearMonth > > (month) and DayTime (day, time) > > > >> // It's not possible to store complex intervals > > > >> // It's possible to do select (NOW() + INTERVAL '1 year') + > > INTERVAL '1 day'; as workaround > > > >> if result_month != 0 && (result_days != 0 || result_millis != > > 0) { > > > >> return Err(DataFusionError::NotImplemented(format!( > > > >> "DF does not support intervals that have both a > > Year/Month part as well as Days/Hours/Mins/Seconds: {:?}. Hint: try > > breaking the interval into two parts, one with Year/Month and the other > > with Days/Hours/Mins/Seconds - e.g. (NOW() + INTERVAL '1 year') + > INTERVAL > > '1 day'", > > > >> value > > > >> ))); > > > >> } > > > >> > > > >> > > > >> > > > >> I prepared a PR https://github.com/apache/arrow/pull/9516/files < > > https://github.com/apache/arrow/pull/9516/files> < > > https://github.com/apache/arrow/pull/9516/files < > > https://github.com/apache/arrow/pull/9516/files>> that introduce a new > > type for IntervalUnit called Complex, that store both YearMonth and > DayTime > > to support complex interval. > > > >> I didn’t find any page/documentation on how to do RFC in Arrow > > protocol, so can anyone point me to it or PR with email will be enough? > > > >> > > > >> Thanks. > > > > > >