I think this is the right direction to go, but I'm wondering how Spark can support these new types if the underlying data sources (like Parquet files) do not support them yet.
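For what it's worth, Parquet can already distinguish instant-style and local-style timestamps at the annotation level: its TIMESTAMP logical type carries an isAdjustedToUTC flag on top of the same int64 physical type. A minimal sketch using parquet-mr's schema builder (this assumes the newer LogicalTypeAnnotation API; the field names are made up for illustration):

    import org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
    import org.apache.parquet.schema.{LogicalTypeAnnotation, MessageType, Types}

    // Both fields share the int64 physical type; only the logical-type
    // annotation differs.
    val schema: MessageType = Types.buildMessage()
      // isAdjustedToUTC = true: values are UTC-normalized instants
      .optional(INT64)
      .as(LogicalTypeAnnotation.timestampType(true, TimeUnit.MICROS))
      .named("instant_ts")
      // isAdjustedToUTC = false: values are zone-less local datetimes
      .optional(INT64)
      .as(LogicalTypeAnnotation.timestampType(false, TimeUnit.MICROS))
      .named("local_ts")
      .named("spark_schema")

So one possible mapping (only a sketch, not something the proposal commits to) would be TIMESTAMP WITH LOCAL TIME ZONE -> isAdjustedToUTC = true and TIMESTAMP WITHOUT TIME ZONE -> isAdjustedToUTC = false, without inventing new physical types.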
I took a quick look at the new doc for file formats, but I'm not sure what the proposal is. Are we going to implement these new types in Parquet/ORC first? Or are we going to use the low-level physical types directly and add Spark-specific metadata to Parquet/ORC files?

On Wed, Feb 20, 2019 at 10:57 PM Zoltan Ivanfi <z...@cloudera.com.invalid>
wrote:

> Hi,
>
> Last December we shared a timestamp harmonization proposal
> <https://goo.gl/VV88c5> with the Hive, Spark and Impala communities. This
> was followed by an extensive discussion in January that led to various
> updates and improvements to the proposal, as well as the creation of a new
> document for file format components. February has been quiet regarding
> this topic and the latest revision of the proposal has been stable in
> recent weeks.
>
> In short, the following is being proposed (please see the document for
> details):
>
> - The TIMESTAMP WITHOUT TIME ZONE type should have LocalDateTime
>   semantics.
> - The TIMESTAMP WITH LOCAL TIME ZONE type should have Instant
>   semantics.
> - The TIMESTAMP WITH TIME ZONE type should have OffsetDateTime
>   semantics.
>
> This proposal is in accordance with the SQL standard and many major DB
> engines.
>
> Based on the feedback we got, I believe that the latest revision of the
> proposal addresses the needs of all affected components, therefore I
> would like to move forward and create JIRAs and/or roadmap documentation
> pages for the desired semantics of the different SQL types according to
> the proposal.
>
> Please let me know if you have any remaining concerns about the proposal
> or about the course of action outlined above.
>
> Thanks,
>
> Zoltan
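For concreteness, the three java.time classes named in the quoted proposal map one-to-one onto the proposed semantics. A minimal sketch; the date, zone and offset are arbitrary choices for illustration:

    import java.time._

    // TIMESTAMP WITHOUT TIME ZONE -> LocalDateTime semantics: a wall-clock
    // value with no zone attached; which instant it denotes depends on
    // where it is interpreted.
    val local = LocalDateTime.of(2019, 2, 20, 22, 57, 0)

    // TIMESTAMP WITH LOCAL TIME ZONE -> Instant semantics: a fixed point
    // on the timeline, rendered in the session's local zone for display
    // but identical across zones.
    val instant: Instant =
      local.atZone(ZoneId.of("America/Los_Angeles")).toInstant

    // TIMESTAMP WITH TIME ZONE -> OffsetDateTime semantics: the same kind
    // of fixed point, but the offset it was recorded with is preserved.
    val withOffset = OffsetDateTime.of(local, ZoneOffset.ofHours(-8))

    // Both pin down the same point on the timeline; only OffsetDateTime
    // keeps the original -08:00 offset.
    assert(withOffset.toInstant == instant)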