On Mon, Jun 14, 2021 at 11:50 AM Antoine Pitrou <anto...@python.org> wrote:
>
> On 14/06/2021 at 18:47, Wes McKinney wrote:
> > On Mon, Jun 14, 2021 at 11:33 AM Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> >>
> >> On 14/06/2021 at 18:28, Wes McKinney wrote:
> >>> Hi Antoine, when there is no time zone specified, I do not think it is
> >>> appropriate to consider the data to refer to a specific moment in time
> >>> without applying an explicit time zone localization.
> >>
> >> Well, how can that be done? The timezone information is lost; how can
> >> the user (who possibly got the data from another source) recover it?
> >
> >
> > This is usually something that people take care of in their application
> > code. For example, when you parse a CSV and obtain “raw” timestamps, you
> > have to call “tz_localize” to apply a time zone to the data and normalize
> > the internal representation to UTC.
>
> Right, this is why I advocate for this to be done at the boundary layer.
> I.e., the CSV, Parquet... readers would expose an option to set the
> timezone of timestamp columns to a well-defined value.
>

I think this would be impractical. This is something that users expect to be able to address in their data preparation, as they currently do with other tools and systems. Forcing the issue at data ingest time (rather than offering auto-localization as an optional feature that you opt in to) would be a nuisance and would harm many kinds of users.

> > If you don’t know what the time zone is supposed to be then you can’t get
> > it back, but you can still do many analytical operations on the data
> > (aggregating by year or month, for example) just fine. For many users the
> > absence of time zones is a non-issue in their work.
>
> So, basically, a timestamp without a timezone is still useful as a date
> (mostly, because the day number may be off)?

If you parse a timestamp string, then you can extract all of the fields (including hour and day) from the resulting int64 values, and they will be the same as they appeared in the strings.
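The sketch below illustrates two points from the thread: extracting fields from naive values, and localizing those values to UTC. It uses only the Python standard library, not Arrow or pandas. It rests on the following assumptions:

- The helper names are hypothetical.
- Values are int64 microseconds since a zone-less 1970-01-01 epoch.
- Localization uses a fixed UTC offset instead of a named time zone like the ones tz_localize accepts, so DST is ignored.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: a zone-less epoch; naive values store the wall-clock reading as-is.
NAIVE_EPOCH = datetime(1970, 1, 1)
UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_US = timedelta(microseconds=1)


def parse_naive(s: str) -> int:
    """Hypothetical helper: parse a timestamp string that has no zone into
    int64 microseconds since the naive epoch, with no zone conversion."""
    return (datetime.fromisoformat(s) - NAIVE_EPOCH) // ONE_US


def fields(us: int) -> tuple:
    """Recover (year, month, day, hour, minute, second) from a naive value."""
    dt = NAIVE_EPOCH + timedelta(microseconds=us)
    return (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)


def localize_to_utc(us: int, offset_hours: float) -> int:
    """Hypothetical fixed-offset stand-in for tz_localize: treat the wall-clock
    value as local time at the given UTC offset and return int64 microseconds
    since the UTC epoch (the normalized internal representation)."""
    tz = timezone(timedelta(hours=offset_hours))
    local = (NAIVE_EPOCH + timedelta(microseconds=us)).replace(tzinfo=tz)
    return (local - UTC_EPOCH) // ONE_US


raw = ["2021-06-14 11:50:00", "2021-06-30 23:59:59", "2021-07-01 00:00:00"]
values = [parse_naive(s) for s in raw]

# The fields come back exactly as they appeared in the strings.
print(fields(values[0]))  # (2021, 6, 14, 11, 50, 0)

# Aggregating by (year, month) needs no time zone at all.
print(Counter(fields(v)[:2] for v in values))

# Only when a zone is applied does the stored value shift (here UTC-4).
print(localize_to_utc(values[0], -4) - values[0])  # 14400000000
```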
Many users never need to worry about time zones in their analyses. I’ve exhausted my ability to discuss this topic on mobile internet, so I will pick up the discussion later in the week when I can provide supplementary code examples.

>
> But then, why don't we tell users to simply use a date type for such data?
>