Thanks Wes. I think I've managed to confuse myself pretty well over this, and I'm not sure where the fix should be. By default, Spark converts a timestamp to its internal representation with Python's "time.mktime", which I believe interprets the value as local time rather than UTC. If there is a tzinfo object, Spark uses "calendar.timegm" instead, and I get the correct values. Maybe this is a Spark issue?
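To make the difference concrete, here is a minimal standard-library sketch of how the two functions treat the same wall-clock value. It is not Spark's actual conversion code, and the variable names are only illustrative:

```python
import calendar
import time
from datetime import datetime

# The same wall-clock value as in the sample data, without a tzinfo.
dt = datetime(2011, 1, 1, 1, 1, 1)

# time.mktime treats the struct_time as local time, so the result depends
# on the timezone of the machine running the code.
local_seconds = time.mktime(dt.timetuple())

# calendar.timegm treats the same struct_time as UTC, so the result is the
# same on every machine.
utc_seconds = calendar.timegm(dt.timetuple())

# Gap between the two interpretations, in hours. It is zero only when the
# local zone has no offset from UTC at that instant.
offset_hours = (utc_seconds - local_seconds) / 3600.0

print(local_seconds, utc_seconds, offset_hours)
```

On a machine whose local zone is not UTC, the two values differ by the local UTC offset, which would explain results that only look right when a tzinfo is present.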
On Tue, Apr 25, 2017 at 11:52 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Bryan,
>
> You will want to create DataFrame objects having datetime64[ns] columns.
> There are some examples in the pyarrow test suite:
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_convert_pandas.py#L324
>
> You can convert an array of datetime.datetime objects to datetime64[ns]
> dtype with pandas.to_datetime
>
> In [15]: df = pd.DataFrame(data)
>
> In [16]: df['timestamp_t'] = pd.to_datetime(df.timestamp_t)
>
> In [17]: df.dtypes
> Out[17]:
> timestamp_t    datetime64[ns]
> dtype: object
>
> pd.to_datetime does not seem to work with the NaiveTZ object here (if Jeff
> Reback is reading, maybe he can explain why); why do you need that for
> tz-naive data? If that's something we absolutely need fixed in pandas, we
> should try to do it right away since the 0.20 rc is pending right now.
>
> - Wes
>
> On Tue, Apr 25, 2017 at 1:38 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>
> > I am writing a unit test to compare that a Pandas DataFrame made by Arrow
> > is equal to one constructed directly with data. The timestamp values are a
> > Python datetime object with a timezone tzinfo object. When I compare the
> > results, the values are equal but the schema is not. Using arrow the type
> > is "datetime64[ns]" and without it is "object." Without a tzinfo, the
> > types match but I do need it there for the conversion with Arrow data. I
> > could just replace the tzinfo for the Pandas DataFrame, it is a naive
> > timezone with utcoffset=None. Does anyone know another way to produce
> > compatible types? I do need the data to be compatible with Spark too.
> > Hopefully this makes sense, I could attach some code if that would help,
> > thanks!
> >
> > Here is a sample of the data:
> >
> > class NaiveTZ(tzinfo):
> >     def utcoffset(self, date_time):
> >         return None
> >
> >     def dst(self, date_time):
> >         return None
> >
> > data = {"timestamp_t": [datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())]}
> >
> > pd.DataFrame(data)
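
Because utcoffset() returns None, Python itself classifies these values as naive, so dropping the tzinfo before building the expected DataFrame should not change the wall-clock value. A minimal sketch of that idea using only the standard library (the name `stripped` is illustrative, and whether this is acceptable on the Spark side is an open question):

```python
from datetime import datetime, tzinfo


class NaiveTZ(tzinfo):
    """tzinfo that reports no offset, as in the sample data above."""

    def utcoffset(self, date_time):
        return None

    def dst(self, date_time):
        return None


dt = datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())

# A datetime whose tzinfo returns None from utcoffset() counts as naive.
print(dt.utcoffset())  # None

# Drop the tzinfo object entirely; the date and time fields are unchanged.
stripped = dt.replace(tzinfo=None)
print(stripped.tzinfo)  # None
print(stripped.timetuple()[:6] == dt.timetuple()[:6])
```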