Thanks Wes.  I think I've managed to thoroughly confuse myself over this,
and I'm not sure where the fix should be.  By default, Spark converts a
timestamp to its internal value with Python's "time.mktime", which I believe
interprets it as local time rather than UTC.  If there is a tzinfo object,
Spark uses "calendar.timegm" instead, and I get the correct values.  Maybe
this is a Spark issue?
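
To show the difference I mean, here is a minimal sketch using only the
standard library.  It does not call Spark at all; it just assumes the two
conversions described above, and the variable names are mine.

```python
import calendar
import time
from datetime import datetime

# Naive datetime matching the sample value from the thread.
dt = datetime(2011, 1, 1, 1, 1, 1)

# time.mktime interprets the struct_time as local time.
local_seconds = int(time.mktime(dt.timetuple()))

# calendar.timegm interprets the same struct_time as UTC.
utc_seconds = calendar.timegm(dt.timetuple())

# The difference is the machine's UTC offset in seconds for that
# date (0 on a machine whose local time zone is UTC).
offset_seconds = utc_seconds - local_seconds

print(local_seconds, utc_seconds, offset_seconds)
```

So the same wall-clock value maps to two different epoch values unless the
machine happens to run in UTC, which would explain why I only get the
values I expect on the "calendar.timegm" path.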

On Tue, Apr 25, 2017 at 11:52 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Bryan,
>
> You will want to create DataFrame objects having datetime64[ns] columns.
> There are some examples in the pyarrow test suite:
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_convert_pandas.py#L324
>
> You can convert an array of datetime.datetime objects to datetime64[ns]
> dtype with pandas.to_datetime
>
> In [15]: df = pd.DataFrame(data)
>
> In [16]: df['timestamp_t'] = pd.to_datetime(df.timestamp_t)
>
> In [17]: df.dtypes
> Out[17]:
> timestamp_t    datetime64[ns]
> dtype: object
>
> pd.to_datetime does not seem to work with the NaiveTZ object here (if Jeff
> Reback is reading, maybe he can explain why); why do you need that for
> tz-naive data? If that's something we absolutely need fixed in pandas, we
> should try to do it right away since the 0.20 rc is pending right now.
>
> - Wes
>
> On Tue, Apr 25, 2017 at 1:38 PM, Bryan Cutler <cutl...@gmail.com> wrote:
>
> > I am writing a unit test to compare that a Pandas DataFrame made by Arrow
> > is equal to one constructed directly with data.  The timestamp values are a
> > Python datetime object with a timezone tzinfo object.  When I compare the
> > results, the values are equal but the schema is not.  Using arrow the type
> > is "datetime64[ns]" and without it is "object."  Without a tzinfo, the
> > types match but I do need it there for the conversion with Arrow data.  I
> > could just replace the tzinfo for the Pandas DataFrame, it is a naive
> > timezone with utcoffset=None.  Does anyone know another way to produce
> > compatible types?  I do need the data to be compatible with Spark too.
> > Hopefully this makes sense, I could attach some code if that would help,
> > thanks! Here is a sample of the data:
> >
> > from datetime import datetime, tzinfo
> >
> > import pandas as pd
> >
> > class NaiveTZ(tzinfo):
> >     def utcoffset(self, date_time):
> >         return None
> >
> >     def dst(self, date_time):
> >         return None
> >
> > data = {"timestamp_t": [datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())]}
> >
> > pd.DataFrame(data)
> >
>
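
For reference, the "just replace the tzinfo" option from my original message
could look roughly like the sketch below.  It uses only the standard library,
and strip_naive_tzinfo is a hypothetical helper name of mine, not something
from pandas, Arrow, or Spark.  It drops the tzinfo only when utcoffset() is
None, so the value becomes a plain naive datetime, which is the case where
the types matched for me.

```python
from datetime import datetime, timezone, tzinfo


class NaiveTZ(tzinfo):
    """tzinfo from the thread: attached to the value but carries no offset."""

    def utcoffset(self, date_time):
        return None

    def dst(self, date_time):
        return None


def strip_naive_tzinfo(value):
    """Hypothetical helper: drop a tzinfo that reports no UTC offset."""
    if value.tzinfo is not None and value.utcoffset() is None:
        return value.replace(tzinfo=None)
    # Truly naive or truly aware values are returned unchanged.
    return value


dt = datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())
stripped = strip_naive_tzinfo(dt)

aware = datetime(2011, 1, 1, 1, 1, 1, tzinfo=timezone.utc)
unchanged = strip_naive_tzinfo(aware)

print(stripped.tzinfo, unchanged.tzinfo)
```

This only changes the object handed to pandas for the comparison; it does
not address which epoch conversion Spark picks.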
