AnkitAhlawat7742 opened a new pull request, #49878:
URL: https://github.com/apache/arrow/pull/49878

   
   
   
   ### Rationale for this change
   
   When converting a `pandas.Categorical` with tz-aware datetime categories to a 
PyArrow array, the timezone information was silently dropped from the 
dictionary array's value type. This is silent data loss: no warning or 
error is raised, but the timezone metadata is gone.
   
   ### What changes are included in this PR?
   
   In `python/pyarrow/array.pxi`, the Categorical conversion was using 
`values.categories.values` (a raw NumPy array), which strips timezone metadata 
because NumPy does not support tz-aware datetimes. It now uses 
`values.categories` (a pandas Index) and adds `from_pandas=True`, so PyArrow 
uses the pandas conversion path, which correctly preserves timezone metadata.
   
   ### Are these changes tested?
   
   Yes. Verified manually.
   
   ### Are there any user-facing changes?
   
   Yes — this is a bug fix. Users did #49875 
   
   This PR contains a **"Critical Fix"**: timezone information was silently 
lost during conversion, with no warning or error.

