Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

via GitHub Tue, 03 Oct 2023 14:28:00 -0700


IkeNefcy commented on issue #38000:
URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745751806


   I'll grab a sample in a moment, yeah. As for an error message, there is 
none: from upload to query there is no pushback from any system, and the 
parquet file is read during the copy function with no issue. 
   Also, to clarify why we know it's Spectrum specifically: 
   We tested downloading and rereading the files in pandas and in online 
parquet readers. 
   We tested on a cross account, and on the same account Redshift is on. 
   We tested Lake Formation Hive-style tables; these show as external tables in 
Redshift, and Spectrum reads them. The table we tested can currently be 
queried in Athena with no issue, but the same query in Redshift returns 
corrupted values (this is not copying files; Spectrum is just reading from S3 
directly).
   
   And I'll test out that versioning once I upload a sample.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
