micomahesh1982 commented on issue #13125: URL: https://github.com/apache/arrow/issues/13125#issuecomment-1126257354
A couple of things I'd like to explain here so that you have visibility and can suggest a solution that fits: (1) We have a source file (a CSV file, about 6 GB compressed/uncompressed). We don't read the whole file into memory with pandas; instead we read it in chunks and pass each chunk to pyarrow to convert it to Parquet and write it to S3, until all chunks are done. This approach keeps memory consumption under control and avoids high memory usage, which is why chunking is used. However, while writing a chunk to the s3:// folder, we hit the error below: Python Error: <>, exitCode: <139>. Have you come across this scenario before, or any way to work out where it's happening and how to overcome it? Please let me know.
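For reference, a minimal sketch of the chunked pipeline described above (the file name, bucket path, chunk size, and region are placeholders, not the actual values from our job):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

SOURCE_CSV = "source.csv.gz"       # placeholder for the ~6 GB input file
DEST_PREFIX = "bucket/prefix/out"  # placeholder for s3://bucket/prefix/out
CHUNK_ROWS = 1_000_000             # rows per pandas chunk (assumed value)

s3 = fs.S3FileSystem(region="us-east-1")  # assumed region

# Read the CSV in chunks so the whole file never sits in memory at once,
# convert each chunk to an Arrow table, and write it as its own Parquet
# file under the S3 prefix.
for i, chunk in enumerate(pd.read_csv(SOURCE_CSV, chunksize=CHUNK_ROWS)):
    table = pa.Table.from_pandas(chunk, preserve_index=False)
    with s3.open_output_stream(f"{DEST_PREFIX}/part-{i:05d}.parquet") as sink:
        pq.write_table(table, sink)
```

The exit code 139 (segmentation fault) shows up during the write step to the s3:// location, not while reading the chunks.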
