micomahesh1982 commented on issue #13125: URL: https://github.com/apache/arrow/issues/13125#issuecomment-1126257354
A couple of things I'd like to explain here so that you have visibility and can suggest a solution that fits: (1) We have a source file (a CSV file, about 6 GB compressed/uncompressed). We don't read the whole file into memory with pandas; instead we read it in chunks and pass each chunk to pyarrow to convert it to Parquet and write it to S3, until all chunks are done. This approach keeps memory consumption under control and avoids high memory usage, which is why chunking is used. However, while writing a chunk to the s3:// folder, we hit the error below: Python Error: <>, exitCode: <139>. Have you come across this scenario before, or any way to work out where it's happening and how to overcome it? Please let me know.
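For reference, a minimal sketch of the chunked pipeline described above (the file name, bucket path, chunk size, and region are placeholders, not the actual values from our job):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

SOURCE_CSV = "source.csv.gz"       # placeholder for the ~6 GB input file
DEST_PREFIX = "bucket/prefix/out"  # placeholder for s3://bucket/prefix/out
CHUNK_ROWS = 1_000_000             # rows per pandas chunk (assumed value)

s3 = fs.S3FileSystem(region="us-east-1")  # assumed region

# Read the CSV in chunks so the whole file never sits in memory at once,
# convert each chunk to an Arrow table, and write it as its own Parquet
# file under the S3 prefix.
for i, chunk in enumerate(pd.read_csv(SOURCE_CSV, chunksize=CHUNK_ROWS)):
    table = pa.Table.from_pandas(chunk, preserve_index=False)
    with s3.open_output_stream(f"{DEST_PREFIX}/part-{i:05d}.parquet") as sink:
        pq.write_table(table, sink)
```

The exit code 139 (segmentation fault) shows up during the write step to the s3:// location, not while reading the chunks.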
