[GitHub] [arrow-datafusion-python] mesejo commented on issue #66: Register_parquet not working for pandas parquet files

via GitHub Mon, 14 Aug 2023 02:04:20 -0700


mesejo commented on issue #66:
URL: 
https://github.com/apache/arrow-datafusion-python/issues/66#issuecomment-1676956208


   Thanks for the report @marvin-lge. This issue is actually unrelated to 
compression: `register_parquet` defaults `file_extension` to `".parquet"`, 
so a file named `df.pq` does not match unless the extension is passed 
explicitly. Change the code to: 
   
   ```python
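   from datafusion import SessionContext

   ctx = SessionContext()

   # file_extension must match the file's suffix; the default is ".parquet"
   ctx.register_parquet(name="example_pq", path="df.pq", file_extension=".pq")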
   
   # test parquet
   df = ctx.sql("SELECT * FROM example_pq")
   result = df.collect()
   res = result[0]
   print(res)
   ```
   **Output**
   ```
   pyarrow.RecordBatch
   col1: int64
   col2: int64
   col3: int64
   col4: int64
   ```
   
   I agree that the usage of `register_parquet` is not very intuitive. 
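
One way to avoid the mismatch is to derive the extension from the path instead of typing it twice. The helper below is only a sketch: `register_parquet_by_suffix` is a hypothetical name, not part of datafusion, and it assumes `ctx` is a `SessionContext` whose `register_parquet` accepts the `name`, `path`, and `file_extension` keywords used above.

```python
from pathlib import Path


def register_parquet_by_suffix(ctx, name, path):
    """Register `path` as table `name`, passing the file's own suffix.

    Hypothetical helper, not part of datafusion. `ctx` is assumed to be a
    datafusion SessionContext (or any object with a compatible
    register_parquet method).
    """
    # Path("df.pq").suffix is ".pq"; fall back to the library default
    # when the path has no suffix (for example, a directory of files).
    extension = Path(path).suffix or ".parquet"
    ctx.register_parquet(name=name, path=str(path), file_extension=extension)
    return extension
```

With this, `register_parquet_by_suffix(ctx, "example_pq", "df.pq")` would pass `file_extension=".pq"` without repeating the suffix by hand.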


