[GitHub] [arrow-datafusion-python] marvin-lge opened a new issue, #66: Register parquet not working with parquet file from pandas

GitBox Thu, 03 Nov 2022 19:50:46 -0700


marvin-lge opened a new issue, #66:
URL: https://github.com/apache/arrow-datafusion-python/issues/66


   **Bug description**
   I am writing a Parquet file from pandas (normally with gzip compression; the
   example below disables compression and still shows the problem) and want to
   query it with DataFusion. The same data exported as .csv works fine with this
   library, but the Parquet version returns nothing.
   
   **Steps to reproduce**
   ```python
   import datafusion
   import pandas as pd
   
   df = pd.DataFrame(data={'col1': [1, 2, 4, 5, 6], 'col2': [3, 4, 3, 5, 2],
                           'col3': [3, 4, 1, 2, 3], 'col4': [3, 4, 4, 5, 6]})
   df.to_csv('df.csv', compression=None)  
   df.to_parquet('df.pq', compression=None)  
   
   ctx = datafusion.SessionContext()
   ctx.register_csv(name="example_csv", path="df.csv")
   ctx.register_parquet(name="example_pq", path="df.pq")
   
   # test csv
   df = ctx.sql("SELECT * FROM example_csv")
   result = df.collect()
   res = result[0]
   
   # test parquet
   df = ctx.sql("SELECT * FROM example_pq")
   result = df.collect()
   res = result[0]
   ```
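To rule out a bad file on the pandas side, independent of DataFusion, one can check for the 4-byte `PAR1` magic marker that the Parquet format places at both the start and the end of every file. The helper below is a hypothetical, stdlib-only sketch (`looks_like_parquet` is not part of pandas or DataFusion). It only checks the file framing, not the footer metadata, so a `True` result means "plausibly Parquet", not "readable".

```python
import os

PARQUET_MAGIC = b"PAR1"  # marker at both the start and the end of a Parquet file


def looks_like_parquet(path: str) -> bool:
    """Return True if `path` begins and ends with the Parquet magic bytes.

    Hypothetical diagnostic helper; it does not parse the footer metadata.
    """
    size = os.path.getsize(path)
    # Smallest conceivable layout: header magic (4) + footer length (4) + footer magic (4)
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

With the files from the repro, `looks_like_parquet("df.pq")` would be expected to return `True` and `looks_like_parquet("df.csv")` to return `False`.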
   
   **Expected behavior**
   The same result from both approaches
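Because `result[0]` raises `IndexError` whenever `collect()` returns an empty list, comparing total row counts across all returned batches gives a clearer signal of "not returning anything". This is a minimal sketch assuming, as in datafusion-python, that `collect()` returns a list of `pyarrow.RecordBatch` objects, each exposing `num_rows`. The names `total_rows` and `same_row_count` are hypothetical helpers, not library API.

```python
from typing import Iterable, Protocol


class HasNumRows(Protocol):
    """Anything with a `num_rows` attribute, such as a pyarrow.RecordBatch."""

    num_rows: int


def total_rows(batches: Iterable[HasNumRows]) -> int:
    """Sum row counts over the batches returned by DataFrame.collect().

    An empty list of batches yields 0 instead of raising IndexError,
    which is what `result[0]` does in that case.
    """
    return sum(batch.num_rows for batch in batches)


def same_row_count(left: Iterable[HasNumRows], right: Iterable[HasNumRows]) -> bool:
    """True if both collected results contain the same number of rows."""
    return total_rows(left) == total_rows(right)
```

Usage with the `ctx` from the repro: `same_row_count(ctx.sql("SELECT * FROM example_csv").collect(), ctx.sql("SELECT * FROM example_pq").collect())`, which would be expected to be `True` once the Parquet table returns its 5 rows.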
   
   **Additional context**
   Gzip-compressed files also seem not to work. Is this perhaps a limitation
   of the core DataFusion library?
   

