marvin-lge opened a new issue, #66:
URL: https://github.com/apache/arrow-datafusion-python/issues/66
**Bug description**
I am writing a Parquet file from pandas (using gzip compression) and want to
query it with datafusion. The same data exported as .csv works fine with this
library, but the Parquet version does not return anything.
**Steps to reproduce**
```python
import datafusion
import pandas as pd
df = pd.DataFrame(data={
    'col1': [1, 2, 4, 5, 6],
    'col2': [3, 4, 3, 5, 2],
    'col3': [3, 4, 1, 2, 3],
    'col4': [3, 4, 4, 5, 6],
})
df.to_csv('df.csv', compression=None)
df.to_parquet('df.pq', compression=None)
ctx = datafusion.SessionContext()
ctx.register_csv(name="example_csv", path="df.csv")
ctx.register_parquet(name="example_pq", path="df.pq")
# test csv
df = ctx.sql("SELECT * FROM example_csv")
result = df.collect()
res = result[0]
# test parquet
df = ctx.sql("SELECT * FROM example_pq")
result = df.collect()
res = result[0]
```
**Expected behavior**
The same result from both approaches
**Additional context**
Gzip-compressed files also do not seem to work. Is this a limitation of the
core datafusion library?