lostmygithubaccount commented on issue #440:
URL: https://github.com/apache/arrow-datafusion-python/issues/440#issuecomment-1656286966

just to throw out an idea related to this:

> I think a very reasonable alternative reality is that "datafusion-python"
> remains a thin binding on top of datafusion, and the delightful user experience
> comes via ibis.

if we agree Ibis is a delightful dataframe API and we can close the gaps in
the DataFusion backend, then you could avoid a lot of work in defining a new
dataframe API by wrapping Ibis so that code looks like:

```python
[ins] In [3]: t = datafusion.read_parquet("penguins.parquet")
[ins] In [4]: t
Out[4]:
DatabaseTable: _ibis_read_parquet_pnfkuttmizcmjk7trfkv5bhfse
  species           string
  island            string
  bill_length_mm    float64
  bill_depth_mm     float64
  flipper_length_mm int64
  body_mass_g       int64
  sex               string
  year              int64
[ins] In [5]: datafusion.options.interactive = True
[ins] In [6]: t
Out[6]:
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
[ins] In [7]: t.group_by(["species", "island"]).agg(datafusion._.count())
Out[7]:
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ CountStar(_ibis_read_parquet_pnfkuttmizcmjk7trfkv5bhfse) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ int64                                                    │
├───────────┼───────────┼──────────────────────────────────────────────────────────┤
│ Adelie    │ Biscoe    │                                                       44 │
│ Adelie    │ Torgersen │                                                       52 │
│ Adelie    │ Dream     │                                                       56 │
│ Chinstrap │ Dream     │                                                       68 │
│ Gentoo    │ Biscoe    │                                                      124 │
└───────────┴───────────┴──────────────────────────────────────────────────────────┘
[ins] In [8]: t.group_by(["species", "island"]).agg(datafusion._.count().name("count"))
Out[8]:
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
│ Adelie    │ Dream     │    56 │
└───────────┴───────────┴───────┘
```
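
to make "wrapping Ibis" a bit more concrete, here is a rough sketch of one way the forwarding could work. Everything in it is an assumption for illustration: `make_facade` is a hypothetical helper (not part of datafusion-python or Ibis) that builds a module object whose unknown attributes are looked up on another module via a module-level `__getattr__` (PEP 562). It uses the stdlib `json` module as a stand-in target so it runs without Ibis installed.

```python
import importlib
import types


def make_facade(name: str, target: str) -> types.ModuleType:
    """Return a new module object that forwards unknown attributes to `target`.

    Hypothetical helper for illustration only; not an API of
    datafusion-python or Ibis.
    """
    backing = importlib.import_module(target)
    facade = types.ModuleType(name)

    # PEP 562: a module-level __getattr__ is called for attributes that
    # are not found in the module's own namespace.
    def _forward(attr: str):
        return getattr(backing, attr)

    facade.__getattr__ = _forward
    return facade


# Stand-in demo with a stdlib module so the sketch runs anywhere;
# the idea in this comment would use target="ibis" instead.
demo = make_facade("demo", "json")
print(demo.dumps({"species": "Adelie"}))
```

with `target="ibis"`, names like `read_parquet`, `options`, and `_` would resolve to the Ibis objects of the same name; a real wrapper would presumably also need to select the DataFusion backend (for example with `ibis.set_backend("datafusion")`), which this sketch leaves out.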