lostmygithubaccount commented on issue #440:
URL: https://github.com/apache/arrow-datafusion-python/issues/440#issuecomment-1656286966

   just to throw out an idea related to this:
   
   > I think a very reasonable alternative reality is that "datafusion-python" 
remains a thin binding on top of datafusion, and the delightful user experience 
comes via ibis.
   
   if we agree Ibis is a delightful dataframe API and we can close the gaps in 
the DataFusion backend, then datafusion-python could avoid a lot of work 
defining a new dataframe API by wrapping Ibis, so that user code looks like:
   
   ```python
   [ins] In [3]: t = datafusion.read_parquet("penguins.parquet")
   
   [ins] In [4]: t
   Out[4]:
   DatabaseTable: _ibis_read_parquet_pnfkuttmizcmjk7trfkv5bhfse
     species           string
     island            string
     bill_length_mm    float64
     bill_depth_mm     float64
     flipper_length_mm int64
     body_mass_g       int64
     sex               string
     year              int64
   
   [ins] In [5]: datafusion.options.interactive = True
   
   [ins] In [6]: t
   Out[6]:
   ┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
   ┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
   ┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
   │ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
   ├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
   │ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
   │ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
   │ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
   │ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │
   │ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
   │ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
   │ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
   │ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
   │ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
   │ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
   │ …       │ …         │              … │             … │                 … │           … │ …      │     … │
   └─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
   
   [ins] In [7]: t.group_by(["species", "island"]).agg(datafusion._.count())
   Out[7]:
   ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   ┃ species   ┃ island    ┃ CountStar(_ibis_read_parquet_pnfkuttmizcmjk7trfkv5bhfse) ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
   │ string    │ string    │ int64                                                    │
   ├───────────┼───────────┼──────────────────────────────────────────────────────────┤
   │ Adelie    │ Biscoe    │                                                       44 │
   │ Adelie    │ Torgersen │                                                       52 │
   │ Adelie    │ Dream     │                                                       56 │
   │ Chinstrap │ Dream     │                                                       68 │
   │ Gentoo    │ Biscoe    │                                                      124 │
   └───────────┴───────────┴──────────────────────────────────────────────────────────┘
   
   [ins] In [8]: t.group_by(["species", "island"]).agg(datafusion._.count().name("count"))
   Out[8]:
   ┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
   ┃ species   ┃ island    ┃ count ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
   │ string    │ string    │ int64 │
   ├───────────┼───────────┼───────┤
   │ Adelie    │ Biscoe    │    44 │
   │ Adelie    │ Torgersen │    52 │
   │ Chinstrap │ Dream     │    68 │
   │ Gentoo    │ Biscoe    │   124 │
   │ Adelie    │ Dream     │    56 │
   └───────────┴───────────┴───────┘
   ```
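
to make the "wrapping Ibis" part a bit more concrete, here is a minimal sketch of one way a thin facade module could forward everything to a backing module. `make_facade`, `on_first_use` and the module name `datafusion_ibis` are made-up names for illustration, not anything that exists in datafusion-python or Ibis today. The only Ibis calls assumed are `ibis.set_backend`, `ibis.read_parquet`, `ibis.options` and `ibis._`, and they appear only in the commented-out usage so the sketch itself runs without Ibis installed:

```python
import importlib
import types


def make_facade(name, backing, on_first_use=None):
    """Return a module object that lazily forwards attribute lookups to `backing`.

    `name` is the name the facade reports, `backing` is the import path of the
    module doing the real work, and `on_first_use` is an optional hook that is
    called once with the backing module (e.g. to select a default backend).
    """
    facade = types.ModuleType(name)
    state = {"module": None}

    def __getattr__(attr):
        # Module-level __getattr__ (PEP 562) runs only when normal lookup fails,
        # so everything not defined on the facade falls through to the backing module.
        if state["module"] is None:
            state["module"] = importlib.import_module(backing)
            if on_first_use is not None:
                on_first_use(state["module"])
        return getattr(state["module"], attr)

    facade.__getattr__ = __getattr__
    return facade


# Hypothetical usage, assuming Ibis is installed with its DataFusion backend:
#
#   df = make_facade("datafusion_ibis", "ibis",
#                    on_first_use=lambda m: m.set_backend("datafusion"))
#   df.options.interactive = True
#   t = df.read_parquet("penguins.parquet")
#   t.group_by(["species", "island"]).agg(df._.count().name("count"))
```

the lazy import keeps the facade cheap to import, and the hook is where a default backend would get pinned. A real wrapper would likely want an explicit, curated surface instead of forwarding every attribute.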


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

