[GitHub] [arrow] jorisvandenbossche commented on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to deconstruct a partition expression

GitBox Thu, 09 Jul 2020 06:57:18 -0700


jorisvandenbossche commented on pull request #7691:
URL: https://github.com/apache/arrow/pull/7691#issuecomment-656143156



   This is not the cleanest solution, but I could do it relatively quickly
because it is based on what I did earlier in
https://github.com/apache/arrow/pull/7523. I don't think a more proper solution
will be possible before 1.0, and this at least gives a way to get the
information needed.
   
   A few examples:
   
   ```python
   In [1]: import pyarrow.dataset as ds

   In [2]: dataset = ds.dataset("test_filter_fragments_pandas/", format="parquet", partitioning="hive")

   In [4]: expr = list(dataset.get_fragments())[0].partition_expression

   # single partition level with a string
   In [5]: expr
   Out[5]: <pyarrow.dataset.Expression (part == A:string)>

   In [6]: ds._unwrap_partition_expression(expr)
   Out[6]: [('part', 'A')]


   In [7]: dataset = ds.dataset("test_parquet_dask/", format="parquet", partitioning="hive")

   In [8]: expr = list(dataset.get_fragments())[0].partition_expression

   # two partition levels with integers
   In [9]: expr
   Out[9]: <pyarrow.dataset.Expression ((year == 2016:int32) and (month == 1:int32))>

   In [10]: ds._unwrap_partition_expression(expr)
   Out[10]: [('year', 2016), ('month', 1)]


   In [11]: dataset = ds.dataset("test.parquet", format="parquet")

   In [12]: expr = list(dataset.get_fragments())[0].partition_expression

   # no partitioned dataset
   In [13]: expr
   Out[13]: <pyarrow.dataset.Expression true:bool>

   In [14]: ds._unwrap_partition_expression(expr)
   Out[14]: []
   ```
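
To illustrate how a caller might consume this output, here is a minimal sketch that takes the list of `(name, value)` pairs shown above and turns it into a dict or a hive-style directory segment. It does not call pyarrow at all. The helper names `keys_to_dict` and `keys_to_hive_path` are hypothetical and only for illustration, and the input lists are copied from the `Out[...]` lines above rather than computed.

```python
from typing import Any, Dict, List, Tuple

# Shape of the helper's return value as shown in the examples above:
# an ordered list of (field name, value) pairs, one per partition level.
PartitionKeys = List[Tuple[str, Any]]


def keys_to_dict(keys: PartitionKeys) -> Dict[str, Any]:
    """Hypothetical helper: map each partition field name to its value."""
    return dict(keys)


def keys_to_hive_path(keys: PartitionKeys) -> str:
    """Hypothetical helper: rebuild a hive-style 'name=value/...' segment."""
    return "/".join(f"{name}={value}" for name, value in keys)


# Inputs copied from the Out[6], Out[10] and Out[14] lines above.
single_level = [("part", "A")]
two_levels = [("year", 2016), ("month", 1)]
unpartitioned = []

print(keys_to_dict(single_level))       # {'part': 'A'}
print(keys_to_hive_path(two_levels))    # year=2016/month=1
print(repr(keys_to_hive_path(unpartitioned)))  # ''
```

Keeping the result as an ordered list of pairs preserves the partition nesting order (here `year` before `month`), which a plain dict lookup does not need but a path reconstruction does.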
   
   cc @rjzamora 


