vincev opened a new pull request, #5106:
URL: https://github.com/apache/arrow-datafusion/pull/5106

   # Which issue does this PR close?
   
   Work for #212
   
   # Rationale for this change
   
   This PR adds an `unnest_column` method to `DataFrame` that unnests list-type columns (see [tests](https://github.com/vincev/arrow-datafusion/blob/8a7059cefa2a02a8418a704b8d6ff08aead06fbf/datafusion/core/tests/dataframe.rs#L511)). Given the following data frame:
   
   ```
   +----------+------------------------------------------------------------+--------------------+
   | shape_id | points                                                     | tags               |
   +----------+------------------------------------------------------------+--------------------+
   | 1        | [{"x": -3, "y": -4}, {"x": -3, "y": 6}, {"x": 2, "y": -2}] | [tag1]             |
   | 2        |                                                            | [tag1, tag2]       |
   | 3        | [{"x": -9, "y": 2}, {"x": -10, "y": -4}]                   |                    |
   | 4        | [{"x": -3, "y": 5}, {"x": 2, "y": -1}]                     | [tag1, tag2, tag3] |
   +----------+------------------------------------------------------------+--------------------+
   ```
   
   The call `df.unnest_column("tags")` produces:
   
   ```
   +----------+------------------------------------------------------------+------+
   | shape_id | points                                                     | tags |
   +----------+------------------------------------------------------------+------+
   | 1        | [{"x": -3, "y": -4}, {"x": -3, "y": 6}, {"x": 2, "y": -2}] | tag1 |
   | 2        |                                                            | tag1 |
   | 2        |                                                            | tag2 |
   | 3        | [{"x": -9, "y": 2}, {"x": -10, "y": -4}]                   |      |
   | 4        | [{"x": -3, "y": 5}, {"x": 2, "y": -1}]                     | tag1 |
   | 4        | [{"x": -3, "y": 5}, {"x": 2, "y": -1}]                     | tag2 |
   | 4        | [{"x": -3, "y": 5}, {"x": 2, "y": -1}]                     | tag3 |
   +----------+------------------------------------------------------------+------+
   ```
   
   Calling `df.unnest_column("points")` produces:
   
   ```
   +----------+---------------------+--------------------+
   | shape_id | points              | tags               |
   +----------+---------------------+--------------------+
   | 1        | {"x": -3, "y": -4}  | [tag1]             |
   | 1        | {"x": -3, "y": 6}   | [tag1]             |
   | 1        | {"x": 2, "y": -2}   | [tag1]             |
   | 2        |                     | [tag1, tag2]       |
   | 3        | {"x": -9, "y": 2}   |                    |
   | 3        | {"x": -10, "y": -4} |                    |
   | 4        | {"x": -3, "y": 5}   | [tag1, tag2, tag3] |
   | 4        | {"x": 2, "y": -1}   | [tag1, tag2, tag3] |
   +----------+---------------------+--------------------+
   ```
   
   Chaining both calls, `df.unnest_column("points").unnest_column("tags")`, produces:
   
   ```
   +----------+---------------------+------+
   | shape_id | points              | tags |
   +----------+---------------------+------+
   | 1        | {"x": -3, "y": -4}  | tag1 |
   | 1        | {"x": -3, "y": 6}   | tag1 |
   | 1        | {"x": 2, "y": -2}   | tag1 |
   | 2        |                     | tag1 |
   | 2        |                     | tag2 |
   | 3        | {"x": -9, "y": 2}   |      |
   | 3        | {"x": -10, "y": -4} |      |
   | 4        | {"x": -3, "y": 5}   | tag1 |
   | 4        | {"x": -3, "y": 5}   | tag2 |
   | 4        | {"x": -3, "y": 5}   | tag3 |
   | 4        | {"x": 2, "y": -1}   | tag1 |
   | 4        | {"x": 2, "y": -1}   | tag2 |
   | 4        | {"x": 2, "y": -1}   | tag3 |
   +----------+---------------------+------+
   ```
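
The row expansion shown in the tables above can be modelled in plain Rust. This is only a sketch of the semantics, not the code in this PR: `Shape`, `FlatShape`, `unnest_cell`, `sample_rows` and `unnest_points_then_tags` are hypothetical names, points are simplified to `(x, y)` tuples, and null list cells are modelled as `None`. The tables do not show what happens to an empty (non-null) list; the sketch assumes it yields no rows.

```rust
/// One row of the example frame. `None` models a null list cell.
#[derive(Debug, Clone, PartialEq)]
struct Shape {
    shape_id: u32,
    points: Option<Vec<(i32, i32)>>,
    tags: Option<Vec<String>>,
}

/// The same row after both list columns have been unnested.
#[derive(Debug, Clone, PartialEq)]
struct FlatShape {
    shape_id: u32,
    point: Option<(i32, i32)>,
    tag: Option<String>,
}

/// Expand one list cell into the values of its output rows.
/// A null list is kept as a single null value, as in the tables above.
/// Assumption of this sketch: an empty (non-null) list yields no rows.
fn unnest_cell<T: Clone>(cell: &Option<Vec<T>>) -> Vec<Option<T>> {
    match cell {
        None => vec![None],
        Some(items) => items.iter().cloned().map(Some).collect(),
    }
}

/// Helper that builds a non-null tag list from string literals.
fn tags(names: &[&str]) -> Option<Vec<String>> {
    Some(names.iter().map(|s| s.to_string()).collect())
}

/// The four-row data frame from the description above.
fn sample_rows() -> Vec<Shape> {
    vec![
        Shape { shape_id: 1, points: Some(vec![(-3, -4), (-3, 6), (2, -2)]), tags: tags(&["tag1"]) },
        Shape { shape_id: 2, points: None, tags: tags(&["tag1", "tag2"]) },
        Shape { shape_id: 3, points: Some(vec![(-9, 2), (-10, -4)]), tags: None },
        Shape { shape_id: 4, points: Some(vec![(-3, 5), (2, -1)]), tags: tags(&["tag1", "tag2", "tag3"]) },
    ]
}

/// Model of unnesting `points` first and `tags` second: every point of a
/// row is paired with every tag of that row, and `shape_id` is repeated.
fn unnest_points_then_tags(rows: &[Shape]) -> Vec<FlatShape> {
    let mut out = Vec::new();
    for row in rows {
        for point in unnest_cell(&row.points) {
            for tag in unnest_cell(&row.tags) {
                out.push(FlatShape { shape_id: row.shape_id, point, tag });
            }
        }
    }
    out
}
```

Calling `unnest_points_then_tags(&sample_rows())` yields 13 rows in the same order as the last table: 3 for shape 1, 2 each for shapes 2 and 3, and 6 for shape 4.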
   
   # What changes are included in this PR?
   
   This PR makes the following changes:
   - Add `unnest_column` to `DataFrame`
   - Add an `Unnest` variant to `LogicalPlan` that produces a new schema for the unnested column
   - Add `UnnestExec` to the execution plan
   - Add some tests to `DataFrame`
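
To illustrate what "a new schema for the unnested column" could mean, here is a toy model in plain Rust. It does not use the Arrow or DataFusion types and is not the code in this PR: `DataType`, `Field` and `unnest_schema` are hypothetical, and the only rule it encodes is the one visible in the tables above, namely that the unnested column changes from a list type to its element type while every other column is unchanged.

```rust
/// Toy data types, just enough to describe the example frame.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Utf8,
    Struct(Vec<Field>),
    List(Box<DataType>),
}

/// Toy schema field: a column name and its type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Compute the output schema of unnesting `column`.
/// The column must exist and have a list type; its type is replaced
/// by the list's element type. All other fields are copied as-is.
fn unnest_schema(schema: &[Field], column: &str) -> Result<Vec<Field>, String> {
    let mut found = false;
    let mut out = Vec::with_capacity(schema.len());
    for field in schema {
        if field.name != column {
            out.push(field.clone());
            continue;
        }
        found = true;
        match &field.data_type {
            DataType::List(inner) => out.push(Field {
                name: field.name.clone(),
                data_type: (**inner).clone(),
            }),
            other => {
                return Err(format!("column `{column}` has type {other:?}, expected a list"));
            }
        }
    }
    if found {
        Ok(out)
    } else {
        Err(format!("no column named `{column}`"))
    }
}
```

In this model, unnesting `tags` on a schema of `shape_id: Int`, `points: List(Struct)`, `tags: List(Utf8)` leaves the first two fields untouched and turns `tags` into `Utf8`. Unnesting `shape_id` returns an error because it is not a list.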
   
   # Are these changes tested?
   
   Added some initial tests [here](https://github.com/vincev/arrow-datafusion/blob/8a7059cefa2a02a8418a704b8d6ff08aead06fbf/datafusion/core/tests/dataframe.rs#L511). I am happy to add more tests following feedback.
   
   # Are there any user-facing changes?
   
   Add an `unnest_column` method to `DataFrame`.

