vincev commented on issue #212:
URL:
https://github.com/apache/arrow-datafusion/issues/212#issuecomment-1427054023
I have got a version of the SQL query working in #5214:
```sql
DataFusion CLI v18.0.0
❯ create external table shapes stored as parquet location 'nested.parquet';
0 rows in set. Query took 0.007 seconds.
❯ select count(*) from shapes;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 100000          |
+-----------------+
1 row in set. Query took 0.009 seconds.
❯ select * from shapes limit 10;
+----------+------------------------------------------------------------+--------------------------+
| shape_id | points                                                     | tags                     |
+----------+------------------------------------------------------------+--------------------------+
| 1        | [{"x": -3, "y": -4}, {"x": -3, "y": 6}, {"x": 2, "y": -2}] | [tag1]                   |
| 2        | [{"x": -9, "y": 2}, {"x": -10, "y": -4}]                   |                          |
| 3        | [{"x": -3, "y": 5}, {"x": 2, "y": -1}]                     | [tag4, tag7, tag3]       |
| 4        | [{"x": -2, "y": -10}, {"x": 6, "y": -5}, {"x": 0, "y": 6}] | [tag4]                   |
| 5        | [{"x": -7, "y": -6}, {"x": -10, "y": 7}]                   | [tag5, tag9, tag6]       |
| 6        |                                                            | [tag1, tag6, tag6]       |
| 7        | [{"x": -9, "y": -1}, {"x": -1, "y": -3}]                   | [tag5, tag3]             |
| 8        | [{"x": 8, "y": -7}, {"x": -1, "y": -1}]                    | [tag1, tag7, tag4]       |
| 9        |                                                            | [tag4, tag1, tag9, tag4] |
| 10       |                                                            | [tag2, tag1]             |
+----------+------------------------------------------------------------+--------------------------+
10 rows in set. Query took 0.008 seconds.
❯ select unnest(tags) from shapes limit 10;
+------+
| tags |
+------+
| tag1 |
|      |
| tag4 |
| tag7 |
| tag3 |
| tag4 |
| tag5 |
| tag9 |
| tag6 |
| tag1 |
+------+
10 rows in set. Query took 0.034 seconds.
❯ select unnest(points) from shapes limit 10;
+---------------------+
| points              |
+---------------------+
| {"x": -3, "y": -4}  |
| {"x": -3, "y": 6}   |
| {"x": 2, "y": -2}   |
| {"x": -9, "y": 2}   |
| {"x": -10, "y": -4} |
| {"x": -3, "y": 5}   |
| {"x": 2, "y": -1}   |
| {"x": -2, "y": -10} |
| {"x": 6, "y": -5}   |
| {"x": 0, "y": 6}    |
+---------------------+
10 rows in set. Query took 0.060 seconds.
❯ select count(tags) from shapes;
+--------------------+
| COUNT(shapes.tags) |
+--------------------+
| 80273              |
+--------------------+
1 row in set. Query took 0.022 seconds.
❯ select count(unnest(tags)) from shapes;
+--------------------+
| COUNT(shapes.tags) |
+--------------------+
| 200863             |
+--------------------+
1 row in set. Query took 0.036 seconds.
❯ select count(distinct(unnest(tags))) from shapes;
+-----------------------------+
| COUNT(DISTINCT shapes.tags) |
+-----------------------------+
| 9                           |
+-----------------------------+
1 row in set. Query took 0.038 seconds.
❯ select shape_id, unnest(tags), unnest(points) from shapes where shape_id < 10 limit 10;
+----------+------+---------------------+
| shape_id | tags | points              |
+----------+------+---------------------+
| 1        | tag1 | {"x": -3, "y": -4}  |
| 1        | tag1 | {"x": -3, "y": 6}   |
| 1        | tag1 | {"x": 2, "y": -2}   |
| 2        |      | {"x": -9, "y": 2}   |
| 2        |      | {"x": -10, "y": -4} |
| 3        | tag4 | {"x": -3, "y": 5}   |
| 3        | tag4 | {"x": 2, "y": -1}   |
| 3        | tag7 | {"x": -3, "y": 5}   |
| 3        | tag7 | {"x": 2, "y": -1}   |
| 3        | tag3 | {"x": -3, "y": 5}   |
+----------+------+---------------------+
10 rows in set. Query took 0.045 seconds.
```
I am not sure the implementation is okay, though, since it parses the SQL AST
to add the unnesting to the plan.
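
For readers checking the output of the last query in the session, its row pattern can be modeled in a few lines of plain Python: each row's tags are crossed with its points, and a missing list contributes a single NULL. This is only a sketch of the observed rows, not the implementation in #5214. The helper name `unnest_rows` is hypothetical, and it assumes the blank cells are NULL lists (consistent with `count(tags)` being lower than `count(*)`). How a non-NULL empty list behaves is not shown in the output, so the sketch makes no claim about it.

```python
from itertools import product


def unnest_rows(shape_id, tags, points):
    """Expand one row into the cross product of its two unnested lists.

    A missing (None) list contributes a single None value, mirroring the
    blank cells in the CLI output. Hypothetical helper for illustration.
    """
    tag_values = tags if tags is not None else [None]
    point_values = points if points is not None else [None]
    # Tags vary slowest and points fastest, matching the row order shown.
    return [(shape_id, t, p) for t, p in product(tag_values, point_values)]


# Shapes 2 and 3 from the sample table.
rows = unnest_rows(2, None, [{"x": -9, "y": 2}, {"x": -10, "y": -4}])
rows += unnest_rows(
    3,
    ["tag4", "tag7", "tag3"],
    [{"x": -3, "y": 5}, {"x": 2, "y": -1}],
)
for row in rows:
    print(row)
```

Under these assumptions shape 2 yields two rows with a NULL tag and shape 3 yields six rows, the first five of which line up with the rows for shape 3 in the `limit 10` result.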