GitHub user hyunsik opened a pull request:
https://github.com/apache/tajo/pull/422
TAJO-1359: Add nested field projector and language extension to project
nested record.
This work is still ongoing. All features appear to work, but I need
to clean up and refactor the changes, and I'll add more unit tests to verify
more use cases of complex types.
For tests, I'm using the Twitter example data at
https://dev.twitter.com/rest/reference/get/search/tweets. Projecting all fields,
including nested primitive fields, works well. I'm going to add deeper nested
schemas and more complex cases.
Here is an example DDL and a query statement.
*DDL*
```sql
CREATE EXTERNAL TABLE tweets (
  coordinates TEXT,
  favorited BOOL,
  truncated BOOL,
  created_at TIMESTAMP,
  id_str TEXT,
  /* entities RECORD (
    urls ARRAY<TEXT>
  )*/
  in_reply_to_user_id_str TEXT,
  contributors TEXT,
  text TEXT,
  metadata RECORD (
    iso_language_code TEXT,
    result_type TEXT
  ),
  retweet_count INTEGER,
  in_reply_to_status_id_str TEXT,
  id TEXT,
  geo TEXT,
  retweeted BOOL,
  in_reply_to_user_id TEXT,
  place TEXT,
  user RECORD (
    profile_sidebar_fill_color TEXT,
    profile_sidebar_border_color TEXT,
    profile_background_tile TEXT,
    name TEXT,
    profile_image_url TEXT,
    created_at TIMESTAMP,
    location TEXT,
    follow_request_sent TEXT,
    profile_link_color TEXT,
    is_translator BOOL,
    id_str TEXT,
    /*
    entities RECORD (
      url RECORD (
      ),
      description RECORD (
      )
    ), */
    default_profile BOOL,
    contributors_enabled BOOL,
    favourites_count INTEGER,
    url TEXT,
    profile_image_url_https TEXT,
    utc_offset INTEGER,
    id BIGINT,
    profile_use_background_image BOOL,
    listed_count INTEGER,
    profile_text_color TEXT,
    lang TEXT,
    followers_count INTEGER,
    protected BOOL,
    notifications TEXT,
    profile_background_image_url_https TEXT,
    profile_background_color TEXT,
    verified TEXT,
    geo_enabled TEXT,
    time_zone TEXT,
    description TEXT,
    default_profile_image TEXT,
    profile_background_image_url TEXT,
    statuses_count INTEGER,
    friends_count INTEGER,
    following TEXT,
    show_all_inline_media BOOL,
    screen_name TEXT
  ),
  in_reply_to_screen_name TEXT,
  source TEXT,
  in_reply_to_status_id TEXT
) USING JSON LOCATION ${table.path};
```
*DML*
```sql
SELECT
  coordinates,
  favorited,
  truncated,
  created_at,
  id_str,
  in_reply_to_user_id_str,
  contributors,
  "text",
  metadata.iso_language_code,
  metadata.result_type,
  retweet_count,
  in_reply_to_status_id_str,
  id,
  geo,
  retweeted,
  in_reply_to_user_id,
  place,
  user.profile_sidebar_fill_color,
  user.profile_sidebar_border_color,
  user.profile_background_tile,
  user.name,
  user.profile_image_url,
  user.created_at,
  user.location,
  user.follow_request_sent,
  user.profile_link_color,
  user.is_translator,
  user.id_str,
  user.default_profile,
  user.contributors_enabled,
  user.favourites_count,
  user.url,
  user.profile_image_url_https,
  user.utc_offset,
  user.id,
  user.profile_use_background_image,
  user.listed_count,
  user.profile_text_color,
  user.lang,
  user.followers_count,
  user.protected,
  user.notifications,
  user.profile_background_image_url_https,
  user.profile_background_color,
  user.verified,
  user.geo_enabled,
  user.time_zone,
  user.description,
  user.default_profile_image,
  user.profile_background_image_url,
  user.statuses_count,
  user.friends_count,
  user.following,
  user.show_all_inline_media,
  user.screen_name,
  in_reply_to_screen_name,
  source,
  in_reply_to_status_id
FROM
  tweets;
```
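For readers unfamiliar with nested field projection, here is a rough illustration of what a dotted reference such as `user.name` in the query above is expected to do against one JSON line. This is only a sketch in Python, not the Java code in this pull request; the helper names `project` and `project_row` and the sample record are hypothetical, and returning `None` for a missing field is an assumption made for the example.

```python
import json


def project(record, path):
    """Follow a dotted path such as 'user.name' through nested dicts.

    Returns None when any step of the path is missing (an assumption
    for this sketch, standing in for a SQL NULL).
    """
    value = record
    for name in path.split("."):
        if not isinstance(value, dict) or name not in value:
            return None
        value = value[name]
    return value


def project_row(line, paths):
    """Parse one JSON line and return only the requested fields, in order."""
    record = json.loads(line)
    return [project(record, p) for p in paths]


# Hypothetical, heavily trimmed tweet record.
sample = (
    '{"id_str": "1", '
    '"metadata": {"iso_language_code": "en"}, '
    '"user": {"name": "alice", "id": 42}}'
)

row = project_row(
    sample,
    ["id_str", "metadata.iso_language_code", "user.name", "user.location"],
)
print(row)
```

Only the fields named in the path list are returned, in the order requested, which mirrors how the select list above mixes top-level columns with fields of the `metadata` and `user` records.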
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hyunsik/tajo TAJO-1359
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/422.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #422
----
commit ea6aeb38075fb3684ec33dacdf6d7769ef6c7e5f
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-21T01:56:06Z
TAJO-1353: CREATE TABLE should support the nested record definition.
commit e3ad19f461739d5cc20b7d0b0cfac0f7f16ee2e5
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-21T02:27:06Z
Changed the type name 'struct' to 'record'.
commit bb39a913b8ac8230eb97d65858cf93f95ffd68fc
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-21T07:47:33Z
Introduced TypeDesc which contains DataType and NestedSchema.
commit a21c110c6a48f129d9c56c8680db5eea593d9076
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-22T23:48:15Z
Introduced nested record to schema almostly.
commit 1c6b21d5647d5b9e62ff7287df46fe1bd8048661
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-27T05:15:13Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1329
Conflicts:
tajo-algebra/src/main/java/org/apache/tajo/algebra/ColumnDefinition.java
tajo-algebra/src/main/java/org/apache/tajo/algebra/DataTypeExpr.java
commit 1c8d2585c1a13f16fd20acc5076b9fb1f734c91c
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-27T12:10:05Z
TAJO-1329: Improve Schema class to support nested struct support.
commit 0ef60183c48f77bd03ac3eb9ef80d62dbe66bf0e
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-27T19:52:32Z
Change child_fields_num to nested_field_num.
* Add more comments.
* Clean up some codes.
commit 458ed0e5521101362d54c395242c9892374abfe7
Author: Hyunsik Choi <[email protected]>
Date: 2015-02-28T23:47:33Z
Refactor resolver and its releated things:
* Refactor TUtil::collectionToString and TUtil::arrayToString to
StringUtils::join.
* Add SQLAnalyzer::visitColumn_reference to support dotted-chained
identifier.
* Clean up NameResolver.
commit 66693e4d842d9e82a4161da62989fb10b9cd9fc9
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-05T20:22:55Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1329
commit a2d07415faa224849786954eeb47ebc7c3d90d36
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-05T21:08:10Z
Allow catalog stores to have duplicated names across the different levels.
commit 8c000ce9ee53c5547f80f6ad412b5f2a560e1e37
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-06T06:45:31Z
Fixed unique key of derby.
commit 4ca4e0e0884c5aa51155a8a5a52b92783cc755b8
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-11T09:15:14Z
In progress.
commit 50076d89b4902650136a6b6c5a4a256b12d6a7bb
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-11T09:15:23Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1359
commit 38c61e14cb8fde17cb9cd1c96fe6112ecaf8f9ac
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-11T09:16:04Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1329
commit f5fdb02c9a1b0df9cddb5f783c21815d6522ec26
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-11T10:03:11Z
Add missed nested field support.
* Improve tajo-dump to support nested schema
* enable \d command to show nested schema
commit c2521114e0efb33cdb78ab92b045f519612e92d2
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-11T10:04:40Z
Merge branch 'TAJO-1329' of github.com:hyunsik/tajo into TAJO-1329
commit cc5ab77a9b07f53b265b25fddbf30e4d5ec08c81
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-11T10:40:55Z
Merge branch 'TAJO-1329' of github.com:hyunsik/tajo into TAJO-1359
commit 8b3f45ed8aac792fc79de3a86a529ac4c1e8ee91
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-12T01:27:05Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1359
commit f10f752ebebe88eaf1137b7de9241126786a03d3
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-12T09:16:17Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1329
commit b271d4d8946a6db26a25c83db1c8893d4ee24746
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-12T19:17:44Z
Update schema in xml files.
commit cfa95ebd5bbae96d3f0b593283e4ca81518ee31b
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-12T19:28:43Z
Updated catalog store driver versions.
commit 6b4e38aea84bba893c34b031312232ad56603d2b
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-12T19:30:10Z
Merge branch 'TAJO-1329' of github.com:hyunsik/tajo into TAJO-1359
commit 350621221e20d5237c15adb0f135b92753a80cb2
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-13T00:20:35Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJo-1329
Conflicts:
tajo-catalog/tajo-catalog-server/src/main/resources/schemas/oracle/columns.sql
tajo-catalog/tajo-catalog-server/src/main/resources/schemas/postgresql/columns.sql
commit f3cf891791fef92d2872746c22f9f64c51df64ba
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-13T01:27:58Z
Fixed the bug about the duplicated names in different level.
commit 2567c32f9e05479b21cbf48f52f94d4df8728ffc
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-13T02:19:48Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1359
Conflicts:
tajo-catalog/tajo-catalog-server/src/main/resources/schemas/oracle/columns.sql
tajo-catalog/tajo-catalog-server/src/main/resources/schemas/postgresql/columns.sql
commit 04bb0e1e1c8e58f86e923f768b711e1f3bebfe1a
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-13T02:20:18Z
Merge branch 'TAJO-1329' of github.com:hyunsik/tajo into TAJO-1359
Conflicts:
tajo-catalog/tajo-catalog-server/src/main/resources/schemas/oracle/columns.sql
tajo-catalog/tajo-catalog-server/src/main/resources/schemas/postgresql/columns.sql
commit 6a6eb1828383424df569b6b54cd27df0137dce97
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-13T06:58:59Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1359
commit 899deead496363ddcca217bacc16f9490cf3271c
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-14T02:22:21Z
Completed complex type support.
* Rename Schema::getColumns() to getRootColumns.
* Add Schema::getAllColumns to return all columns flatten.
* Change the behavior of projectable scanner.
* Improved DelimitedTextLine scanner to return only compact projected
fields.
commit ed3b54ab686a6d47474acf22a0e74e413e7c397b
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-14T02:24:57Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into
TAJO-1359
Conflicts:
tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
commit d2bef28361817e59bd5cbcbfcfeda1c8b1a30fea
Author: Hyunsik Choi <[email protected]>
Date: 2015-03-14T06:58:00Z
Added twitter json example tests.
----