[
https://issues.apache.org/jira/browse/DRILL-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anton Waniek updated DRILL-7927:
--------------------------------
Attachment: drillErr.png
> NullPointerException when trying to write UNIONTYPE to Parquet
> --------------------------------------------------------------
>
> Key: DRILL-7927
> URL: https://issues.apache.org/jira/browse/DRILL-7927
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.18.0
> Environment: *Docker:*
> Client: Docker Engine - Community
>  Cloud integration: 1.0.14
>  Version: 20.10.6
>  API version: 1.41
>  Go version: go1.13.15
>  Git commit: 370c289
>  Built: Fri Apr 9 22:46:45 2021
>  OS/Arch: linux/amd64
>  Context: default
>  Experimental: true
> Server: Docker Engine - Community
>  Engine:
>   Version: 20.10.6
>   API version: 1.41 (minimum version 1.12)
>   Go version: go1.13.15
>   Git commit: 8728dd2
>   Built: Fri Apr 9 22:44:56 2021
>   OS/Arch: linux/amd64
>   Experimental: false
>  containerd:
>   Version: 1.4.4
>   GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
>  runc:
>   Version: 1.0.0-rc93
>   GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
>  docker-init:
>   Version: 0.19.0
>   GitCommit: de40ad0
> Running on Windows under WSL2.
> Reporter: Anton Waniek
> Priority: Minor
>
> The "union type" data type is not supported by the Parquet format, so Drill
> should handle this case gracefully when a user attempts to write such a
> column to Parquet. Instead, a NullPointerException is currently thrown.
> Reproducing the bug takes a few steps, but the process is straightforward.
> In summary: to obtain a table with a UNION-typed column, first enable the
> union type option, then run a query over a MongoDB collection whose field
> holds values of mixed types (e.g. strings and numbers). (*n.b.* there may be
> a simpler way to obtain a union type table, but I am not aware of one.)
> Then try to write the table to Parquet.
> First, start MongoDB and insert documents with mixed types:
> {code:bash}
> docker run --rm -it -d --name mongo-uniontype mongo:4.4
> # give MongoDB a moment to start
> sleep 1
> create_coll='db.uniontype_table.insertMany([{"column": 1},{"column": "string"}])'
> # quote the variable so the shell does not split the script on spaces
> docker exec -it mongo-uniontype mongo example --eval "$create_coll"
> # check the outcome
> docker exec -it mongo-uniontype mongo example --eval 'db.uniontype_table.find()'
> {code}
> Run Drill and configure the Mongo storage plugin:
> {code:bash}
> docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
> apache/drill:latest /bin/bash
> mongo_ip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
> mongo_conf() {
> cat <<EOF
> {
> "name": "mongo",
> "config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/",
> "enabled":"true"}
> }
> EOF
> }
> sleep 5 # wait a little for Drill
> curl -X POST -H "Content-Type: application/json" \
> http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
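The fixed `sleep 1` and `sleep 5` delays above may be too short on a slow machine. As a minimal sketch, a small retry helper could poll each service until it answers. `wait_for` is a hypothetical name introduced here for illustration; it is not part of Docker, MongoDB, or Drill.

```shell
# wait_for ATTEMPTS DELAY CMD...  (hypothetical helper)
# Runs CMD up to ATTEMPTS times, pausing DELAY seconds between tries.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    # no need to pause after the final failed attempt
    if [ "$attempts" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_for 30 1 docker exec mongo-uniontype mongo --quiet --eval 'db.runCommand({ping: 1})'` could stand in for `sleep 1`, and `wait_for 30 2 curl -sf http://localhost:8047/storage.json` for `sleep 5`, assuming the Drill REST endpoint `/storage.json` answers once the web server is up.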
> Finally, attach to the freshly configured Drill, set the relevant option, and
> run the query:
> {code:bash}
> docker attach drill-uniontype
> {code}
> Then, at the resulting *sqlline* prompt:
> {code:sql}
> use mongo.example;
> SET `exec.enable_union_type` = true;
> CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM
> `uniontype_table`);{code}
> The last statement raises the NullPointerException.
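The same statements might also be submitted without attaching to sqlline, through Drill's REST endpoint `/query.json`. The following is only a sketch under stated assumptions: `drill_payload` and `drill_query` are hypothetical helper names, the web server is assumed to be reachable on port 8047 as published above, and a session-level `SET` is assumed not to carry over between separate REST requests, which is why the example comments use `ALTER SYSTEM SET` instead. Whether the exception surfaces identically over REST has not been verified here.

```shell
# drill_payload SQL  (hypothetical helper)
# Wraps one single-line SQL statement in the JSON body for POST /query.json,
# escaping backslashes and double quotes.
drill_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"queryType": "SQL", "query": "%s"}' "$escaped"
}

# drill_query SQL  (hypothetical helper)
# Posts the statement to the Drill web server assumed to be at localhost:8047.
drill_query() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data "$(drill_payload "$1")" http://localhost:8047/query.json
}

# Intended usage (not executed here; requires the containers from above):
#   drill_query 'ALTER SYSTEM SET `exec.enable_union_type` = true'
#   drill_query 'CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM mongo.example.`uniontype_table`)'
```

The statements are passed without a trailing semicolon and fully qualified, since a separate `use mongo.example` request would not necessarily apply to later requests.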
>