[ 
https://issues.apache.org/jira/browse/DRILL-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Waniek updated DRILL-7927:
--------------------------------
    Attachment: drillErr.png

> NullPointerException when trying to write UNIONTYPE to Parquet
> --------------------------------------------------------------
>
>                 Key: DRILL-7927
>                 URL: https://issues.apache.org/jira/browse/DRILL-7927
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>         Environment: *Docker:*
> Client: Docker Engine - Community
>  Cloud integration: 1.0.14
>  Version: 20.10.6
>  API version: 1.41
>  Go version: go1.13.15
>  Git commit: 370c289
>  Built: Fri Apr 9 22:46:45 2021
>  OS/Arch: linux/amd64
>  Context: default
>  Experimental: true
> Server: Docker Engine - Community
>  Engine:
>  Version: 20.10.6
>  API version: 1.41 (minimum version 1.12)
>  Go version: go1.13.15
>  Git commit: 8728dd2
>  Built: Fri Apr 9 22:44:56 2021
>  OS/Arch: linux/amd64
>  Experimental: false
>  containerd:
>  Version: 1.4.4
>  GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
>  runc:
>  Version: 1.0.0-rc93
>  GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
>  docker-init:
>  Version: 0.19.0
>  GitCommit: de40ad0
> Running on Windows under WSL2.
>            Reporter: Anton Waniek
>            Priority: Minor
>
> The UNION data type is not supported by the Parquet format, so Drill should 
> handle this case and report a clear error when a user attempts to write this 
> type to Parquet. Currently, a NullPointerException is thrown instead.
> Reproducing the bug takes a few steps, but the process is straightforward.
> To summarize the commands in advance: to obtain a table with a column of the 
> UNION type, first enable the union type option, then run a query over a 
> MongoDB collection whose field holds heterogeneous types (e.g. strings and 
> numbers). (*n.b.* there may be a simpler way to obtain a union type table, 
> but I am not aware of one.) Then try to write the table to Parquet. 
> First, start MongoDB and insert suitable data:
> {code:bash}
> docker run --rm -it -d --name mongo-uniontype mongo:4.4
> # wait for mongo a bit
> sleep 1
> create_coll='db.uniontype_table.insertMany([{"column": 1},{"column": "string"}])'
> # quote the variable so the shell passes the script to --eval as one argument
> docker exec -it mongo-uniontype mongo example --eval "$create_coll"
> # check the outcome
> docker exec -it mongo-uniontype mongo example --eval 'db.uniontype_table.find()'{code}
> Run Drill and configure the Mongo storage plugin:
> {code:bash}
> docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
>   apache/drill:latest /bin/bash
> mongo_ip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
> mongo_conf() {
> cat <<EOF
> {
>   "name": "mongo",
>   "config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/", "enabled":"true"}
> }
> EOF
> }
> sleep 5  # wait a little for Drill
> curl -X POST -H "Content-Type: application/json" \
>   http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
> Finally, attach to the freshly configured Drill, set the relevant option, and 
> run the query:
> {code:bash}
> docker attach drill-uniontype
> {code}
> then in the resulting *sqlline* command line:
> {code:sql}
> use mongo.example;
> SET `exec.enable_union_type` = true;
> CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM `uniontype_table`);{code}
> The last statement should raise the NullPointerException.
>  
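
The reproduction steps above rely on fixed delays (`sleep 1` for MongoDB, `sleep 5` for Drill), which may be too short on slower machines. A minimal sketch of a polling helper that could stand in for them follows. The `retry` function is hypothetical and not part of the original report, and the two readiness checks shown in the comments (a MongoDB `ping` command and Drill's `/status` page on port 8047) are assumptions about what signals readiness in this setup.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0, sleeping DELAY_SECONDS
# between attempts, and gives up after MAX_ATTEMPTS tries.
retry() {
  local max_attempts=$1
  local delay=$2
  local attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible usage in place of the fixed sleeps (assumed readiness checks):
#   retry 30 1 docker exec mongo-uniontype mongo --quiet --eval 'db.runCommand({ping: 1})'
#   retry 30 2 curl -sf -o /dev/null http://localhost:8047/status
```

With this in place, the later steps only run once each service answers, and the script fails with a message if a service never comes up.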



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
