[ 
https://issues.apache.org/jira/browse/DRILL-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Waniek updated DRILL-7927:
--------------------------------
    Description: 
The UNION data type is not supported by the Parquet format, so Drill should 
fail gracefully with a descriptive error when a user attempts to write this 
type to Parquet. Currently, a NullPointerException is thrown instead.

There are a few steps necessary to reproduce this bug but the process is 
straightforward.

To summarize the steps in advance: to get a table with columns of the UNION 
type, one must first enable the union type option, then run a query over a 
MongoDB collection in which a field holds mixed types (e.g. strings and 
numbers). (*n.b.* there may be a simpler way to obtain a union type table, but 
I am not aware of it.) One must then try to write the table to Parquet.

First start MongoDB and store appropriate data inside:
{code:bash}
docker run --rm -it -d --name mongo-uniontype mongo:4.4
# wait for mongo a bit
sleep 1
create_coll='db.uniontype_table.insertMany([{"column": 1}, {"column": "string"}])'
# quote the variable so the shell passes the script as a single argument
docker exec -it mongo-uniontype mongo example --eval "$create_coll"

# check the outcome
docker exec -it mongo-uniontype mongo example \
  --eval 'db.uniontype_table.find()'{code}
Run Drill and configure the Mongo storage plugin:
{code:bash}
docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
  apache/drill:latest /bin/bash
mongo_ip=$(docker inspect -f \
  '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
mongo_conf() {
cat <<EOF
{
  "name": "mongo",
  "config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/", 
"enabled":"true"}
}
EOF
}
sleep 5  # wait a little for Drill
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
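The fixed sleep delays above may be too short on a slow machine. As a minimal sketch, one could poll until each service responds instead. The wait_for function below is a hypothetical helper, not part of Docker or Drill; the example commands in its comments assume the container names used in this report and that Drill's REST status endpoint is reachable on port 8047.

```shell
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds, or gives up once TIMEOUT_SECONDS retries have been spent.
wait_for() {
  local timeout=$1
  shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example usage (assumptions, adjust to your setup):
#   wait_for 30 docker exec mongo-uniontype mongo --quiet \
#     --eval 'db.runCommand({ping: 1})'
#   wait_for 60 curl -sf http://localhost:8047/status.json
```

Each call would replace the corresponding sleep line; a non-zero return status means the service did not come up in time.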
Finally, attach to the freshly configured Drill, set the relevant option, and 
run the query:
{code:bash}
docker attach drill-uniontype
{code}
Then, at the resulting *sqlline* prompt:
{code:sql}
use mongo.example;
SET `exec.enable_union_type` = true;
CREATE TABLE `dfs.tmp`.`not_gonna_work.parquet` AS (SELECT * FROM 
`uniontype_table`);{code}
The last statement should raise the NullPointerException.

!image-2021-05-14-13-54-16-788.png!

 

  was:
The "union type" data type is not supported by the Parquet format and thus, 
Drill should handle the exception that occurs when the user attempts to write 
this type to parquet. A NullPointerException is currently thrown.

There are a few steps necessary to reproduce this bug but the process is 
straightforward.

To summarize the commands in advance: to have a table with columns using the 
UNION type, one must first enable the union type option, then run a query over 
a MongoDB collection with inhomogeneous types (e.g. strings and numbers) 
(*n.b.* there may be a simpler way to get hold of a union type table but I am 
not aware of it). One must then try to write the table to parquet. 

First start MongoDB and store appropriate data inside:
{code:bash}
docker run --rm -it -d --name mongo-uniontype mongo:4.4
# wait for mongo a bit
sleep 1
create_coll='db.uniontype_table.insertMany([{"column": 1},{"column": 
"string"}])'
docker exec -it mongo-uniontype mongo example --eval $create_coll

# check the outcome
docker exec -it mongo-uniontype mongo example --eval 
'db.uniontype_table.find()'{code}
Run Drill and configure the Mongo storage plugin:
{code:bash}
docker run --rm -it -d --name drill-uniontype -p 8047:8047 \
  apache/drill:latest /bin/bash
mongo_ip=$(docker inspect -f 
'{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mongo-uniontype)
mongo_conf() {
cat <<EOF
{
  "name": "mongo",
  "config": {"type":"mongo", "connection":"mongodb://$mongo_ip:27017/", 
"enabled":"true"}
}
EOF
}
sleep 5  # wait a little for Drill
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8047/storage/mongo.json --data "$(mongo_conf)"{code}
Finally, attach to the freshly configured Drill, set the relevant option, and 
run the query:
{code:bash}
docker attach drill-uniontype
{code}
then in the resulting *sqlline* command line:
{code:java}
use mongo.example;
SET `exec.enable_union_type` = true;
CREATE TABLE `dfs.tmp`.`problem_is_here.parquet` AS (SELECT * FROM 
`uniontype_table`);{code}
And the last statement should raise the exception.

 


> NullPointerException when trying to write UNIONTYPE to Parquet
> --------------------------------------------------------------
>
>                 Key: DRILL-7927
>                 URL: https://issues.apache.org/jira/browse/DRILL-7927
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>         Environment: *Docker:*
> Client: Docker Engine - Community
>  Cloud integration: 1.0.14
>  Version: 20.10.6
>  API version: 1.41
>  Go version: go1.13.15
>  Git commit: 370c289
>  Built: Fri Apr 9 22:46:45 2021
>  OS/Arch: linux/amd64
>  Context: default
>  Experimental: true
> Server: Docker Engine - Community
>  Engine:
>  Version: 20.10.6
>  API version: 1.41 (minimum version 1.12)
>  Go version: go1.13.15
>  Git commit: 8728dd2
>  Built: Fri Apr 9 22:44:56 2021
>  OS/Arch: linux/amd64
>  Experimental: false
>  containerd:
>  Version: 1.4.4
>  GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
>  runc:
>  Version: 1.0.0-rc93
>  GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
>  docker-init:
>  Version: 0.19.0
>  GitCommit: de40ad0
> Running on Windows under WSL2.
>            Reporter: Anton Waniek
>            Priority: Minor
>         Attachments: image-2021-05-14-13-54-16-788.png, sqlline.log



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
