devinjdangelo opened a new issue, #7816:
URL: https://github.com/apache/arrow-datafusion/issues/7816
### Describe the bug
I am seeing the following error when attempting to query a table of hive
style partitioned JSON files via `datafusion-cli`:
`Arrow error: Json error: Encountered unmasked nulls in non-nullable
StructArray child: Field { name: "a", data_type: Utf8, nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }`
### To Reproduce
I have a table with the following directory structure:
```bash
dev@dev:~/arrow-datafusion/test_table$ ls -lhR
.:
total 12K
drwxrwxr-x 2 dev dev 4.0K Oct 12 17:35 'a=2'
drwxrwxr-x 2 dev dev 4.0K Oct 12 17:35 'a=4'
drwxrwxr-x 2 dev dev 4.0K Oct 12 17:35 'a=6'
'./a=2':
total 4.0K
-rw-rw-r-- 1 dev dev 20 Oct 12 17:35 tn0Sfag4abaDm6i2.json
'./a=4':
total 4.0K
-rw-rw-r-- 1 dev dev 20 Oct 12 17:35 tn0Sfag4abaDm6i2.json
'./a=6':
total 4.0K
-rw-rw-r-- 1 dev dev 20 Oct 12 17:35 tn0Sfag4abaDm6i2.json
```
And the JSON files look like:
```bash
dev@dev:~/arrow-datafusion/test_table$ cat a\=2/tn0Sfag4abaDm6i2.json
{"b":"1"}
{"b":"1"}
```
Attempting to query like the following fails:
```bash
dev@dev:~/arrow-datafusion$ datafusion-cli
DataFusion CLI v32.0.0
❯ CREATE EXTERNAL TABLE
json_test(a string, b string)
STORED AS json
LOCATION './test_table'
PARTITIONED BY (a);
0 rows in set. Query took 0.001 seconds.
❯ select * from json_test;
Arrow error: Json error: Encountered unmasked nulls in non-nullable
StructArray child: Field { name: "a", data_type: Utf8, nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }
❯
```
Querying without the partitions defined works as expected:
```bash
dev@dev:~/arrow-datafusion$ datafusion-cli
DataFusion CLI v32.0.0
❯ CREATE EXTERNAL TABLE
json_test(b string)
STORED AS json
LOCATION './test_table';
0 rows in set. Query took 0.001 seconds.
❯ select * from json_test;
+---+
| b |
+---+
| 3 |
| 3 |
| 1 |
| 1 |
| 5 |
| 5 |
+---+
6 rows in set. Query took 0.002 seconds.
```
The exact same table structure DDL works for CSV and parquet files, but not
JSON.
### Expected behavior
The above json query should work.
### Additional context
I discovered this while working on
https://github.com/apache/arrow-datafusion/pull/7801/files#diff-0580d65ff5db0c78c1fa4cf693f2567e7d2394923412687560614836401c223f,
and there are additional relevant tests in this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]