Hi Sri,
The fact that each line can be converted on its own, but multiple lines
together throw an error, suggests that you have conflicting types. Drill tries
to handle such cases, but there are many holes; it sounds like you are hitting
one of them.
The error message mentions "SingleListWriter". The single list occurs when you
have a list of mixed types such as:
[10, null, "foo"]
It may also occur in other cases such as:
{a: [10]}{a: null}
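One way to find such records before handing the file to Drill is to scan each line for arrays whose elements mix JSON types. This is only a hypothetical diagnostic sketch (not part of Drill; the `mixed_type_lists` and `scan` names are my own), and it checks within a single record, so a conflict across records like the second case would instead need a per-field comparison across lines:

```python
import json

def mixed_type_lists(value, path="$"):
    """Recursively yield (path, types) for JSON arrays whose
    elements have more than one Python type after parsing --
    the within-record case that can trip a list writer."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from mixed_type_lists(v, f"{path}.{k}")
    elif isinstance(value, list):
        types = {type(v).__name__ for v in value}
        if len(types) > 1:
            yield path, types
        for i, v in enumerate(value):
            yield from mixed_type_lists(v, f"{path}[{i}]")

def scan(lines):
    """Scan a JSON-lines input; return (record number, path, types)
    for every mixed-type array found."""
    hits = []
    for n, line in enumerate(lines, 1):
        for path, types in mixed_type_lists(json.loads(line)):
            hits.append((n, path, sorted(types)))
    return hits

print(scan(['{"a": [10, null, "foo"]}', '{"a": [1, 2]}']))
# -> [(1, '$.a', ['NoneType', 'int', 'str'])]
```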
The union type is experimental and is, frankly, broken. It works only in the
very restricted case of a query with no filters or other operations. I doubt
that the Parquet writer handles unions.
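If your lists really do mix types, one possible pre-processing workaround (outside Drill, and purely a sketch of my own, assuming lists of nullable strings are acceptable for your data) is to coerce every array element to a string before loading, in the spirit of all_text_mode:

```python
import json

def stringify_lists(value):
    """Return a copy of a parsed JSON value in which every scalar
    array element is coerced to a string (nulls kept as null), so
    that each list has a single, consistent element type."""
    if isinstance(value, dict):
        return {k: stringify_lists(v) for k, v in value.items()}
    if isinstance(value, list):
        return [None if v is None else
                stringify_lists(v) if isinstance(v, (dict, list)) else
                str(v)
                for v in value]
    return value

record = json.loads('{"a": [10, null, "foo"]}')
print(json.dumps(stringify_lists(record)))
# -> {"a": ["10", null, "foo"]}
```

You would run every line of the input file through this before pointing Drill at it; the rewritten file then has uniform varchar lists that the Parquet writer can handle.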
Your JSON did not come across (attachments are stripped). Perhaps paste the
first couple of records into an e-mail.
Thanks,
- Paul
On Thursday, August 30, 2018, 8:00:55 PM PDT, Sri Krishna
<[email protected]> wrote:
Hi,
I am trying to convert a JSON file into Parquet files using Drill. The query is:
ALTER SESSION SET `store.json.all_text_mode` = true;
USE dfs.tmpp;
ALTER SESSION SET `store.format` = 'parquet';
CREATE TABLE `testParquet` as select * from `test.json`;
The first statement is used so that we don't have to worry about numbers,
integers, etc.; for now, reading everything as strings works. When I run this
query I get this (unclear) error message:
Error: INTERNAL_ERROR ERROR: You tried to start when you are using a
ValueWriter of type SingleListWriter.
Attached is the JSON file; the trouble is with the first line. That line by
itself can be converted to a Parquet file (the above CTAS works), and so can
the rest of the lines without it, but both together give this error. I also ran
a query that just reads the file and got the same error with this line plus the
others, but not with either alone (just like the CTAS). I can get around
reading the file by enabling the union type:
ALTER SESSION SET `exec.enable_union_type` = true;
But then I get an error that List type isn’t supported (I assume they are
talking about mixed types in an array).
Here is the stack trace (from enabling verbose) in case of write failure:
Error: INTERNAL_ERROR ERROR: You tried to start when you are using a ValueWriter
of type SingleListWriter.
Fragment 0:0
[Error Id: 1ae5c2ce-e1ef-40f9-afce-d1e00ac9fa15 on IMC28859.imc2.com:31010]
(java.lang.IllegalStateException) You tried to start when you are using a ValueWriter of type SingleListWriter.
org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.start():78
org.apache.drill.exec.vector.complex.impl.SingleListWriter.start():71
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():430
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():311
org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():254
org.apache.drill.exec.vector.complex.fn.JsonReader.write():209
org.apache.drill.exec.store.easy.json.JSONRecordReader.next():214
org.apache.drill.exec.physical.impl.ScanBatch.next():177
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():90
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.physical.impl.BaseRootExec.next():103
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
Here is the trace with mixed mode enabled:
0: jdbc:drill:zk=local> ALTER SESSION SET `exec.enable_union_type` = true;
+-------+----------------------------------+
| ok | summary |
+-------+----------------------------------+
| true | exec.enable_union_type updated. |
+-------+----------------------------------+
1 row selected (0.173 seconds)
0: jdbc:drill:zk=local> CREATE TABLE `test.parquet` as select * from `test.json`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
Fragment 0:0
[Error Id: 04db6e6a-aa66-4c4f-9573-23fc1215c638 on IMC28859.imc2.com:31010]
(java.lang.UnsupportedOperationException) Unsupported type LIST
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():162
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.physical.impl.BaseRootExec.next():103
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
All lines are well-formed JSON, and I could infer a schema from them
(jsonSchema.net). The JSON is generated from XML and contains some irrelevant
items, such as namespace declarations, but as far as I can tell those should
not interfere.
Any help would be appreciated.
Sri