Hi,

I am trying to convert a JSON file into Parquet files using Drill. The query is:
ALTER SESSION SET `store.json.all_text_mode` = true;
USE dfs.tmpp;
ALTER SESSION SET `store.format` = 'parquet';
CREATE TABLE `testParquet` AS SELECT * FROM `test.json`;

The first ALTER SESSION is there so that we don't have to worry about distinguishing numbers, integers, etc.; for now, reading everything back as strings is fine.
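As a sanity check on what that option does, here is a minimal sketch (the sample file, path, and data are hypothetical, not the attached file):

ALTER SESSION SET `store.json.all_text_mode` = true;
-- Suppose `sample.json` contains the two lines {"a": 1} and {"a": "x"}.
-- With all_text_mode on, both values of `a` are read as VARCHAR ('1' and 'x'),
-- so Drill never has to reconcile BIGINT vs. VARCHAR for the same column.
SELECT a FROM dfs.tmp.`sample.json`;

When I run the CTAS above, I get this error message, which is not very clear: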
Error: INTERNAL_ERROR ERROR: You tried to start when you are using a ValueWriter of type SingleListWriter.

Attached is the JSON file; the trouble is with the first line. That line by itself can be written to a Parquet file (the CTAS above works), and so can the rest of the lines without it; only together do they give this error. I also ran a plain SELECT on the file and got the same error with this line plus the others, but not with either alone (just as with the CTAS). I can work around the read failure by enabling the union type:
ALTER SESSION SET  `exec.enable_union_type` = true;
But then I get an error that the LIST type isn't supported (I assume this refers to mixed types in an array).
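For what it's worth, my guess at a minimal reproduction (a hypothetical sketch, not actual lines from the attached file) is an array whose element shape changes across rows, e.g.:

{"values": [{"x": "1"}]}
{"values": [["1", "2"]]}

Each of these lines is fine on its own; if the real file mixes shapes like this, it would explain why the lines only fail together.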
Here is the stack trace (with verbose errors enabled) for the write failure:
Error: INTERNAL_ERROR ERROR: You tried to start when you are using a ValueWriter of type SingleListWriter.

Fragment 0:0

[Error Id: 1ae5c2ce-e1ef-40f9-afce-d1e00ac9fa15 on IMC28859.imc2.com:31010]

  (java.lang.IllegalStateException) You tried to start when you are using a ValueWriter of type SingleListWriter.
    org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.start():78
    org.apache.drill.exec.vector.complex.impl.SingleListWriter.start():71
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():430
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():311
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():254
    org.apache.drill.exec.vector.complex.fn.JsonReader.write():209
    org.apache.drill.exec.store.easy.json.JSONRecordReader.next():214
    org.apache.drill.exec.physical.impl.ScanBatch.next():177
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():90
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.physical.impl.BaseRootExec.next():103
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
    org.apache.drill.exec.physical.impl.BaseRootExec.next():93
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>


Here is the trace with the union (mixed) type enabled:
0: jdbc:drill:zk=local> ALTER SESSION SET  `exec.enable_union_type` = true;
+-------+----------------------------------+
|  ok   |             summary              |
+-------+----------------------------------+
| true  | exec.enable_union_type updated.  |
+-------+----------------------------------+
1 row selected (0.173 seconds)
0: jdbc:drill:zk=local> CREATE TABLE `test.parquet` as select * from `test.json`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST

Fragment 0:0

[Error Id: 04db6e6a-aa66-4c4f-9573-23fc1215c638 on IMC28859.imc2.com:31010]

  (java.lang.UnsupportedOperationException) Unsupported type LIST
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
    org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():162
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.physical.impl.BaseRootExec.next():103
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
    org.apache.drill.exec.physical.impl.BaseRootExec.next():93
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>


All lines pass JSON validation, and I was able to infer a schema from them (jsonschema.net). The JSON is generated from XML, so it contains some irrelevant content such as namespace declarations, but as far as I can tell that should not interfere.
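To be concrete about the XML leftovers (the field names here are hypothetical, since the real file is attached rather than inlined), I mean entries like:

{"@xmlns": "http://www.example.com/ns", "record": {"id": "1"}}

If it mattered, I could project the `@xmlns` fields away with an explicit column list in the SELECT, so I don't think they are the cause.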

Any help would be appreciated.

Sri
