Hi Sri,
The fact that each line can be converted on its own, but multiple lines
together throw an error, suggests that you have conflicting types. Drill tries
to handle such cases, but there are many holes; it sounds like you are hitting
one of them.
The error message mentions "SingleListWriter". The single list occurs when you
have a list of mixed types such as:
[10, null, "foo"]
It may also occur in other cases such as:
{a: [10]}{a: null}
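One way to find such records before handing the file to Drill is to scan each line for arrays whose elements mix JSON types. This is only a hypothetical diagnostic sketch (not part of Drill; the `mixed_type_lists` and `scan` names are my own), and it checks within a single record, so a conflict across records like the second case would instead need a per-field comparison across lines:

```python
import json

def mixed_type_lists(value, path="$"):
    """Recursively yield (path, types) for JSON arrays whose
    elements have more than one Python type after parsing --
    the within-record case that can trip a list writer."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from mixed_type_lists(v, f"{path}.{k}")
    elif isinstance(value, list):
        types = {type(v).__name__ for v in value}
        if len(types) > 1:
            yield path, types
        for i, v in enumerate(value):
            yield from mixed_type_lists(v, f"{path}[{i}]")

def scan(lines):
    """Scan a JSON-lines input; return (record number, path, types)
    for every mixed-type array found."""
    hits = []
    for n, line in enumerate(lines, 1):
        for path, types in mixed_type_lists(json.loads(line)):
            hits.append((n, path, sorted(types)))
    return hits

print(scan(['{"a": [10, null, "foo"]}', '{"a": [1, 2]}']))
# -> [(1, '$.a', ['NoneType', 'int', 'str'])]
```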
The union type is experimental and is, frankly, broken. It works only in the
very restricted case of a query with no filters or other operations. I doubt
that the Parquet writer handles unions.
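If your lists really do mix types, one possible pre-processing workaround (outside Drill, and purely a sketch of my own, assuming lists of nullable strings are acceptable for your data) is to coerce every array element to a string before loading, in the spirit of all_text_mode:

```python
import json

def stringify_lists(value):
    """Return a copy of a parsed JSON value in which every scalar
    array element is coerced to a string (nulls kept as null), so
    that each list has a single, consistent element type."""
    if isinstance(value, dict):
        return {k: stringify_lists(v) for k, v in value.items()}
    if isinstance(value, list):
        return [None if v is None else
                stringify_lists(v) if isinstance(v, (dict, list)) else
                str(v)
                for v in value]
    return value

record = json.loads('{"a": [10, null, "foo"]}')
print(json.dumps(stringify_lists(record)))
# -> {"a": ["10", null, "foo"]}
```

You would run every line of the input file through this before pointing Drill at it; the rewritten file then has uniform varchar lists that the Parquet writer can handle.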
Your JSON did not come across (attachments are stripped). Perhaps paste the
first couple of records into an e-mail.
Thanks,
- Paul
On Thursday, August 30, 2018, 8:00:55 PM PDT, Sri Krishna
<[email protected]> wrote:
Hi,
I am trying to convert a JSON file into Parquet files using Drill. The query is:
ALTER SESSION SET `store.json.all_text_mode` = true;
USE dfs.tmpp;
ALTER SESSION SET `store.format` = 'parquet';
CREATE TABLE `testParquet` as select * from `test.json`;
The first statement is used so that we don't have to worry about numbers,
integers, etc.; for now, reading everything as strings works. When I run this
query I get this (unclear) error message:
Error: INTERNAL_ERROR ERROR: You tried to start when you are using a
ValueWriter of type SingleListWriter.
Attached is the JSON file; the trouble is with the first line. That line by
itself can be converted to a Parquet file (the above CTAS works), and so can
the rest of the lines without it, but both together give this error. I also ran
a query that just reads the file and got the same error with this line plus the
others, but not with either alone (just like the CTAS). I can get around
reading the file by enabling the union type:
ALTER SESSION SET `exec.enable_union_type` = true;
But then I get an error that List type isn’t supported (I assume they are
talking about mixed types in an array).
Here is the stack trace (from enabling verbose) in case of write failure:
Error: INTERNAL_ERROR ERROR: You tried to start when you are using a ValueWriter
of type SingleListWriter.
Fragment 0:0
[Error Id: 1ae5c2ce-e1ef-40f9-afce-d1e00ac9fa15 on IMC28859.imc2.com:31010]
(java.lang.IllegalStateException) You tried to start when you are using a ValueWriter of type SingleListWriter.
org.apache.drill.exec.vector.complex.impl.AbstractFieldWriter.start():78
org.apache.drill.exec.vector.complex.impl.SingleListWriter.start():71
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():430
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():462
org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():311
org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():254
org.apache.drill.exec.vector.complex.fn.JsonReader.write():209
org.apache.drill.exec.store.easy.json.JSONRecordReader.next():214
org.apache.drill.exec.physical.impl.ScanBatch.next():177
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():90
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.physical.impl.BaseRootExec.next():103
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
Here is the trace with mixed mode enabled:
0: jdbc:drill:zk=local> ALTER SESSION SET `exec.enable_union_type` = true;
+-------+----------------------------------+
| ok | summary |
+-------+----------------------------------+
| true | exec.enable_union_type updated. |
+-------+----------------------------------+
1 row selected (0.173 seconds)
0: jdbc:drill:zk=local> CREATE TABLE `test.parquet` as select * from `test.json`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
Fragment 0:0
[Error Id: 04db6e6a-aa66-4c4f-9573-23fc1215c638 on IMC28859.imc2.com:31010]
(java.lang.UnsupportedOperationException) Unsupported type LIST
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():291
org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():162
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.physical.impl.BaseRootExec.next():103
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
All lines are well-formed JSON, and I could infer a schema from them
(jsonSchema.net). The JSON is generated from XML and contains some irrelevant
items, such as namespace declarations, but as far as I can tell those should
not interfere.
Any help would be appreciated.
Sri