Hi All,

As discussed on last Parquet sync, PARQUET-215 and PARQUET-287 introduced a
backward incompatible change in the convert method, which was resolved
in PARQUET-346 in a way that a new method was added instead of restoring
the old method logic, and adding a new method with the new logic (actually
PARQUET-405 was closed as duplicate of this, but in fact the convert still
throws error, users should use the new method to avoid the exception, which
is a code change on the client side).

We promised an example for this issue, here is an sample Parquet code,
which breaks with the newer version, but worked fine before PARQUET-215:

import scala.collection.JavaConverters._

import parquet.thrift.ThriftSchemaConverter
import parquet.thrift.struct.ThriftType.{StructType, StringType}
import parquet.thrift.struct.ThriftField
import parquet.thrift.struct.ThriftField.Requirement._

object Main extends App {

  val converter = new ThriftSchemaConverter()

  val structType = new StructType(
    List(
      new ThriftField("a", 1, REQUIRED, new StringType()),
      new ThriftField("b", 2, REQUIRED, new StringType())
    ).asJava
  )

  val converted = converter.convert(structType)
}

Here is the exception:

org.apache.parquet.ShouldNeverHappenException: Encountered UNKNOWN
StructOrUnionType

at
org.apache.parquet.thrift.ThriftSchemaConvertVisitor.isUnion(ThriftSchemaConvertVisitor.java:344)
at
org.apache.parquet.thrift.ThriftSchemaConvertVisitor.visit(ThriftSchemaConvertVisitor.java:220)
at
org.apache.parquet.thrift.ThriftSchemaConvertVisitor.visit(ThriftSchemaConvertVisitor.java:76)
at
org.apache.parquet.thrift.struct.ThriftType$StructType.accept(ThriftType.java:272)
at
org.apache.parquet.thrift.ThriftSchemaConvertVisitor.convert(ThriftSchemaConvertVisitor.java:95)
at
org.apache.parquet.thrift.ThriftSchemaConverter.convert(ThriftSchemaConverter.java:72)
at
org.apache.parquet.thrift.TestThriftMetaData.test(TestThriftMetaData.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

I'd propose the following solution to resolve the this issue:
Drop StructOrUnionType UNKNOW category, and in StructType constructor
<https://github.com/apache/parquet-mr/blob/master/parquet-thrift/src/main/java/org/apache/parquet/thrift/struct/ThriftType.java#L233>
use
STRUCT as default if the second parameter is null, something like this:

this.structOrUnionType = structOrUnionType == null ?
*StructOrUnionType.STRUCT* : structOrUnionType;

I'm not sure what is the benefit of having an UNKNOW field and throwing an
exception if it is unknown, in my opinion we can instead use STRUCT as a
default, since as far as I understood, before adding UNION support, a
structType simply represented a STRUCT (someone who is more familiar with
this part of the code should confirm this). What do you think?

Thanks,
Nandor

Reply via email to