Filed a JIRA ticket for this issue: https://issues.apache.org/jira/browse/PARQUET-357

Cheng

On 7/15/15 1:10 AM, Ryan Blue wrote:
This sounds like something we should fix. It may work for Thrift and Scrooge because they have an external schema, but you're right that this will cause buggy behavior for other object models.

Alex and Tianshuo, any ideas about how to address this? It looks like we need to update the ThriftSchemaConverter when converting the Thrift object to Parquet's representation of its schema. That should detect that the field is a binary (through reflection?) even though the underlying Thrift metadata doesn't encode it.
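
A toy sketch of that idea in Python, for illustration only. It is not the parquet-mr API: `ThriftField`, `is_binary`, `honor_binary_hint`, and `to_parquet_field` are hypothetical names. It assumes, as described further down the thread, that Thrift metadata reports both `string` and `binary` fields as STRING, so the converter needs some extra hint before it can decide whether to add the UTF8 annotation.

```python
from typing import NamedTuple


class ThriftField(NamedTuple):
    name: str
    thrift_type: str         # type reported by Thrift metadata; "STRING" for string and binary alike
    is_binary: bool = False  # hypothetical hint, e.g. recovered by inspecting the generated class


def to_parquet_field(field: ThriftField, honor_binary_hint: bool) -> str:
    """Render one optional Parquet field declaration for a Thrift STRING field."""
    if field.thrift_type != "STRING":
        raise ValueError(f"this sketch only handles STRING, got {field.thrift_type}")
    if honor_binary_hint and field.is_binary:
        # Raw bytes: plain binary, no annotation.
        return f"optional binary {field.name};"
    # Text (or no hint available): binary annotated as UTF8.
    return f"optional binary {field.name} (UTF8);"


fields = [
    ThriftField("binaryColumn", "STRING", is_binary=True),
    ThriftField("stringColumn", "STRING"),
]
for f in fields:
    print(to_parquet_field(f, honor_binary_hint=True))
```

With `honor_binary_hint=False` both fields come out annotated as UTF8, which matches the schema reported later in this thread; with the hint honored, only `stringColumn` keeps the annotation.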

rb

On 07/07/2015 04:00 PM, Cheng Lian wrote:
You can see that parquet-mr 1.7.0 only handles Thrift STRING, and
always adds the UTF8 annotation:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.7.0/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java#L249-L252


Because there’s just no |ThriftType.BinaryType|.

On 7/7/15 3:56 PM, Cheng Lian wrote:

On 7/7/15 3:48 PM, Ryan Blue wrote:

On 07/07/2015 03:23 PM, Cheng Lian wrote:
On 7/7/15 1:28 PM, Ashish Singh wrote:
I think you mean that we can’t treat Thrift BINARY type as UTF-8
string,
right?
Yeah, it's possible that a Thrift BINARY contains illegal UTF-8 byte
sequence(s), and I suppose this may cause problems. Trying to verify
this.
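
A minimal illustration of the concern, as a Python sketch rather than parquet-mr code (the payload bytes and the helper names `is_valid_utf8` and `lossy_round_trip` are made up for the example): arbitrary bytes need not be valid UTF-8, and a reader that decodes them leniently as text will not get the original bytes back.

```python
# Hypothetical payload of a Thrift `binary` field; the byte 0xFF never occurs in valid UTF-8.
payload = b"\xff\xfe\x00\x01"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def lossy_round_trip(data: bytes) -> bytes:
    """Decode as UTF-8, replacing invalid sequences with U+FFFD, then re-encode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")
```

Here `is_valid_utf8(payload)` is false, and `lossy_round_trip(payload)` differs from `payload`, so a consumer that trusts a UTF8 annotation on such a column could either fail or silently alter the data.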

Isn't this the right behavior? As long as it isn't annotated as a
UTF8, then storing it as binary should be fine.

Ah, it’s actually annotated as UTF8…

Internally Thrift just maps BINARY to STRING and doesn’t have any
annotation indicating that this field is a BINARY, so Parquet just
assumes it’s a normal UTF8 string and writes “BINARY (UTF8)”.

Here are my test Thrift schema and the Parquet schema extracted
from the written Parquet file by |parquet-schema|:

    struct ParquetThriftCompat {
      1: binary binaryColumn;
      2: string stringColumn;
    }

    message ParquetSchema {
      optional binary binaryColumn (UTF8);
      optional binary stringColumn (UTF8);
    }

rb




