Michael McCarthy created PARQUET-1235:
-----------------------------------------

             Summary: Parquet-tools cat mangles strings created by other clients
                 Key: PARQUET-1235
                 URL: https://issues.apache.org/jira/browse/PARQUET-1235
             Project: Parquet
          Issue Type: Bug
         Environment: {noformat}
uname -a
Linux myhost 4.4.0-63-generic #84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
{noformat}
            Reporter: Michael McCarthy


I have some Parquet files that are created by a Java MR process (which I do not 
own). I am able to read the fields successfully in Pig and Spark, but for 
some reason the String fields are mangled when I view the files with 
parquet-tools (cat).

Here are the details on the file metadata using today's build of parquet-tools:
{noformat}
hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar meta <hdfs>/parquet-r-00000
{noformat}
Output:
{noformat}
file:          hdfs://<path>/parquet-r-00000
creator:       parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema:   MY_DATA
--------------------------------------------------------------------------------
myfield:       OPTIONAL BINARY R:0 D:1

row group 1:   RC:37343 TS:32397576 OFFSET:4
--------------------------------------------------------------------------------
myfield:       BINARY SNAPPY DO:0 FPO:4 SZ:273374/556406/2.04 VC:37343 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[no stats for this column]
{noformat}
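
One possible explanation, stated as an assumption and not confirmed by the output above: the schema lists myfield as plain BINARY with no string (UTF8) annotation, so a tool may render the raw bytes in an encoded form such as Base64 instead of as text. The sketch below uses a hypothetical value (not taken from the actual file) to show how the same bytes look under each rendering, and how a Base64 rendering could be reversed if that turns out to be what is happening.

```python
import base64

# Hypothetical sample value; NOT taken from the real parquet file.
raw = "hello parquet".encode("utf-8")

# Rendering 1: the bytes interpreted as UTF-8 text.
as_text = raw.decode("utf-8")

# Rendering 2: the same bytes shown as Base64, which is what a reader
# might see if a tool treats the column as opaque binary (assumption).
as_base64 = base64.b64encode(raw).decode("ascii")


def recover(rendered: str) -> str:
    """Decode a Base64 rendering back to the original UTF-8 string."""
    return base64.b64decode(rendered).decode("utf-8")


if __name__ == "__main__":
    print(as_text)
    print(as_base64)
    print(recover(as_base64))
```

If the mangled values decode cleanly with recover(), the data itself is intact and only the display differs. If they do not, this guess is wrong and the cause lies elsewhere.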
Has anyone seen this before?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)