[jira] [Updated] (IMPALA-9201) Impala can't read parquet file compressed by zstd bash command

Xiaomeng Zhang (Jira) Tue, 26 Nov 2019 11:41:02 -0800


     [ https://issues.apache.org/jira/browse/IMPALA-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Xiaomeng Zhang updated IMPALA-9201:
-----------------------------------
    Description: 
To reproduce:
 # Get a Parquet file written by Impala.
 # Use "hadoop fs -get" to download it locally.
 # Run "zstd -i parquetfile -o zstdfile" to produce a zstd-compressed file, parquet.zst.
 # Use "hadoop fs -put" to upload the zstd file to the directory "/test-warehouse/par_zstd".
 # In Impala, create a table whose location is "/test-warehouse/par_zstd".
 # Run select * from that table; the query fails with this error:
{code:java}
[localhost:21000] default> select * from par_zstd;
Query: select * from par_zstd
Query submitted at: 2019-11-25 14:59:07 (Coordinator: http://xiaomeng-OptiPlex-9020:25000)
Query progress can be monitored at: http://xiaomeng-OptiPlex-9020:25000/query_plan?query_id=b0411d5136965e30:549208ad00000000
ERROR: File 'hdfs://localhost:20500/test-warehouse/par_zstd/parquet.zst' has an invalid Parquet version number: ����
. Please check that it is a valid Parquet file. This error can also occur due to stale metadata. If you believe this is a valid Parquet file, try running "refresh default.par_zstd".
{code}
In Hive, running select * from the same table fails with:
{code:java}
Error: java.io.IOException: java.lang.RuntimeException: hdfs://localhost:20500/test-warehouse/par_zstd/parquet.zstd is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [-2, -72, -113, -90] (state=,code=0)
{code}
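Both errors are consistent with the readers looking for the Parquet magic bytes and not finding them: [80, 65, 82, 49] in the Hive message is the ASCII string "PAR1", and the bytes found at the tail are something else. A minimal Python sketch of that kind of check follows. The function names signed_to_bytes and classify are hypothetical and are not part of Impala or Hive; the zstd frame magic (0xFD2FB528, stored little-endian) is taken from the Zstandard format, and the sketch inspects magic bytes only, without parsing any Parquet metadata.

```python
import os

PARQUET_MAGIC = b"PAR1"  # bytes 80, 65, 82, 49, as printed in the Hive error
ZSTD_FRAME_MAGIC = bytes([0x28, 0xB5, 0x2F, 0xFD])  # 0xFD2FB528, little-endian


def signed_to_bytes(values):
    """Convert Java-style signed bytes (-128..127) to a Python bytes object."""
    return bytes(v & 0xFF for v in values)


def classify(path):
    """Return 'parquet', 'zstd', or 'unknown' by looking at magic bytes only.

    Hypothetical helper for illustration; it does not validate the file body.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 8:
            # Read the last four bytes, where a Parquet file repeats its magic.
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head == ZSTD_FRAME_MAGIC:
        return "zstd"
    return "unknown"


if __name__ == "__main__":
    # The "expected" and "found" byte lists from the Hive error message.
    print(signed_to_bytes([80, 65, 82, 49]))
    print(signed_to_bytes([-2, -72, -113, -90]).hex())
```

Running classify on the downloaded file before and after the zstd step would show whether the file placed in "/test-warehouse/par_zstd" still starts and ends with "PAR1" or is a zstd frame wrapped around the whole file.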

  was:
To reproduce:
 # get a parquet file written by impala
 # use "hadoop fs -get" to download locally
 # use command "zstd -i parquetfile -o zstdfile" to get a zstd compressed file 
parquet.zst.
 # use "hadoop fs -put" to put zstd file in directory "/test-warehouse/par_zstd"
 # in impala, create table with location on -"/test-warehouse/par_zstd"
 # run select * from that table, get error :
{code:java}
[localhost:21000] default> select * from par_zstd;
Query: select * from par_zstd
Query submitted at: 2019-11-25 14:59:07 (Coordinator: 
http://xiaomeng-OptiPlex-9020:25000)
Query progress can be monitored at: 
http://xiaomeng-OptiPlex-9020:25000/query_plan?query_id=b0411d5136965e30:549208ad00000000
ERROR: File 'hdfs://localhost:20500/test-warehouse/par_zstd/parquet.zst' has an 
invalid Parquet version number: ����
. Please check that it is a valid Parquet file. This error can also occur due 
to stale metadata. If you believe this is a valid Parquet file, try running 
"refresh default.par_zstd".
{code}

 # In hive run select * from table, get error:
{code:java}
Error: java.io.IOException: java.lang.RuntimeException: 
hdfs://localhost:20500/test-warehouse/par_zstd/parquet.zstd is not a Parquet 
file. expected magic number at tail [80, 65, 82, 49] but found [-2, -72, -113, 
-90] (state=,code=0)
{code}


> Impala can't read parquet file compressed by zstd bash command
> --------------------------------------------------------------
>
>                 Key: IMPALA-9201
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9201
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Xiaomeng Zhang
>            Assignee: Abhishek Rawat
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
