[
https://issues.apache.org/jira/browse/IMPALA-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zsombor Fedor updated IMPALA-7612:
----------------------------------
Description:
An empty Parquet file (one with no rows in it) causes a warning in the explain output:
{code:java}
WARNING: The following tables have potentially corrupt table statistics. Drop
and re-compute statistics to resolve this problem. {code}
This warning is shown even after running
{code:java}
compute stats tp;{code}
because:
{code:java}
partitions=1/1 files=1 size=220B{code}
but numRows = 0, i.e. the table has a non-empty file on disk yet a row count of zero.
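The condition described above can be sketched roughly as follows. This only illustrates the symptom as reported (a non-zero file size combined with a zero row count); the class and method names are hypothetical, and the actual check in the Impala frontend may differ.

```java
public class CorruptStatsCheck {
    /**
     * Hypothetical helper, not Impala's actual code: assumes the planner
     * treats stats as "potentially corrupt" when the stored row count is 0
     * although the files on disk are not empty.
     */
    static boolean looksCorrupt(long numRows, long totalBytes) {
        return numRows == 0 && totalBytes > 0;
    }

    public static void main(String[] args) {
        // Values from this report: one 220-byte file, numRows = 0 after compute stats.
        System.out.println(looksCorrupt(0L, 220L)); // true
        // A table with no data on disk would not be flagged under this assumption.
        System.out.println(looksCorrupt(0L, 0L));   // false
    }
}
```

Under this assumption, a Parquet file that contains only a header and footer (220B here) but no rows would always trigger the warning, no matter how often the stats are recomputed.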
A simple reproduction:
{code:java}
create table tp (a int) stored as parquet;{code}
Create an empty.csv file.
Create a Parquet file from the CSV with a simple MR job:
[https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]
using the following schema:
{code:java}
"{\n" +
" \"type\": \"record\",\n" +
" \"name\": \"tp\",\n" +
" \"doc\": \"Avro schema for table tp\",\n" +
" \"fields\":\n" +
" [\n" +
" {\"name\": \"a\", \"type\": \"int\"}\n"+
" ]\n"+
"}\n");{code}
Put the output Parquet file (attached) onto HDFS, then run:
{code:java}
compute stats tp;
explain select * from tp;
{code}
> Parquet file with no rows in it causing WARNING in explain
> ----------------------------------------------------------
>
> Key: IMPALA-7612
> URL: https://issues.apache.org/jira/browse/IMPALA-7612
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.12.0
> Reporter: Zsombor Fedor
> Priority: Minor
> Attachments: part-m-00000.parquet