Hi Yu Feng,

It looks like we already changed Impala to accept valid files with no row
groups: https://issues.apache.org/jira/browse/IMPALA-3943
That error should now only be hit if the file metadata reports a nonzero row
count but the file contains no row groups:

  // IMPALA-3943: Do not throw an error for empty files for backwards
  // compatibility.
  if (file_metadata_.num_rows == 0) return Status::OK();

  // Parse out the created by application version string
  if (file_metadata_.__isset.created_by) {
    file_version_ = ParquetFileVersion(file_metadata_.created_by);
  }
  if (file_metadata_.row_groups.empty()) {
    return Status(
        Substitute("Invalid file. This file: $0 has no row groups",
            filename()));
  }

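To make the order of those two checks concrete, here is a minimal standalone sketch. It does not use Impala's real types: FileMetaData, RowGroup, FooterCheck and CheckFooter are hypothetical stand-ins I made up for illustration, and only the order of the two conditions is taken from the snippet above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Thrift-generated Parquet footer metadata.
// These are NOT Impala's actual types.
struct RowGroup {};

struct FileMetaData {
  int64_t num_rows = 0;
  std::vector<RowGroup> row_groups;
};

// Hypothetical result type: what the scanner should do with the file.
enum class FooterCheck {
  kSkipEmptyFile,       // footer reports zero rows: nothing to scan, no error
  kInvalidNoRowGroups,  // footer claims rows exist but lists no row groups
  kScan                 // footer has rows and at least one row group
};

// Mirrors the order of the checks in the snippet above: the zero-row test
// runs first, so a footer-only file never reaches the row-group test.
FooterCheck CheckFooter(const FileMetaData& md) {
  if (md.num_rows == 0) return FooterCheck::kSkipEmptyFile;
  if (md.row_groups.empty()) return FooterCheck::kInvalidNoRowGroups;
  return FooterCheck::kScan;
}
```

With this ordering, a file whose footer reports num_rows == 0 maps to kSkipEmptyFile, and only a footer with num_rows > 0 and an empty row_groups list maps to kInvalidNoRowGroups. If your files still hit the error, it may be worth checking what num_rows their footers actually report.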
On Sun, Jul 16, 2017 at 11:36 PM, yu feng <[email protected]> wrote:
> Hi all,
>
> I always get a query error when I query a Parquet table that contains an
> empty Parquet file, that is, a file that has only footer information and
> no row groups.
>
> I checked the code and found this:
>
>   if (file_metadata_.row_groups.empty()) {
>     return Status(
>         Substitute("Invalid file. This file: $0 has no row groups",
>             filename()));
>   }
>
> I want to modify this logic: if the scanner finds a file with no row
> groups, I want to skip the scan range and not return any row batches from
> the Parquet scanner. Is this the right approach, and do you have any other
> suggestions for this situation?
>
> Thanks a lot
>