[
https://issues.apache.org/jira/browse/IMPALA-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723399#comment-17723399
]
Zoltán Borók-Nagy commented on IMPALA-10186:
--------------------------------------------
Thanks for investigating this, [~MikaelSmith].
Based on your description, the following statements trigger the bug:
{noformat}
set PARQUET_PAGE_ROW_COUNT_LIMIT=20000;
create table empty_parquet_page
stored as parquet
as select distinct l_orderkey as n from tpch_parquet.lineitem
order by l_orderkey limit 41000;
{noformat}
Now, after writing out 40000 values, we hit both the dictionary limit and the
PARQUET_PAGE_ROW_COUNT_LIMIT at the same time.
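To make the coincidence concrete, here is a minimal, self-contained sketch. It is not the Impala writer: SketchPage, SimulatePages and CountEmptyPages are hypothetical names, and the model assumes that each limit closes the current page on its own without checking whether that page holds any values. Under that assumption, a row count limit of 20000 and a dictionary limit of 40000 distinct values both fire on the 40000th value, and the second close emits a page with zero values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a written data page; not Impala's DataPage.
struct SketchPage {
  int64_t first_row_index;
  int64_t num_values;
};

// Simulates a column writer with two independent page-closing triggers.
// Assumption: every input value is distinct, so each value adds one
// dictionary entry, and neither trigger checks for an empty current page.
std::vector<SketchPage> SimulatePages(
    int64_t total_rows, int64_t page_row_limit, int64_t dict_limit) {
  assert(page_row_limit > 0 && dict_limit > 0);
  std::vector<SketchPage> pages;
  SketchPage current{0, 0};
  bool dict_encoding = true;
  int64_t dict_entries = 0;

  auto close_page = [&](int64_t next_first_row) {
    pages.push_back(current);
    current = SketchPage{next_first_row, 0};
  };

  for (int64_t row = 0; row < total_rows; ++row) {
    ++current.num_values;
    const int64_t next_row = row + 1;
    // Trigger 1: the page reached the row count limit.
    if (current.num_values == page_row_limit) close_page(next_row);
    // Trigger 2: the dictionary is full, so the page is closed in order to
    // switch to a non-dictionary encoding.
    if (dict_encoding && ++dict_entries == dict_limit) {
      dict_encoding = false;
      close_page(next_row);
    }
  }
  // Flush the trailing page only if it holds values.
  if (current.num_values > 0) pages.push_back(current);
  return pages;
}

// Counts pages that were closed without any values.
int64_t CountEmptyPages(const std::vector<SketchPage>& pages) {
  int64_t empty = 0;
  for (const SketchPage& page : pages) {
    if (page.num_values == 0) ++empty;
  }
  return empty;
}
```

With the values from the repro, SimulatePages(41000, 20000, 40000) yields four pages, the third of which is empty. Moving the assumed dictionary limit off a multiple of the row count limit (for example 30000) yields no empty page in this model.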
> Write invalid parquet PageLocations which table sort by some columns
> --------------------------------------------------------------------
>
> Key: IMPALA-10186
> URL: https://issues.apache.org/jira/browse/IMPALA-10186
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: guojingfeng
> Assignee: Michael Smith
> Priority: Major
> Labels: parquet
>
> The current Parquet writer writes -1 for PageLocation.offset and
> PageLocation.first_row_index when it encounters an empty page.
> hdfs-parquet-file-writer.cc Line: 808 ~ 819
> {code:cpp}
> // Write data pages
> for (const DataPage& page : pages_) {
>   if (page.header.data_page_header.num_values == 0) {
>     // Skip empty pages
>     location.offset = -1;
>     location.compressed_page_size = 0;
>     location.first_row_index = -1;
>     AddLocationToOffsetIndex(location);
>     continue;
>   }
> {code}
> However, these -1 values may cause the ComputeCandidatePages function to
> run into an unexpected state.
> {code:cpp}
> bool ComputeCandidatePages(
>     const vector<parquet::PageLocation>& page_locations,
>     const vector<RowRange>& candidate_ranges,
>     const int64_t num_rows, vector<int>* candidate_pages) {
>   if (!ValidatePageLocations(page_locations, num_rows)) return false;
> {code}
> This in turn causes IMPALA-9952.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)