Joe McDonnell created IMPALA-14181:
--------------------------------------

             Summary: DCHECK in Sorter::Run::ConvertValueOffsetsToPtrs() with low memory
                 Key: IMPALA-14181
                 URL: https://issues.apache.org/jira/browse/IMPALA-14181
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


When testing with tuple caching, queries start to use more memory. I hit this 
assert while running test_sort.py's TestArraySort:
{noformat}
F20250623 14:59:49.492782 1002052 sorter.cc:829] d84fbaa341de8fbb:027dfece00000000] Check failed: page_offset == 0 (2311 vs. 0)
{noformat}
From here:
{noformat}
    if (page_index > var_len_pages_index_) {
      // We've reached the page boundary for the current var-len page.
      // This tuple will be returned in the next call to GetNext().
      DCHECK_GE(page_index, 0);
      DCHECK_LE(page_index, var_len_pages_.size());
      DCHECK_EQ(page_index, var_len_pages_index_ + 1);
      DCHECK_EQ(page_offset, 0); // <-------------- HERE
                                 // The data is the first thing in the next page.
                                 // This must be the first slot with var len data for the
                                 // tuple. Var len data for tuple shouldn't be split
                                 // across blocks.
      DCHECK(AllPrevSlotsAreNullsOrSmall<ValueType>(tuple, slots, idx));
      return false;
    }{noformat}
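To make the failing condition concrete, here is a minimal standalone sketch of the invariant that the excerpt asserts. It is not Impala's code: VarLenRef, BoundaryResult and ClassifyRef are hypothetical names introduced for illustration, and the sketch assumes a var-len slot has already been decoded into a page index and a byte offset within that page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a var-len slot reference that has already been
// decoded into a page index and a byte offset within that page.
struct VarLenRef {
  int64_t page_index;
  int64_t page_offset;
};

enum class BoundaryResult {
  kNotPastBoundary,    // still on the current var-len page
  kNextPageStart,      // first byte of the next page: the expected case
  kInvariantViolated,  // past the boundary but not at the start of the next page
};

// Mirrors the shape of the checks in the excerpt. Once a reference points past
// the current page, it must point at offset 0 of the immediately following
// page, because a tuple's var-len data is not supposed to be split across pages.
inline BoundaryResult ClassifyRef(const VarLenRef& ref, int64_t current_page_index) {
  assert(ref.page_index >= 0);
  assert(ref.page_offset >= 0);
  if (ref.page_index <= current_page_index) {
    return BoundaryResult::kNotPastBoundary;
  }
  if (ref.page_index == current_page_index + 1 && ref.page_offset == 0) {
    return BoundaryResult::kNextPageStart;
  }
  return BoundaryResult::kInvariantViolated;
}
```

With made-up page indexes, ClassifyRef({4, 0}, 3) yields kNextPageStart, while ClassifyRef({4, 2311}, 3) yields kInvariantViolated. The second call corresponds to the logged failure (2311 vs. 0); only the offset 2311 comes from the log, the page indexes are illustrative.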
This is easy to reproduce by setting a slightly tighter buffer pool limit. On my 
machine, the following hits the DCHECK:
{noformat}
set max_sort_run_size=2;
set num_nodes=1;
-- ordinarily buffer_pool_limit=44m, but use a slightly tighter value
set buffer_pool_limit=41m;
select string_col, int_array, double_map, string_array, mixed from 
functional_parquet.arrays_big order by string_col;{noformat}


