[ https://issues.apache.org/jira/browse/IMPALA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119266#comment-17119266 ]

ASF subversion and git services commented on IMPALA-9415:
---------------------------------------------------------

Commit 4cc1b4ad04cd5770a41961269d69a65cdfac1dcf in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4cc1b4a ]

IMPALA-9415: Switch result set size calculations from capacity() to size()

The behavior of string's capacity() is implementation specific.
In GCC 7.5.0, the implementation has different behavior compared
to GCC 4.9.2. This is causing a DCHECK to fire in
ClientRequestState::FetchRowsInternal():

// Confirm that this was not an underestimate of the memory required.
DCHECK_GE(before + delta_bytes, after)

What happens on GCC 7.5.0 is that the capacity of the string before the
copy is 29, but after the copy to the result set, the capacity is 30.
The size remains unchanged.
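
As a rough illustration of the observation above (not code from the Impala tree), the sketch below copies a std::string and records size() and capacity() for the original and the copy. The names StringFootprint, Measure and MeasureCopy are hypothetical helpers made up for this example. The standard only guarantees that capacity() is at least size(), so the capacity values may or may not differ on a given toolchain.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the two quantities the commit message compares.
struct StringFootprint {
  std::size_t size;      // number of characters, preserved by a copy
  std::size_t capacity;  // allocated storage, implementation-specific
};

// Records size() and capacity() for a given string.
StringFootprint Measure(const std::string& s) {
  return {s.size(), s.capacity()};
}

// Copy-constructs a new string from `src` and measures the copy.
// The copy's capacity is chosen by the library and need not match the source.
StringFootprint MeasureCopy(const std::string& src) {
  std::string copy(src);
  return Measure(copy);
}
```

Calling Measure and MeasureCopy on a 29-character string and comparing the two capacity fields is one way to check whether a particular standard library shows the behavior described here.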

This switches the code to use size(), which is guaranteed to be
consistent across copies. This loses some accuracy, because there is some
string object overhead and excess capacity that no longer counts. However,
this is not code that requires perfect accuracy.
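
A minimal sketch of why a size()-based estimate keeps the DCHECK invariant satisfied, assuming a result set modeled as a plain vector of strings. EstimateBytes and AppendAndCheck are hypothetical names for this example, not the functions changed by the commit, and the real result set classes track more than string payloads.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical size-based estimate: counts only the characters stored,
// ignoring string object overhead and any excess capacity.
int64_t EstimateBytes(const std::vector<std::string>& rows) {
  int64_t total = 0;
  for (const std::string& row : rows) {
    total += static_cast<int64_t>(row.size());
  }
  return total;
}

// Appends copies of `batch` to `result_set` and evaluates the same
// condition as the DCHECK: before + delta_bytes >= after.
// `batch` must not alias `*result_set`.
bool AppendAndCheck(std::vector<std::string>* result_set,
                    const std::vector<std::string>& batch) {
  const int64_t before = EstimateBytes(*result_set);
  const int64_t delta_bytes = EstimateBytes(batch);
  result_set->insert(result_set->end(), batch.begin(), batch.end());
  const int64_t after = EstimateBytes(*result_set);
  return before + delta_bytes >= after;
}
```

Because size() is preserved by a copy, the estimate after the append equals before + delta_bytes in this model, whatever capacity the copies end up with.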

Testing:
 - Ran core tests with GCC 4.9.2 and GCC 7.5.0

Change-Id: I3f9ab260927e14d8951b7c7661f2b5b18a1da39a
Reviewed-on: http://gerrit.cloudera.org:8080/15992
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> DCHECK in ClientRequestState::FetchRowsInternal when using GCC7 with the new 
> ABI
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-9415
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9415
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> ClientRequestState::FetchRowsInternal is hitting a DCHECK when running 
> hs2/test_fetch_first.py::TestFetchFirst::test_query_stmts_v1 and other hs2 
> tests:
> {noformat}
> F0221 14:27:15.796236  6013 client-request-state.cc:1090] Check failed: 
> before + delta_bytes >= after (14270 vs. 14340) Combined result sets consume 
> more memory than both individually (before: 0, delta_bytes: 14270, after: 
> 14340)
> {noformat}
> This is firing because the size of a row increases when it is copied into the 
> result set. The size increases because the capacity of the string is 
> increasing when it is copied. In the row passed in, one field has a string 
> with size=29, capacity=29. After it is copied into the result set, it has 
> size=29, capacity=30. Since we count the string memory usage based on 
> capacity, the memory usage has gone up.
> In general, the behavior of capacity() is implementation-specific (the only 
> guarantee is that capacity() is at least size()), so we can't rely on a 
> copy preserving it.


