[
https://issues.apache.org/jira/browse/IMPALA-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-3208.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.10.0
IMPALA-3208: max_row_size option
Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handle larger rows without increasing the size of all their
buffers.
The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much even with smaller buffers
like 64KB, but should be large enough for almost all reasonable
workloads.
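To illustrate the trade-off, the per-operator minimum reservation formula from the issue description below, ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size, can be checked with quick arithmetic. The buffer count used here is an assumption for the example, not a value taken from Impala:

```python
# Illustrative arithmetic only: shows how the per-operator minimum
# reservation from the issue description,
#   (min_buffers - 2) * default_buffer_size + 2 * max_row_size,
# grows with max_row_size. min_buffers=8 is an assumed example value.

KB = 1024

def min_reservation(min_buffers, default_buffer_size, max_row_size):
    """Minimum reservation for one operator, per the formula above."""
    return (min_buffers - 2) * default_buffer_size + 2 * max_row_size

# With small 64KB buffers and the 512KB default max_row_size, the two
# max-sized buffers dominate the total but keep it bounded:
small = min_reservation(min_buffers=8,
                        default_buffer_size=64 * KB,
                        max_row_size=512 * KB)
print(small // KB, "KB")  # (8 - 2) * 64KB + 2 * 512KB = 1408KB
```

Raising default_spillable_buffer_size instead would multiply the first term across all buffers, which is why the dedicated max_row_size option is cheaper.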
This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.
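A minimal sketch of the reservation protocol described above, with invented names (this is not Impala's actual C++ BufferedTupleStream API): the stream holds one default-sized page's reservation per iterator, temporarily claims up to a max-sized page for a large row, and releases the extra as soon as the append completes:

```python
# Hypothetical sketch of the variable-page-size reservation protocol;
# class and method names are invented for illustration and do not
# match Impala's C++ implementation.

KB = 1024
DEFAULT_PAGE = 64 * KB
MAX_ROW_SIZE = 512 * KB

class ReservationTracker:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def claim(self, n):
        if self.used + n > self.limit:
            raise MemoryError("reservation exceeded")
        self.used += n

    def release(self, n):
        self.used -= n

class Stream:
    """One write iterator: holds a default page's reservation at all times."""
    def __init__(self, tracker):
        self.tracker = tracker
        self.tracker.claim(DEFAULT_PAGE)  # steady-state write reservation

    def add_row(self, row_size):
        if row_size <= DEFAULT_PAGE:
            return  # fits in the current default-sized page
        # Temporarily claim the difference for a max-sized page, append,
        # then release it immediately, as the commit message describes.
        extra = MAX_ROW_SIZE - DEFAULT_PAGE
        self.tracker.claim(extra)
        # ... write the row into its own large page here ...
        self.tracker.release(extra)

tracker = ReservationTracker(limit=MAX_ROW_SIZE)
s = Stream(tracker)
s.add_row(300 * KB)  # large row: usage spikes to 512KB, then drops back
assert tracker.used == DEFAULT_PAGE
```

The read side is symmetric: the max-sized read reservation is held only until the iterator advances past the large page.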
The sorter and analytic simply use max-sized buffers for all pages
in the stream.
Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.
Added "set" test to check validation of query option.
Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.
Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Reviewed-on: http://gerrit.cloudera.org:8080/7629
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
> Backend support for large rows
> ------------------------------
>
> Key: IMPALA-3208
> URL: https://issues.apache.org/jira/browse/IMPALA-3208
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 2.6.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: resource-management
> Fix For: Impala 2.10.0
>
>
> We need to ensure that all exec nodes can support rows larger than the
> default page size. The default page size will be a query option, so users can
> always increase that, however minimum memory requirements will scale
> proportionally, which makes this less appealing.
> We should also add a max_row_size query option that controls the maximum size
> of rows supported by operators (at least those that use the reservation
> mechanism). We should be able to support large rows with only a single read
> and write buffer of the max row size. I.e. the minimum requirement for an
> operator would be ((min_buffers - 2) * default_buffer_size) + 2 *
> max_row_size. This requires the following changes to the operators:
> BufferedTupleStream changes:
> * Rows <= the default page size are written as before
> * Rows that don't fit in the default page size get written into a larger
> page, with one row per page.
> * Upon writing a large row to an unpinned stream, the page is immediately
> unpinned and we immediately advance to the next write page, so that the large
> page is not kept pinned outside of the AddRow() call.
> * We should only be reading from one unpinned stream at a time, so only one
> large page is required there.
> Sorter changes:
> * Use buffers as large as the largest supported row.
> Testing:
> Needs end-to-end tests exercising all operators with large rows
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)