[
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363001#comment-15363001
]
Sahil Takiar commented on HIVE-7224:
------------------------------------
[~vgumashta] it seems the behavior you are seeing is by design. Looking at
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions
the following explanation of the {{--incremental}} property suggests that this
is expected:
{quote}
Defaults to false. When set to false, the entire result set is fetched and
buffered before being displayed, yielding optimal display column sizing. When
set to true, result rows are displayed immediately as they are fetched,
yielding lower latency and memory usage at the price of extra display column
padding. Setting --incremental=true is recommended if you encounter an
OutOfMemory on the client side (due to the fetched result set size being large).
{quote}
So there is a tradeoff when using {{--incremental}}: the column padding won't
be optimal, but memory usage will be better. This makes sense, because the
{{IncrementalRows}} class that controls this logic doesn't buffer any rows. It
only looks at one row at a time, so it cannot predict what the optimal column
width should be.
I think a better approach for the {{IncrementalRows}} class would be to buffer
1000 rows at a time (as a default; the value could be made configurable), so
that it can set the column width optimally for each set of 1000 rows. This
shouldn't introduce memory issues unless each row is huge, in which case the
user can decrease the buffer size to, say, 100 or 10.
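To make the idea concrete, here is a rough sketch of batch-wise column sizing.
This is not the actual {{IncrementalRows}} code: the class name
{{BufferedRowsSketch}}, its methods, and the {{bufferSize}} parameter are all
hypothetical, and cells are assumed to be non-null strings. It only illustrates
that widths can be computed per batch while holding at most one batch in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch only; not the real Beeline IncrementalRows class.
class BufferedRowsSketch {

    // Widest cell per column across one batch of rows.
    static int[] columnWidths(List<String[]> batch, int columnCount) {
        int[] widths = new int[columnCount];
        for (String[] row : batch) {
            for (int i = 0; i < columnCount; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        return widths;
    }

    // Right-pads a cell with spaces up to the given width.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Reads up to bufferSize rows, sizes the columns for that batch only,
    // emits the formatted lines to the sink, then reuses the buffer.
    static void format(Iterator<String[]> rows, int columnCount,
                       int bufferSize, Consumer<String> sink) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be >= 1");
        }
        List<String[]> batch = new ArrayList<>(bufferSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            // Flush when the buffer is full or the result set is exhausted.
            if (batch.size() == bufferSize || !rows.hasNext()) {
                int[] widths = columnWidths(batch, columnCount);
                for (String[] row : batch) {
                    StringBuilder line = new StringBuilder("|");
                    for (int i = 0; i < columnCount; i++) {
                        line.append(' ').append(pad(row[i], widths[i])).append(" |");
                    }
                    sink.accept(line.toString());
                }
                batch.clear();
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "a"},
            new String[] {"22", "bbb"},
            new String[] {"333", "c"});
        // A tiny buffer of 2 rows, just to show widths changing per batch.
        format(rows.iterator(), 2, 2, System.out::println);
    }
}
```

With a buffer of 2, the first two rows share one set of column widths and the
third row gets its own, so widths can still shift between batches. That is the
price of bounding memory; a larger buffer makes the shifts less frequent.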
What do you think?
> Set incremental printing to true by default in Beeline
> ------------------------------------------------------
>
> Key: HIVE-7224
> URL: https://issues.apache.org/jira/browse/HIVE-7224
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Clients, JDBC
> Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
> Reporter: Vaibhav Gumashta
> Assignee: Sahil Takiar
> Attachments: HIVE-7224.1.patch, HIVE-7224.2.patch, HIVE-7224.2.patch,
> HIVE-7224.3.patch
>
>
> See HIVE-7221.
> By default beeline tries to buffer the entire output relation before printing
> it on stdout. This can cause OOM when the output relation is large. However,
> beeline has the option of incremental prints. We should keep that as the
> default.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)