[
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363688#comment-15363688
]
Thejas M Nair commented on HIVE-7224:
-------------------------------------
bq. To clarify, what would happen if Beeline uses the first 1000 rows to
calculate the width, but then row 1001th is longer than that width.
If the 1001st row has a column wider than the precomputed column width, that
particular row would be printed with a wider column to accommodate it. This
would mean some rows have the separator "|" out of alignment with the previous
row. However, even if we recompute every 1000 rows, we could still have
misalignment every 1000 rows.
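To illustrate the misalignment described above, here is a minimal sketch. It is not Beeline's actual code: the class and method names (ColumnWidthSketch, sampledWidth, render) are hypothetical, it handles a single column only, and it uses a sample size of 2 in place of 1000 to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the Beeline source.
class ColumnWidthSketch {
    // Column width computed from only the first sampleSize rows.
    static int sampledWidth(List<String> rows, int sampleSize) {
        int width = 0;
        for (int i = 0; i < Math.min(sampleSize, rows.size()); i++) {
            width = Math.max(width, rows.get(i).length());
        }
        return width;
    }

    // Pads each value to the sampled width. A value longer than that width
    // is printed in full, which pushes its trailing "|" to the right.
    static List<String> render(List<String> rows, int sampleSize) {
        int width = sampledWidth(rows, sampleSize);
        List<String> out = new ArrayList<>();
        for (String value : rows) {
            StringBuilder line = new StringBuilder("| ").append(value);
            for (int pad = value.length(); pad < width; pad++) {
                line.append(' ');
            }
            out.add(line.append(" |").toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // The third row is longer than anything in the two-row sample.
        List<String> rows = Arrays.asList("ab", "abcd", "abcdefgh");
        for (String line : render(rows, 2)) {
            System.out.println(line);
        }
    }
}
```

The first two lines share the same width, while the third line is longer, so its closing separator no longer lines up with the rows above it.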
I looked at where the Row width gets used. The width is used only when
--outputformat=table (i.e. the TableOutputFormat class) is used.
If someone is working with very large outputs, the output is likely to be
processed by other applications rather than read by human eyes, and a *sv
(e.g. csv) format is likely to be used. It doesn't make any sense to waste CPU
cycles computing the width in those cases. This is also the case where the
performance impact of this computation would be more visible.
i.e., if we can selectively enable buffering and width calculation only for
TableOutputFormat, I don't think it would matter whether we stick to the column
width based on the first 1000 rows or recompute every 1000 rows.
Looks like the Row subclasses have access to beeline options and would be able
to determine what the output format is.
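A rough sketch of the selective approach, under stated assumptions: the output format is passed in as a plain string standing in for the value of --outputformat, and the names SelectiveWidthSketch, needsColumnWidths and columnWidths are hypothetical. The real Row subclasses and the way they read the Beeline options are not reproduced here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the Beeline source.
class SelectiveWidthSketch {
    // Assumption: only the "table" output format needs column widths.
    static boolean needsColumnWidths(String outputFormat) {
        return "table".equalsIgnoreCase(outputFormat);
    }

    // Computes per-column widths from the buffered rows, but only when the
    // output format needs them. Other formats skip the scan entirely and
    // get an empty array back.
    static int[] columnWidths(String outputFormat, List<String[]> bufferedRows) {
        if (!needsColumnWidths(outputFormat) || bufferedRows.isEmpty()) {
            return new int[0];
        }
        int[] widths = new int[bufferedRows.get(0).length];
        for (String[] row : bufferedRows) {
            for (int col = 0; col < widths.length && col < row.length; col++) {
                widths[col] = Math.max(widths[col], row[col].length());
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "alice"},
            new String[] {"22", "bob"});
        System.out.println(Arrays.toString(columnWidths("table", rows)));
        System.out.println(Arrays.toString(columnWidths("csv", rows)));
    }
}
```

With this shape, a csv run never pays for the width scan, and the choice between a one-time sample and a periodic recompute only affects table output.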
> Set incremental printing to true by default in Beeline
> ------------------------------------------------------
>
> Key: HIVE-7224
> URL: https://issues.apache.org/jira/browse/HIVE-7224
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Clients, JDBC
> Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
> Reporter: Vaibhav Gumashta
> Assignee: Sahil Takiar
> Attachments: HIVE-7224.1.patch, HIVE-7224.2.patch, HIVE-7224.2.patch,
> HIVE-7224.3.patch
>
>
> See HIVE-7221.
> By default beeline tries to buffer the entire output relation before printing
> it on stdout. This can cause OOM when the output relation is large. However,
> beeline has the option of incremental prints. We should keep that as the
> default.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)