[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363688#comment-15363688 ]

Thejas M Nair commented on HIVE-7224:
-------------------------------------

bq. To clarify, what would happen if Beeline uses the first 1000 rows to 
calculate the width, but then the 1001st row is longer than that width?
If the 1001st row has a column value wider than the precomputed column width, 
that row's column would be widened to accommodate it. This would leave the "|" 
separator in some rows out of alignment with the previous rows. However, even 
if we recompute the widths every 1000 rows, we could still have misalignment at 
every 1000-row boundary.
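A minimal sketch of the two options discussed above, under stated assumptions: this is not Beeline's code, the class and method names (WindowedWidths, widths, render, format) are hypothetical, and a window of 2 rows stands in for the 1000-row window. A value wider than the precomputed width is printed in full, so its cell grows and the "|" shifts; recomputing per window keeps rows aligned within a window but not across windows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Beeline's TableOutputFormat implementation.
public class WindowedWidths {

    // Maximum width of each column over rows[from, to).
    static int[] widths(List<String[]> rows, int from, int to) {
        int[] w = new int[rows.get(from).length];
        for (int r = from; r < to; r++) {
            String[] row = rows.get(r);
            for (int c = 0; c < row.length; c++) {
                w[c] = Math.max(w[c], row[c].length());
            }
        }
        return w;
    }

    // Pads each cell to the precomputed width; a longer value is printed
    // in full, so its cell grows and the "|" separator moves right.
    static String render(String[] row, int[] w) {
        StringBuilder sb = new StringBuilder("|");
        for (int c = 0; c < row.length; c++) {
            sb.append(' ').append(row[c]);
            for (int i = row[c].length(); i < w[c]; i++) {
                sb.append(' ');
            }
            sb.append(" |");
        }
        return sb.toString();
    }

    // recompute == false: widths come from the first window only.
    // recompute == true: widths are recomputed for every window of rows.
    static List<String> format(List<String[]> rows, int window, boolean recompute) {
        List<String> out = new ArrayList<>();
        int[] w = null;
        for (int start = 0; start < rows.size(); start += window) {
            int end = Math.min(start + window, rows.size());
            if (w == null || recompute) {
                w = widths(rows, start, end);
            }
            for (int r = start; r < end; r++) {
                out.add(render(rows.get(r), w));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"a", "1"},
            new String[] {"bb", "2"},
            new String[] {"cccc", "3"},  // wider than anything in the first window
            new String[] {"d", "4"});
        format(rows, 2, false).forEach(System.out::println);
        System.out.println();
        format(rows, 2, true).forEach(System.out::println);
    }
}
```

With fixed widths only the "cccc" row is misaligned; with per-window recomputation the second window is internally aligned but wider than the first.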

I looked at where the Row width is used. It is used only when 
--outputformat=table (i.e., the TableOutputFormat class) is in effect.
If someone is working with very large outputs, the output is likely to be 
processed by other applications rather than read by humans, and a *sv format 
(e.g., csv) is likely to be used. It doesn't make sense to waste CPU cycles 
computing the width in those cases. This is also the case where the performance 
impact of this computation would be most visible.

That is, if we can selectively enable buffering and width calculation only for 
TableOutputFormat, I don't think it matters whether we stick to column widths 
based on the first 1000 rows or recompute them every 1000 rows.
It looks like the Row subclasses have access to the Beeline options and would 
be able to determine what the output format is.
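A minimal sketch of the proposal above, assuming a simplified model rather than Beeline's actual classes: the enum OutputFormat, needsBuffering, print and flushTable are hypothetical names. Only the table format buffers a window of rows and computes widths; csv/tsv rows are streamed straight through with no width work, so memory stays flat for large outputs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration, not Beeline's API.
public class FormatAwarePrinter {

    // Illustrative subset of output formats.
    enum OutputFormat { TABLE, CSV, TSV }

    // Only the table format uses column widths, so only it needs a buffer.
    static boolean needsBuffering(OutputFormat format) {
        return format == OutputFormat.TABLE;
    }

    // Prints all rows and returns the peak number of rows held in the buffer
    // (0 when rows are streamed straight through).
    static int print(Iterator<String[]> rows, OutputFormat format, int window,
                     Consumer<String> out) {
        if (!needsBuffering(format)) {
            String sep = format == OutputFormat.CSV ? "," : "\t";
            while (rows.hasNext()) {
                out.accept(String.join(sep, rows.next()));  // no width computation
            }
            return 0;
        }
        List<String[]> buffer = new ArrayList<>();
        int peak = 0;
        while (rows.hasNext()) {
            buffer.add(rows.next());
            peak = Math.max(peak, buffer.size());
            if (buffer.size() == window || !rows.hasNext()) {
                flushTable(buffer, out);  // widths from this window only
                buffer.clear();
            }
        }
        return peak;
    }

    // Pads every cell in the buffered window to that window's column widths.
    // Assumes every row has the same number of columns.
    private static void flushTable(List<String[]> buffer, Consumer<String> out) {
        int[] w = new int[buffer.get(0).length];
        for (String[] row : buffer) {
            for (int c = 0; c < row.length; c++) {
                w[c] = Math.max(w[c], row[c].length());
            }
        }
        for (String[] row : buffer) {
            StringBuilder sb = new StringBuilder("|");
            for (int c = 0; c < row.length; c++) {
                sb.append(' ').append(row[c])
                  .append(" ".repeat(w[c] - row[c].length()))
                  .append(" |");
            }
            out.accept(sb.toString());
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"a", "1"}, new String[] {"bb", "2"}, new String[] {"ccc", "3"});
        int csvPeak = print(rows.iterator(), OutputFormat.CSV, 1000, System.out::println);
        int tablePeak = print(rows.iterator(), OutputFormat.TABLE, 2, System.out::println);
        System.out.println("buffered rows: csv=" + csvPeak + ", table=" + tablePeak);
    }
}
```

Under this design, whether the table path fixes widths from the first window or recomputes them per window only affects table output, which is the case where a human is reading it.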


> Set incremental printing to true by default in Beeline
> ------------------------------------------------------
>
>                 Key: HIVE-7224
>                 URL: https://issues.apache.org/jira/browse/HIVE-7224
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline, Clients, JDBC
>    Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
>            Reporter: Vaibhav Gumashta
>            Assignee: Sahil Takiar
>         Attachments: HIVE-7224.1.patch, HIVE-7224.2.patch, HIVE-7224.2.patch, 
> HIVE-7224.3.patch
>
>
> See HIVE-7221.
> By default, Beeline tries to buffer the entire output relation before printing 
> it to stdout. This can cause an OOM when the output relation is large. However, 
> Beeline has an option for incremental printing. We should make that the 
> default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
