[jira] [Updated] (HBASE-24901) Create versatile hbase-shell table formatter

Elliot Miller (Jira) Wed, 19 Aug 2020 13:21:19 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-24901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Elliot Miller updated HBASE-24901:
----------------------------------
    Description: 
As a user, I would like a simple interface for shell output that can be 
expressed as a table (i.e., output with a fixed number of columns and 
potentially many rows). To be clear, this new formatter is not specifically 
for HBase "tables"; "table" is used in the broader sense here.
h2. Goals
 - Do not require more than one output cell loaded in memory at a time
 - Support many implementations, such as aligned human-friendly tables, 
unaligned delimited output, and JSON

h2. Non-goals
 - Avoiding loading all the headers into memory at once.
 ** This may seem like a goal with merit, but we are unlikely to find a use 
case for this formatter with many columns. For example: since HBase tables 
aren't relational, our scan output will not have an output column for every 
HBase column. Instead, each output row will correspond to an HBase cell.
 ** It's also really useful to have the headers ahead of time, because it 
allows us to do things like JSON object output (where each row is represented 
with key-value pairs).
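
To illustrate why having the headers up front helps, here is a minimal sketch 
of a JSON formatter that emits each row as an object while holding only one 
cell in memory at a time. It is not the code from the patch: the class name 
JsonRowsFormatter, the headers: keyword, and the output stream argument are 
assumptions made for this example. Only the method names start_table, 
start_row, cell, close_row, and close_table come from this issue.

```ruby
require 'json'
require 'stringio'

# Hypothetical sketch, not the formatter from the HBASE-24901 patch.
class JsonRowsFormatter
  def initialize(out = $stdout)
    @out = out
  end

  # Headers are supplied once, before any data arrives (assumed signature).
  def start_table(headers:)
    @headers = headers
    @first_row = true
    @out.print '['
  end

  def start_row
    @out.print ',' unless @first_row
    @first_row = false
    @col = 0
    @out.print '{'
  end

  # Writes one cell immediately, keyed by the header at the current position.
  def cell(value)
    raise ArgumentError, 'too many cells for this row' if @col >= @headers.length
    @out.print ',' if @col > 0
    @out.print "#{@headers[@col].to_json}:#{value.to_json}"
    @col += 1
  end

  def close_row
    @out.print '}'
  end

  def close_table
    @out.puts ']'
  end
end

json_out = StringIO.new
json_fmt = JsonRowsFormatter.new(json_out)
json_fmt.start_table(headers: %w[row column value])
[%w[r1 cf:a 42], %w[r1 cf:b 43]].each do |cells|
  json_fmt.start_row
  cells.each { |c| json_fmt.cell(c) }
  json_fmt.close_row
end
json_fmt.close_table
puts json_out.string
```

Because the key for each value is looked up from the stored headers, the 
formatter never needs to buffer a whole row, let alone the whole table.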

h2. Implementation

The patch implements a stateful output formatter for data with a fixed 
number of output columns. Tracking state inside the formatter is an important 
design feature: callers can feed it one cell at a time instead of handing 
over all the data at once.
h2. Formatter Usage Pattern

The verbose way to use the formatter to print a table is as follows:
1. Call start_table to reset the formatter's state and pass configuration 
options
2. Call start_row to start writing a row
3. Call cell to write a single cell (repeat for each cell in the row)
4. Call close_row to finish the row
5. Call close_table to finish the table

This can feel like a lot of method calls, but the calls act as "hooks" that 
give each formatter implementation a chance to fill in the content needed 
between cells. To cut down on boilerplate, there are shortcut methods such as 
row and single_value_table.
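
The call sequence above can be sketched with a small delimited-output 
formatter. This is an illustration under stated assumptions, not the patch 
itself: the class name DelimitedFormatter, the headers: and delimiter: 
options, and the exact behavior of the row shortcut are hypothetical. Only the 
method names are taken from this issue.

```ruby
require 'stringio'

# Hypothetical sketch of the usage pattern; signatures are assumptions.
class DelimitedFormatter
  def initialize(out = $stdout)
    @out = out
  end

  # Resets state and accepts configuration, then writes the header line.
  def start_table(headers:, delimiter: '|')
    @delimiter = delimiter
    @num_cols = headers.length
    @col = 0
    @out.puts headers.join(@delimiter)
  end

  def start_row
    @col = 0
  end

  def cell(value)
    raise ArgumentError, 'row already has all its cells' if @col >= @num_cols
    # Hook: emit the separator that belongs between two cells.
    @out.print @delimiter if @col > 0
    @out.print value
    @col += 1
  end

  def close_row
    raise ArgumentError, "expected #{@num_cols} cells, got #{@col}" unless @col == @num_cols
    @out.print "\n"
  end

  def close_table
    @out.flush
  end

  # Shortcut: wraps start_row, cell, and close_row for a row already in hand.
  def row(values)
    start_row
    values.each { |v| cell(v) }
    close_row
  end
end

table_out = StringIO.new
fmt = DelimitedFormatter.new(table_out)
fmt.start_table(headers: %w[ROW COLUMN VALUE], delimiter: ',')
fmt.start_row            # verbose form: one call per hook
fmt.cell('r1')
fmt.cell('cf:a')
fmt.cell('42')
fmt.close_row
fmt.row(%w[r2 cf:a 7])   # shortcut form for the same three steps
fmt.close_table
print table_out.string
```

The verbose form suits streaming sources such as a scan, where cells arrive 
one at a time; the shortcut suits callers that already hold a complete row.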

  was:
As a user, I would like a simple interface for shell output that can be 
expressed as a table (ie. output with a fixed number of columns and potentially 
many rows). To be clear, this new formatter is not specifically for HBase 
"tables." Table is used in the broader sense here.

Goals
- Do not require more than one output cell loaded in memory at a time
- Support many implementations like aligned human-friendly tables, unaligned 
delimited, and JSON

Non-goals
- Don't load all the headers into memory at once.
  - This may seem like a goal with merit, but we are unlikely to find a use 
case for this formatter with many columns. For example: since HBase tables 
aren't relational, our scan output will not have an output column for every 
HBase column. Instead, each output row will correspond to an HBase cell.


> Create versatile hbase-shell table formatter
> --------------------------------------------
>
>                 Key: HBASE-24901
>                 URL: https://issues.apache.org/jira/browse/HBASE-24901
>             Project: HBase
>          Issue Type: Improvement
>          Components: shell
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Elliot Miller
>            Assignee: Elliot Miller
>            Priority: Major
>         Attachments: HBASE-24901_scan_output_comparison.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
