[
https://issues.apache.org/jira/browse/HBASE-24901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elliot Miller updated HBASE-24901:
----------------------------------
Description:
As a user, I would like a simple interface for shell output that can be
expressed as a table (i.e., output with a fixed number of columns and potentially
many rows). To be clear, this new formatter is not specifically for HBase
"tables." Table is used in the broader sense here.
h2. Goals
- Do not require more than one output cell to be loaded in memory at a time
- Support many implementations, such as aligned human-friendly tables, unaligned
delimited output, and JSON
h2. Non-goals
- Avoiding loading all the headers into memory at once is not a goal.
** This may seem like a goal with merit, but we are unlikely to find a use
case for this formatter with many columns. For example: since HBase tables
aren't relational, our scan output will not have an output column for every
HBase column. Instead, each output row will correspond to an HBase cell.
** It's also really useful to have the headers ahead of time, because it
allows us to do things like JSON object output (where each row is represented
with key-value pairs).
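To illustrate why having the headers up front helps, here is a minimal Ruby sketch of a JSON formatter that streams one cell at a time. The class name JsonRowsFormatter and its internals are hypothetical and are not taken from the patch; only the hook names (start_table, start_row, cell, close_row, close_table) come from the usage pattern described in this issue. Because the headers are known when the table starts, each cell can be written immediately as a key-value pair.

```ruby
require 'json'
require 'stringio'

# Hypothetical sketch, not the class from the patch: streams each row as a
# JSON object whose keys are the headers supplied when the table starts.
class JsonRowsFormatter
  def initialize(out = $stdout)
    @out = out
  end

  # The headers are the only per-table data kept in memory.
  def start_table(headers:)
    @headers = headers
    @rows_written = 0
    @column = nil # nil means "no row is open"
    @out.write('[')
  end

  def start_row
    @out.write(',') if @rows_written.positive?
    @out.write('{')
    @column = 0
  end

  # Write one key-value pair immediately; the cell value is not retained.
  def cell(value)
    raise 'no open row' if @column.nil?
    raise 'too many cells' if @column >= @headers.length

    @out.write(',') if @column.positive?
    @out.write("#{@headers[@column].to_json}:#{value.to_json}")
    @column += 1
  end

  def close_row
    @out.write('}')
    @rows_written += 1
    @column = nil
  end

  def close_table
    @out.write("]\n")
  end
end

json_buffer = StringIO.new
json_formatter = JsonRowsFormatter.new(json_buffer)
json_formatter.start_table(headers: %w[ROW VALUE])
[%w[r1 1], %w[r2 2]].each do |values|
  json_formatter.start_row
  values.each { |v| json_formatter.cell(v) }
  json_formatter.close_row
end
json_formatter.close_table
puts json_buffer.string
```

Without the headers, the formatter would have to buffer a row (or fall back to arrays) before it could emit keys, which is the trade-off this non-goal accepts.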
h2. Implementation
This patch was implemented as a stateful output formatter for data with a fixed
number of output columns. Tracking state inside the formatter is an important
design feature so that we don't have to feed the formatter all the data at once.
h2. Formatter Usage Pattern
The verbose way to use the formatter to print a table is as follows:
1. call start_table to reset the formatter's state and pass configuration
options
2. call start_row to start writing a row
3. call cell to write a single cell
4. call close_row
5. call close_table
Sometimes, it will feel like this is a lot of method calls, but these calls act
as "hooks" and give each of the formatter implementations a chance to fill out
all the content necessary between cells. To cut down on boilerplate, there are
shortcut methods like row and single_value_table.
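The usage pattern above can be sketched in Ruby as follows. This is an illustrative sketch only: the class name DelimitedTableFormatter, the headers: and delimiter: options, and the signatures of row and single_value_table are assumptions, not the API from the patch. It shows how each hook gives an unaligned delimited implementation a place to write separators and newlines while keeping only small counters as state.

```ruby
require 'stringio'

# Hypothetical formatter, not the class from the patch: writes unaligned,
# delimiter-separated output and keeps only counters as state.
class DelimitedTableFormatter
  def initialize(out = $stdout)
    @out = out
  end

  # Reset state and accept configuration (assumed options: headers, delimiter).
  def start_table(headers:, delimiter: '|')
    @column_count = headers.length
    @delimiter = delimiter
    @cells_in_row = nil # nil means "no row is open"
    row(headers)
  end

  def start_row
    raise 'row already open' unless @cells_in_row.nil?

    @cells_in_row = 0
  end

  # Write one cell immediately; nothing is buffered between calls.
  def cell(value)
    raise 'no open row' if @cells_in_row.nil?
    raise 'too many cells' if @cells_in_row >= @column_count

    @out.write(@delimiter) if @cells_in_row.positive?
    @out.write(value.to_s)
    @cells_in_row += 1
  end

  def close_row
    raise 'row is incomplete' unless @cells_in_row == @column_count

    @out.write("\n")
    @cells_in_row = nil
  end

  def close_table
    raise 'row still open' unless @cells_in_row.nil?
  end

  # Shortcut: write a whole row in one call.
  def row(values)
    start_row
    values.each { |v| cell(v) }
    close_row
  end

  # Shortcut: a table with a single column and a single row.
  def single_value_table(header, value)
    start_table(headers: [header])
    row([value])
    close_table
  end
end

buffer = StringIO.new
formatter = DelimitedTableFormatter.new(buffer)
formatter.start_table(headers: %w[ROW COLUMN VALUE])
formatter.start_row          # verbose form: one call per hook
formatter.cell('r1')
formatter.cell('cf:a')
formatter.cell('1')
formatter.close_row
formatter.row(%w[r2 cf:b 2]) # shortcut form
formatter.close_table
puts buffer.string
```

An aligned, human-friendly implementation could reuse the same hooks, for example by padding in cell and drawing borders in close_row, without changing the calling code.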
was:
As a user, I would like a simple interface for shell output that can be
expressed as a table (i.e., output with a fixed number of columns and potentially
many rows). To be clear, this new formatter is not specifically for HBase
"tables." Table is used in the broader sense here.
Goals
- Do not require more than one output cell loaded in memory at a time
- Support many implementations like aligned human-friendly tables, unaligned
delimited, and JSON
Non-goals
- Don't load all the headers into memory at once.
- This may seem like a goal with merit, but we are unlikely to find a use
case for this formatter with many columns. For example: since HBase tables
aren't relational, our scan output will not have an output column for every
HBase column. Instead, each output row will correspond to an HBase cell.
> Create versatile hbase-shell table formatter
> --------------------------------------------
>
> Key: HBASE-24901
> URL: https://issues.apache.org/jira/browse/HBASE-24901
> Project: HBase
> Issue Type: Improvement
> Components: shell
> Affects Versions: 3.0.0-alpha-1
> Reporter: Elliot Miller
> Assignee: Elliot Miller
> Priority: Major
> Attachments: HBASE-24901_scan_output_comparison.png
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)