wuchong commented on a change in pull request #9799:
[FLINK-13360][documentation] Add documentation for HBase connector for Table API & SQL
URL: https://github.com/apache/flink/pull/9799#discussion_r332862569
##########
File path: docs/dev/table/connect.md
##########
@@ -1075,6 +1075,71 @@ CREATE TABLE MyUserTable (
{% top %}
+### HBase Connector
+
+<span class="label label-primary">Source: Batch</span>
+<span class="label label-primary">Sink: Batch</span>
+<span class="label label-primary">Sink: Streaming Append Mode</span>
+<span class="label label-primary">Sink: Streaming Upsert Mode</span>
+<span class="label label-primary">Temporal Join: Sync Mode</span>
+
+The HBase connector allows for reading from and writing to an HBase cluster.
+
+The connector can operate in [upsert mode](#update-modes) for exchanging UPSERT/DELETE messages with the external system using a [key defined by the query](./streaming/dynamic_tables.html#table-to-stream-conversion).
+
+For append-only queries, the connector can also operate in [append mode](#update-modes) for exchanging only INSERT messages with the external system.
+
+To use this connector, add the following dependency to your project:
+
+{% highlight xml %}
+<dependency>
+ <groupId>org.apache.flink</groupId>
+ <artifactId>flink-connector-hbase{{ site.scala_version_suffix }}</artifactId>
+ <version>{{ site.version }}</version>
+</dependency>
+{% endhighlight %}
+
+The connector can be defined as follows:
+
+<div data-lang="DDL" markdown="1">
+{% highlight sql %}
+CREATE TABLE MyUserTable (
+ hbase_rowkey_name rowkey_type,
+ hbase_column_family_name1 ROW<...>,
+ hbase_column_family_name2 ROW<...>
+) WITH (
+ 'connector.type' = 'hbase', -- required: specify this table type is hbase
+
+ 'connector.version' = '1.4.3', -- required: valid connector versions are "1.4.3"
+
+ 'connector.table-name' = 'hbase_table_name', -- required: hbase table name
+
+ 'connector.zookeeper.quorum' = 'quorum_url', -- required: hbase zookeeper config
+ 'connector.zookeeper.znode.parent' = 'znode',
+
+ 'connector.write.buffer-flush.max-size' = '1048576', -- optional: Write option, sets when to flush a buffered request
+                                                      -- based on the memory size of rows currently added.
+
+ 'connector.write.buffer-flush.max-rows' = '1', -- optional: Write option, sets when to flush buffered
+                                                -- request based on the number of rows currently added.
+
+ 'connector.write.buffer-flush.interval' = '1', -- optional: Write option, sets a flush interval flushing buffered
+                                                -- requesting if the interval passes, in milliseconds.
+)
+{% endhighlight %}
+</div>
+</div>
+
+**Column family:** Values other than `rowKey` must be declared as column families, and all column family values must be wrapped with the SQL ROW function before being inserted into HBase table.
Review comment:
```suggestion
**Columns:** All column families of the HBase table must be declared as
`ROW` type: the field name maps to the column family name, and the nested field
names map to the column qualifier names. The schema does not need to declare
every family and qualifier; users can declare only what they need.
Apart from the `ROW` type fields, the single field of an atomic type (e.g. `STRING`,
`BIGINT`) is recognized as the row key of the table. There are no constraints on
the name of the row key field.
```
I rephrased this sentence. What do you think about it?
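
To make the mapping concrete, here is a minimal sketch of a schema that follows the wording above. The table, family, and qualifier names (`hTable`, `family1`, `q1`, ...) and the source table `T` are made up for illustration; the connector properties are the placeholders from the DDL in this diff, not values for a real cluster.

```sql
-- Hypothetical schema: one atomic field plus two ROW fields.
CREATE TABLE hTable (
  rk STRING,                          -- the only atomic field, so it is the row key (any name works)
  family1 ROW<q1 INT>,                -- column family "family1" with qualifier "q1"
  family2 ROW<q2 STRING, q3 BIGINT>   -- column family "family2" with qualifiers "q2" and "q3"
) WITH (
  'connector.type' = 'hbase',
  'connector.version' = '1.4.3',
  'connector.table-name' = 'hbase_table_name',
  'connector.zookeeper.quorum' = 'quorum_url',
  'connector.zookeeper.znode.parent' = 'znode'
);

-- Writing: wrap the qualifier values of each family with ROW(...).
-- T is an assumed source table with columns rk, q1, q2, q3.
INSERT INTO hTable
SELECT rk, ROW(q1), ROW(q2, q3) FROM T;

-- Reading: a qualifier is addressed as a nested field of its family.
SELECT rk, family1.q1, family2.q3 FROM hTable;
```

Families or qualifiers that exist in the HBase table but are not declared here would simply not be visible through this Flink table.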