[
https://issues.apache.org/jira/browse/FLINK-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386135#comment-16386135
]
ASF GitHub Bot commented on FLINK-8862:
---------------------------------------
GitHub user neoremind opened a pull request:
https://github.com/apache/flink/pull/5639
[FLINK-8862] [HBase] Support HBase snapshot read
## What is the purpose of the change
*The flink-hbase connector currently supports reading HBase only through
region server scanners. HBase also offers
[snapshot](http://hbase.apache.org/book.html#ops.snapshots) scanning, and
Hadoop accordingly provides two ways to scan HBase:
[TableInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html)
and
[TableSnapshotInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.html).
It would be great if Flink supported both, to widen the connector's usage
scope and give users an alternative.*
## Brief change log
- *Create `TableInputSplitStrategy` interface and its implementations as
abstraction logic for `AbstractTableInputFormat`*
- *Update `HBaseRowInputFormat` and `TableInputFormat`*
- *Add `HBaseSnapshotRowInputFormat` and `TableSnapshotInputFormat`*
- *Extract 2 interfaces including `HBaseTableScannerAware` and
`ResultToTupleMapper`*
- *Add `HBaseSnapshotReadExample`*
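
As a rough illustration of the strategy idea in the change log above, the sketch below shows an input format that delegates split creation to a pluggable strategy, with one implementation for the region server path and one for the snapshot path. It is not the code from this pull request: every class name, the `regionserver://` and `region-N` labels, and the use of plain string key ranges are hypothetical stand-ins, and no Flink or HBase API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an input split: a row-key range plus where it is read from.
final class SplitSketch {
    final String startKey;
    final String endKey;
    final String source;

    SplitSketch(String startKey, String endKey, String source) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.source = source;
    }

    @Override
    public String toString() {
        return "[" + startKey + ", " + endKey + ") from " + source;
    }
}

// Hypothetical strategy interface; the PR's TableInputSplitStrategy may look different.
interface SplitStrategySketch {
    List<SplitSketch> createSplits(List<String[]> regionRanges);
}

// Region server path: each region's key range is scanned through the hosting server.
final class RegionServerStrategySketch implements SplitStrategySketch {
    private final String host;

    RegionServerStrategySketch(String host) {
        this.host = host;
    }

    @Override
    public List<SplitSketch> createSplits(List<String[]> regionRanges) {
        List<SplitSketch> splits = new ArrayList<>();
        for (String[] range : regionRanges) {
            splits.add(new SplitSketch(range[0], range[1], "regionserver://" + host));
        }
        return splits;
    }
}

// Snapshot path: each region's key range is read from files under a restore directory.
final class SnapshotStrategySketch implements SplitStrategySketch {
    private final String restoreDir;

    SnapshotStrategySketch(String restoreDir) {
        this.restoreDir = restoreDir;
    }

    @Override
    public List<SplitSketch> createSplits(List<String[]> regionRanges) {
        List<SplitSketch> splits = new ArrayList<>();
        for (int i = 0; i < regionRanges.size(); i++) {
            String[] range = regionRanges.get(i);
            splits.add(new SplitSketch(range[0], range[1], restoreDir + "/region-" + i));
        }
        return splits;
    }
}

// The input format only knows the strategy interface, not which read path is used.
final class InputFormatSketch {
    private final SplitStrategySketch strategy;

    InputFormatSketch(SplitStrategySketch strategy) {
        this.strategy = strategy;
    }

    List<SplitSketch> createInputSplits(List<String[]> regionRanges) {
        return strategy.createSplits(regionRanges);
    }
}

public class SplitStrategyDemo {
    public static void main(String[] args) {
        // Two made-up regions, each given as {startKey, endKey}.
        List<String[]> regions =
                Arrays.asList(new String[] {"a", "m"}, new String[] {"m", "z"});

        InputFormatSketch viaServer =
                new InputFormatSketch(new RegionServerStrategySketch("host-1"));
        InputFormatSketch viaSnapshot =
                new InputFormatSketch(new SnapshotStrategySketch("/tmp/restore"));

        viaServer.createInputSplits(regions).forEach(System.out::println);
        viaSnapshot.createInputSplits(regions).forEach(System.out::println);
    }
}
```

The point of the sketch is only that the format class stays unchanged when the read path is swapped; how the actual PR wires `TableInputSplitStrategy` into `AbstractTableInputFormat` should be checked against the diff.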
## Verifying this change
This change is already covered by the existing tests listed below, and new
test cases have been added as well.
`org.apache.flink.addons.hbase.HBaseConnectorITCase`
This change added tests and can be verified as follows:
- *Manually create a snapshot of a specific HBase table, and use
`TableSnapshotInputFormat` to do a full scan.*
- *Run the existing `HBaseReadExample` to do a full scan.*
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (**yes** / no)
- If yes, how is the feature documented? (not applicable / **docs** /
**JavaDocs** / not documented)
- For documentation, please see the [JIRA
ticket](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-8862?filter=allopenissues),
where a detailed design doc and class diagram are attached.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/neoremind/flink snapshot
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5639.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5639
----
commit 0b36b434f987a971b6463ce3441c483380cfa9dd
Author: neoremind <xuzh1002@...>
Date: 2018-03-05T14:14:09Z
Support HBase snapshot read
----
> Support HBase snapshot read
> ---------------------------
>
> Key: FLINK-8862
> URL: https://issues.apache.org/jira/browse/FLINK-8862
> Project: Flink
> Issue Type: Improvement
> Components: Batch Connectors and Input/Output Formats
> Affects Versions: 1.2.0
> Reporter: Xu Zhang
> Priority: Major
> Attachments: FLINK-8862-Design-Class-Diagram.png,
> FLINK-8862-DesignDoc.pdf
>
>
> The flink-hbase connector currently supports reading HBase only through
> region server scanners. HBase also offers snapshot scanning, and Hadoop
> accordingly provides two ways to scan HBase: TableInputFormat and
> TableSnapshotInputFormat. It would be great if Flink supported both, to
> widen the connector's usage scope and give users an alternative.