[
https://issues.apache.org/jira/browse/FLINK-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580282#comment-14580282
]
Ufuk Celebi commented on FLINK-2188:
------------------------------------
You can try it with this branch:
https://github.com/uce/incubator-flink/tree/configurable_if-2195
The following code snippet should allow you to adjust your example.
{code}
// Imports needed by the snippet:
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public static void main(String[] args) throws Exception {
    // Local execution environment with a parallelism of 4.
    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(4);

    // Hadoop configuration pointing HBase's mapreduce TableInputFormat at the "test" table.
    Configuration conf = new Configuration();
    conf.set(TableInputFormat.INPUT_TABLE, "test");

    // Wrap the Hadoop input format as a Flink data source of (row key, Result) pairs.
    DataSource<Tuple2<ImmutableBytesWritable, Result>> hbase = env.createHadoopInput(
            new TableInputFormat(),
            ImmutableBytesWritable.class,
            Result.class,
            Job.getInstance(conf));

    // Map each (row key, Result) pair to a (String, String) tuple.
    DataSet<Tuple2<String, String>> toTuple = hbase.map(
            new MapFunction<Tuple2<ImmutableBytesWritable, Result>, Tuple2<String, String>>() {
                @Override
                public Tuple2<String, String> map(Tuple2<ImmutableBytesWritable, Result> record) throws Exception {
                    Result result = record.f1;
                    return new Tuple2<String, String>(
                            Bytes.toString(result.getRow()),
                            new String(result.value()));
                }
            });

    System.out.println(toTuple.count());
}
{code}
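The snippet above uses a local environment and a bare Hadoop Configuration. To run it against a real HBase cluster, the configuration would typically be built from the HBase client config instead; a minimal sketch, assuming hbase-site.xml is on the classpath or the ZooKeeper quorum is set explicitly (the quorum hosts below are placeholders):
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical variant of the Configuration setup above for a remote cluster:
// HBaseConfiguration.create() loads hbase-default.xml / hbase-site.xml from the classpath.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // placeholder ZooKeeper hosts
conf.set(TableInputFormat.INPUT_TABLE, "test");
{code}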
> Reading from big HBase Tables
> -----------------------------
>
> Key: FLINK-2188
> URL: https://issues.apache.org/jira/browse/FLINK-2188
> Project: Flink
> Issue Type: Bug
> Reporter: Hilmi Yildirim
> Priority: Critical
> Attachments: flinkTest.zip
>
>
> I detected a bug when reading from a big HBase table.
> I used a cluster of 13 machines with 13 processing slots each, which results
> in a total of 169 processing slots. Further, our cluster uses cdh5.4.1 and the
> HBase version is 1.0.0-cdh5.4.1. There is an HBase table with nearly 100 mio.
> rows. I used Spark and Hive to count the number of rows and both results are
> identical (nearly 100 mio.).
> Then, I used Flink to count the number of rows. For that I added the
> hbase-client 1.0.0-cdh5.4.1 Java API as a Maven dependency and excluded the
> other hbase-client dependencies. The result of the job is nearly 102 mio., 2
> mio. rows more than the result of Spark and Hive. Moreover, I ran the Flink
> job multiple times and sometimes the result fluctuates by +-5.
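The reporter's job itself is only available in the attached flinkTest.zip and is not reproduced here. For illustration only, a row count written against Flink's flink-hbase addon TableInputFormat (one plausible way such a count could be written, assuming the addon's getScanner()/getTableName()/mapResultToTuple(...) hooks) might look roughly like the sketch below; the class name, table name, and emitted tuple type are placeholders, not the attached code:
{code}
import org.apache.flink.addons.hbase.TableInputFormat;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example, not the reporter's attached job.
public class HBaseRowCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Scan the whole table and emit one Tuple1<String> (the row key) per HBase row.
        long rows = env.createInput(new TableInputFormat<Tuple1<String>>() {
            @Override
            protected Scan getScanner() {
                return new Scan();
            }

            @Override
            protected String getTableName() {
                return "test"; // placeholder table name
            }

            @Override
            protected Tuple1<String> mapResultToTuple(Result r) {
                return new Tuple1<String>(Bytes.toString(r.getRow()));
            }
        }).count();

        System.out.println(rows);
    }
}
{code}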