[
https://issues.apache.org/jira/browse/BEAM-4020?focusedWorklogId=94781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94781
]
ASF GitHub Bot logged work on BEAM-4020:
----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Apr/18 19:55
Start Date: 24/Apr/18 19:55
Worklog Time Spent: 10m
Work Description: iemejia commented on a change in pull request #5212:
[BEAM-4020] Add HBaseIO implementation based on SDF
URL: https://github.com/apache/beam/pull/5212#discussion_r183859585
##########
File path:
sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java
##########
@@ -180,16 +180,28 @@ public void testReadingFailsTableDoesNotExist() throws
Exception {
public void testReadingEmptyTable() throws Exception {
final String table = tmpTable.getName();
createTable(table);
+
runReadTest(HBaseIO.read().withConfiguration(conf).withTableId(table), new
ArrayList<>());
+ runReadTest(
+ HBaseIO.read().withConfiguration(conf).withTableId(table).useSdf(),
new ArrayList<>());
}
@Test
public void testReading() throws Exception {
final String table = tmpTable.getName();
final int numRows = 1001;
- createTable(table);
- writeData(table, numRows);
-
runReadTestLength(HBaseIO.read().withConfiguration(conf).withTableId(table),
1001);
+ createAndWriteData(table, numRows);
+
+
runReadTestLength(HBaseIO.read().withConfiguration(conf).withTableId(table),
numRows);
+ }
+
+ @Test
+ public void testReadingSDF() throws Exception {
Review comment:
```
java.lang.IllegalStateException: Pipeline update will not be possible
because the following transforms do not have stable unique names:
PAssert$0/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey,
Counttable_35128c23-5135-4ce4-9a3c-3817ac5a3d99_ByteKeyRange{startKey=[],
endKey=[]}/Combine.perKey(Count)/GroupByKey.
Conflicting instances:
-
name=PAssert$0/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey:
- GroupByKey
- GroupByKey
-
name=Counttable_35128c23-5135-4ce4-9a3c-3817ac5a3d99_ByteKeyRange{startKey=[],
endKey=[]}/Combine.perKey(Count)/GroupByKey:
- GroupByKey
- GroupByKey
You can fix it adding a name when you call apply(): pipeline.apply(<name>,
<transform>).
at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:594)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:310)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:346)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:328)
at
org.apache.beam.sdk.io.hbase.HBaseIOTest.runReadTestLength(HBaseIOTest.java:586)
at
org.apache.beam.sdk.io.hbase.HBaseIOTest.testReading(HBaseIOTest.java:196)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317)
at
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
```
Not sure where this is happening but looks like something internal to
`PAssert`, At keast the test pipeline names all transforms:
```java
private void runReadTestLength(HBaseIO.Read read, long numElements) {
final String transformId = read.getTableId() + "_" + read.getKeyRange();
PCollection<Result> rows = p.apply("Read" + transformId, read);
PAssert.thatSingleton(rows.apply("Count" + transformId,
Count.globally()))
.isEqualTo(numElements);
p.run().waitUntilFinish();
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 94781)
Time Spent: 1h 50m (was: 1h 40m)
> Add an HBaseIO implementation based on SDF
> ------------------------------------------
>
> Key: BEAM-4020
> URL: https://issues.apache.org/jira/browse/BEAM-4020
> Project: Beam
> Issue Type: New Feature
> Components: io-java-hbase
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Since the support from runners is still limited, it is probably wise to
> create a first IO based on the current SDF batch implementation in Java to
> validate/test it with a real data-store. Since HBase partitioning model is
> quite straightforward it is a perfect candidate.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)