[jira] [Work logged] (BEAM-4020) Add an HBaseIO implementation based on SDF

ASF GitHub Bot (JIRA) Tue, 24 Apr 2018 12:56:52 -0700

     [ https://issues.apache.org/jira/browse/BEAM-4020?focusedWorklogId=94781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94781 ]


ASF GitHub Bot logged work on BEAM-4020:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Apr/18 19:55
            Start Date: 24/Apr/18 19:55
    Worklog Time Spent: 10m 
      Work Description: iemejia commented on a change in pull request #5212: 
[BEAM-4020] Add HBaseIO implementation based on SDF
URL: https://github.com/apache/beam/pull/5212#discussion_r183859585
 
 

 ##########
 File path: sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java
 ##########
 @@ -180,16 +180,28 @@ public void testReadingFailsTableDoesNotExist() throws Exception {
   public void testReadingEmptyTable() throws Exception {
     final String table = tmpTable.getName();
     createTable(table);
+
     runReadTest(HBaseIO.read().withConfiguration(conf).withTableId(table), new ArrayList<>());
+    runReadTest(
+        HBaseIO.read().withConfiguration(conf).withTableId(table).useSdf(), new ArrayList<>());
   }
 
   @Test
   public void testReading() throws Exception {
     final String table = tmpTable.getName();
     final int numRows = 1001;
-    createTable(table);
-    writeData(table, numRows);
-    runReadTestLength(HBaseIO.read().withConfiguration(conf).withTableId(table), 1001);
+    createAndWriteData(table, numRows);
+
+    runReadTestLength(HBaseIO.read().withConfiguration(conf).withTableId(table), numRows);
+  }
+
+  @Test
+  public void testReadingSDF() throws Exception {
 
 Review comment:
   ```
   java.lang.IllegalStateException: Pipeline update will not be possible because the following transforms do not have stable unique names: PAssert$0/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey, Counttable_35128c23-5135-4ce4-9a3c-3817ac5a3d99_ByteKeyRange{startKey=[], endKey=[]}/Combine.perKey(Count)/GroupByKey.
   
   Conflicting instances:
   - name=PAssert$0/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey:
       - GroupByKey
       - GroupByKey
   - name=Counttable_35128c23-5135-4ce4-9a3c-3817ac5a3d99_ByteKeyRange{startKey=[], endKey=[]}/Combine.perKey(Count)/GroupByKey:
       - GroupByKey
       - GroupByKey
   
   You can fix it adding a name when you call apply(): pipeline.apply(<name>, <transform>).
   
        at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:594)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:310)
        at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:346)
        at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:328)
        at org.apache.beam.sdk.io.hbase.HBaseIOTest.runReadTestLength(HBaseIOTest.java:586)
        at org.apache.beam.sdk.io.hbase.HBaseIOTest.testReading(HBaseIOTest.java:196)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317)
        at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
        at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
        at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
        at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
   ```
   Not sure where this is happening, but it looks like something internal to
   `PAssert`. At least the test pipeline names all transforms:
   ```java
     private void runReadTestLength(HBaseIO.Read read, long numElements) {
       final String transformId = read.getTableId() + "_" + read.getKeyRange();
       PCollection<Result> rows = p.apply("Read" + transformId, read);
       PAssert.thatSingleton(rows.apply("Count" + transformId, Count.globally()))
           .isEqualTo(numElements);
       p.run().waitUntilFinish();
     }
   ```
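
To make the validation message above concrete, here is a minimal sketch of the kind of check it describes: collect the full names of the applied transforms and report any name that occurs more than once. This is not Beam's implementation. The class `TransformNameCheck`, the method `findDuplicates`, and the sample names are hypothetical and only illustrate the idea that two transforms ending up under the same full name (for instance, two `GroupByKey` instances under one parent) would be flagged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Beam or of the pull request.
final class TransformNameCheck {

  /** Returns every full name that occurs more than once, in first-seen order. */
  static List<String> findDuplicates(List<String> fullNames) {
    // Count occurrences while preserving insertion order.
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String name : fullNames) {
      counts.merge(name, 1, Integer::sum);
    }
    List<String> duplicates = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        duplicates.add(entry.getKey());
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    // Made-up full names; the last two collide.
    List<String> names =
        Arrays.asList("Read/Scan", "Count/GroupByKey", "Count/GroupByKey");
    System.out.println(findDuplicates(names)); // prints [Count/GroupByKey]
  }
}
```

If a check like this reports a name, the colliding transforms were applied under the same parent and the same explicit or default name, which is the situation the error message suggests resolving with `pipeline.apply(<name>, <transform>)`.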

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 94781)
    Time Spent: 1h 50m  (was: 1h 40m)

> Add an HBaseIO implementation based on SDF
> ------------------------------------------
>
>                 Key: BEAM-4020
>                 URL: https://issues.apache.org/jira/browse/BEAM-4020
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-hbase
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Since the support from runners is still limited, it is probably wise to 
> create a first IO based on the current SDF batch implementation in Java to 
> validate/test it with a real data-store. Since HBase partitioning model is 
> quite straightforward it is a perfect candidate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
