[ https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080166#comment-14080166 ]
Ted Malaska commented on SPARK-2447:
------------------------------------

Hey Matei,

Let's do a WebEx or something in the near future. I would love to get more of your input.

Here are my answers to your questions above:

1. Yes, I can do Python.

2. Yes, I can do that. So to be clear, the bulkGet and scan will return a fixed
(Array[Byte], Array[(Array[Byte], Array[Byte], Array[Byte], Long)])
for
(rowKey, Array[(columnFamily, column, value, timestamp)])

2.1 As for the bulkPut/Increment/Delete/CheckPut, I think we need to give the user freedom to interact with the raw API. I have no problem building a simpler interface for the 80% use case, but I don't want to fail the 20%.

3. The lowest version is 0.96. The reason is that there was a major API change from 0.94 to 0.96+. So if we need to support 0.94 and below, we need to make a different code base.

Let me know if this answers your questions, and let me know if there is anything else I can do. I have learned so much from TD and I have grown so much from this process.

Ted Malaska

> Add common solution for sending upsert actions to HBase (put, deletes, and
> increment)
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-2447
>                 URL: https://issues.apache.org/jira/browse/SPARK-2447
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, Streaming
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>
> Going to review the design with Tdas today.
> But my first thought is to have an extension of VoidFunction that handles the
> connection to HBase and allows for options such as turning auto flush off for
> higher throughput.
> Need to answer the following questions first.
> - Can it be written in Java or should it be written in Scala?
> - What is the best way to add the HBase dependency? (will review how Flume
> does this as the first option)
> - What is the best way to do testing? (will review how Flume does this as the
> first option)
> - How to support Python?
> (Python may be a different JIRA; it is unknown at this time)
> Goals:
> - Simple to use
> - Stable
> - Supports high load
> - Documented (may be in a separate JIRA; need to ask Tdas)
> - Supports Java, Scala, and hopefully Python
> - Supports Streaming and normal Spark

--
This message was sent by Atlassian JIRA
(v6.2#6252)