[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

Ted Malaska (JIRA) Tue, 10 Nov 2015 14:05:38 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999422#comment-14999422
 ]


Ted Malaska commented on HBASE-14789:
-------------------------------------

Cool, I read the doc. As I see it there are four points:

* Bulk Get - Do bulk Gets on an executor
* TableInputFormat - Don't use this because of the thought that only one can 
run at a time
* Change the table description format - Use a more JSON-like definition
* Add write support - For SparkSQL writes to HBase

#First, let's talk through each point:
* Bulk Get: As we have talked about in other jiras, executing this on the 
executor side really doesn't add much value.  It would be very odd for people 
to have more than 1000 equality predicates in a where clause.  If they did, we 
would need to figure out at what point (1000, 10000, 50000) it becomes faster 
to run the code on the executor.  The normal use case is just a couple of 
equality predicates per where clause, so this is not a real concern.  If you 
want to do a real bulk get, then use the bulk get command; that will be much 
better for a lot of reasons.

* Not Using TableInputFormat: In the code today Spark is given the 
TableInputFormat in different requests, so they are at different points in the 
DAG.  So why would Spark not read from both?  Also, locality is provided and 
we are not reinventing the wheel.

* Change the table description format: This is a matter of preference; the 
current version is more like the HBase shell.  Either way makes sense, and it 
makes no real difference.

* Add write support: Yes we should add this.  
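
To make the Bulk Get point concrete, here is a rough sketch of the decision 
being discussed: given the row keys pulled from the equality predicates of a 
where clause, decide whether the Gets run on the driver or on the executors. 
Everything in it is hypothetical (the function name, the batch size, and 
above all the threshold, which would have to be measured and is not a value 
from the HBase-Spark module).

```python
from typing import Dict, List, Sequence

# Hypothetical cutoff. The real crossover (1000? 10000? 50000?) is an open
# question and would have to be found by benchmarking.
EXECUTOR_THRESHOLD = 10_000


def plan_point_lookups(
    row_keys: Sequence[str],
    threshold: int = EXECUTOR_THRESHOLD,
    batch_size: int = 1000,
) -> Dict[str, object]:
    """Decide where the Gets for a set of equality predicates should run.

    row_keys   -- row keys taken from the `=` predicates of a where clause
    threshold  -- above this many keys, assume the executor side is faster
    batch_size -- how many Gets to group into one batched request
    """
    location = "executor" if len(row_keys) > threshold else "driver"
    # Split the keys into fixed-size batches regardless of where they run.
    batches: List[List[str]] = [
        list(row_keys[i:i + batch_size])
        for i in range(0, len(row_keys), batch_size)
    ]
    return {"location": location, "batches": batches}


if __name__ == "__main__":
    # The normal case: a couple of equality predicates stay on the driver.
    print(plan_point_lookups(["row1", "row2"])["location"])
```

With only a couple of keys the plan stays on the driver in a single batch, 
which is the normal case described above; the executor path only matters 
once the key count passes whatever cutoff benchmarking turns up.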

#Summary
First, I think any and all of these changes would fit into the current 
implementation of the HBase-Spark module with few modifications.  These are 
fairly pointed changes that affect a narrowly scoped area of the code.  

Second, we should split this jira into 4 different jiras, each focusing on one 
of the points, since the points are neither dependent on nor related to each 
other.  We should open a jira for each feature and then discuss the approach 
for each one: how it can be added, and whether it should be added.

Thanks Zhan

Let me know if I missed anything

 



> Provide an alternative spark-hbase connector
> --------------------------------------------
>
>                 Key: HBASE-14789
>                 URL: https://issues.apache.org/jira/browse/HBASE-14789
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zhan Zhang
>            Assignee: Zhan Zhang
>         Attachments: shc.pdf
>
>
> This JIRA is to provide user an option to choose different Spark-HBase 
> implementation based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
