[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999422#comment-14999422 ]
Ted Malaska commented on HBASE-14789:
-------------------------------------

Cool, I read the doc. There are four points:
* Bulk Get - Do bulk Gets on an executor
* TableInputFormat - Don't use this because of the concern that only one can run at a time
* Change the table description format - Add a more JSON-like definition
* Add write support - For SparkSQL writes to HBase

First, let's go through each point:
* Bulk Get: As we have discussed in other JIRAs, executing this on the executor side really doesn't add much value. It would be very odd for people to have more than 1,000 equality predicates in a WHERE clause. If they did, we would need to figure out at what point (1,000? 10,000? 50,000?) it becomes faster to run the code on the executor. The normal use case is just a couple of = predicates per WHERE clause, so this is not a real concern. If you want to do a real bulk get, use the bulk get command; that will be much better for a lot of reasons.
* Not using TableInputFormat: In the code today, Spark is given the TableInputFormat in different requests, so they are at different points in the DAG. So why would Spark not read from both? Also, TableInputFormat already gives us locality, and we are not reinventing the wheel.
* Change the table description format: This is a matter of preference; the current version is closer to the HBase shell. Either way makes sense; it makes no real difference.
* Add write support: Yes, we should add this.

Summary:
First, I think any and all of these changes would fit into the current implementation of the HBase-Spark module with few changes. These are pretty pointed changes that affect a scoped area of the code.
Second, we should split this JIRA into four different JIRAs, each focusing on one of the points, since these points are not dependent on or related to each other. We should open a JIRA for each feature and then discuss the approach for each one, how it can be added, and/or whether it should be added.
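To illustrate why the table description format is mostly a matter of notation, here is a minimal Python sketch that converts a shell-style column mapping into a JSON-like catalog and back. Everything in it is hypothetical: the function names, the assumed mapping syntax (comma-separated `FIELD TYPE family:qualifier` entries, with `:key` marking the row key), and the assumed catalog layout (`table`, `rowkey`, `columns` with `cf`/`col`/`type`) are loosely modeled on the two styles under discussion, not the exact syntax of either connector.

```python
import json


def mapping_to_catalog(namespace, table, mapping):
    """Convert a shell-style mapping string into a JSON-like catalog dict.

    Hypothetical input format: comma-separated "FIELD TYPE family:qualifier"
    entries, where ":key" (empty family) marks the row key.
    """
    columns = {}
    for entry in mapping.split(","):
        field, col_type, location = entry.split()
        family, _, qualifier = location.partition(":")
        if family == "" and qualifier == "key":
            # Row key column: recorded under a pseudo column family "rowkey".
            columns[field] = {"cf": "rowkey", "col": "key", "type": col_type.lower()}
        else:
            columns[field] = {"cf": family, "col": qualifier, "type": col_type.lower()}
    if not any(spec["cf"] == "rowkey" for spec in columns.values()):
        raise ValueError("mapping has no ':key' row key entry")
    return {
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": columns,
    }


def catalog_to_mapping(catalog):
    """Convert the JSON-like catalog dict back into the shell-style string."""
    parts = []
    for field, spec in catalog["columns"].items():
        location = ":key" if spec["cf"] == "rowkey" else f"{spec['cf']}:{spec['col']}"
        parts.append(f"{field} {spec['type'].upper()} {location}")
    return ", ".join(parts)


if __name__ == "__main__":
    mapping = "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD INT c:b"
    catalog = mapping_to_catalog("default", "t1", mapping)
    print(json.dumps(catalog, indent=2))
    # Round trip: both notations carry the same information.
    assert catalog_to_mapping(catalog) == mapping
```

Because the round trip is lossless for this assumed syntax, either notation could back the same connector; the choice mainly affects readability for users coming from the HBase shell versus those who prefer structured JSON.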
Thanks Zhan. Let me know if I missed anything.

> Provide an alternative spark-hbase connector
> --------------------------------------------
>
>                 Key: HBASE-14789
>                 URL: https://issues.apache.org/jira/browse/HBASE-14789
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zhan Zhang
>            Assignee: Zhan Zhang
>         Attachments: shc.pdf
>
>
> This JIRA is to provide user an option to choose different Spark-HBase
> implementation based on requirements.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)