[
https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202539#comment-15202539
]
Liyin Tang commented on HBASE-15482:
------------------------------------
Dave, thanks for the response.
Even if we use HDFS snapshots, it would be great to have an option to skip
calculating block locations. To decouple compute from storage, it is
possible to run the compute layer for a query engine such as Spark, Hive, or
Presto in a different cluster. In these cases, locality doesn't matter for
either HBase or HDFS snapshots.
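
As a rough illustration of the idea, split generation could consult a flag
before resolving block locations. This is only a sketch under stated
assumptions: the configuration key `snapshot.inputformat.locality.enabled`,
the `LocationResolver` interface, and the class names are hypothetical and
are not part of HBase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshotSplitSketch {
    // Hypothetical configuration key, used for illustration only.
    static final String LOCALITY_ENABLED_KEY = "snapshot.inputformat.locality.enabled";

    /** One split per region, with the hosts preferred for scheduling. */
    static final class Split {
        final String region;
        final List<String> preferredHosts;

        Split(String region, List<String> preferredHosts) {
            this.region = region;
            this.preferredHosts = preferredHosts;
        }
    }

    /** Stand-in for the expensive per-region block location lookup. */
    interface LocationResolver {
        List<String> bestHosts(String region);
    }

    static List<Split> getSplits(List<String> regions,
                                 Map<String, String> conf,
                                 LocationResolver resolver) {
        // Locality stays enabled unless the job explicitly turns it off.
        boolean localityEnabled =
            Boolean.parseBoolean(conf.getOrDefault(LOCALITY_ENABLED_KEY, "true"));
        List<Split> splits = new ArrayList<>();
        for (String region : regions) {
            // When locality is disabled, the resolver is never called and the
            // split carries no preferred hosts.
            List<String> hosts = localityEnabled
                ? resolver.bestHosts(region)
                : List.of();
            splits.add(new Split(region, hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(LOCALITY_ENABLED_KEY, "false");
        List<Split> splits =
            getSplits(List.of("region-a", "region-b"), conf, r -> List.of("host-1"));
        for (Split s : splits) {
            System.out.println(s.region + " -> " + s.preferredHosts);
        }
    }
}
```

A job running in a separate compute cluster would set the flag to `false`, so
the cost of split calculation no longer depends on the number of blocks in
the snapshot.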
> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
> Key: HBASE-15482
> URL: https://issues.apache.org/jira/browse/HBASE-15482
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Liyin Tang
> Priority: Minor
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the
> splits based on the block locations in order to get the best locality.
> However, this process may take a long time for large snapshots.
> In some setups, the computing layer (Spark, Hive, or Presto) could run
> outside of the HBase cluster. In these scenarios, block locality doesn't
> matter. Therefore, it would be great to have an option to skip calculating
> the block locations for every job. That would be very useful for the
> Hive/Presto/Spark connectors.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)