[jira] [Commented] (HBASE-8369) MapReduce over snapshot files

Andrew Purtell (JIRA) Thu, 12 Dec 2013 16:45:05 -0800

    [ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846981#comment-13846981 ]


Andrew Purtell commented on HBASE-8369:
---------------------------------------

bq. Like an appropriate HRegion constructor, etc.

Something like that could surely go into 0.96. [~stack]?

bq. We are going to have to start to go down the Facebook path of forking HBase 
for things like this then and our contribution will become less useful over 
time. So be it.

To put this politely (I have strong opinions), the FB fork was a matter of a 
tight internal deployment schedule, not of any unwillingness on the part of 
the community to work with their contributions.

The addition of a couple of extra classes to a private build does not make a 
fork, just as the enhancements to reduce byte copies for "smart clients" that 
Jesse was working on, which went mainly into Phoenix, didn't produce a fork. 
If and when the time comes that a truly incompatible change must be 
introduced, one that constitutes a real break, we should definitely look hard 
at that.


> MapReduce over snapshot files
> -----------------------------
>
>                 Key: HBASE-8369
>                 URL: https://issues.apache.org/jira/browse/HBASE-8369
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce, snapshots
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8369-0.94.patch, HBASE-8369-0.94_v2.patch, 
> HBASE-8369-0.94_v3.patch, HBASE-8369-0.94_v4.patch, HBASE-8369-0.94_v5.patch, 
> HBASE-8369-trunk_v1.patch, HBASE-8369-trunk_v2.patch, 
> HBASE-8369-trunk_v3.patch, hbase-8369_v0.patch, hbase-8369_v11.patch, 
> hbase-8369_v5.patch, hbase-8369_v6.patch, hbase-8369_v7.patch, 
> hbase-8369_v8.patch, hbase-8369_v9.patch
>
>
> The idea is to add an InputFormat that can run a MapReduce job directly over 
> snapshot files, bypassing the HBase server layer. The IF is similar in usage 
> to TableInputFormat, taking a Scan object from the user, but instead of 
> running against an online table, it runs against a table snapshot. We create 
> one split per region in the snapshot and open an HRegion inside the 
> RecordReader. A RegionScanner is used internally to do the scan without any 
> HRegionServer bits.
> Users have been asking and searching for ways to run MR jobs by reading 
> directly from HFiles, so this enables new use cases where reading stale data 
> is acceptable:
>  - Take snapshots periodically, and run MR jobs only on snapshots.
>  - Export snapshots to a remote HDFS cluster, and run the MR jobs on that 
> cluster without an HBase cluster.
>  - (Future use case) Combine snapshot data with online HBase data: scan from 
> yesterday's snapshot, but read today's data from the online HBase cluster.
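
For readers skimming the archive, the "one split per region" planning in the 
description above can be sketched in plain Java. This is a toy model under 
stated assumptions, not the code in the attached patches: Region, Split, 
overlaps and planSplits are hypothetical names, keys are Strings rather than 
byte arrays, and skipping regions that fall outside the Scan's row range is 
an assumption about how such an InputFormat might prune its splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the split planning described above.
// None of these types are HBase classes. Keys are plain Strings compared
// with String.compareTo, whereas HBase compares raw byte arrays.
final class SnapshotSplitSketch {

    /** A region's key range [startKey, endKey); "" means unbounded on that side. */
    static final class Region {
        final String name, startKey, endKey;
        Region(String name, String startKey, String endKey) {
            this.name = name; this.startKey = startKey; this.endKey = endKey;
        }
    }

    /** One unit of map work: a single region of a named snapshot. */
    static final class Split {
        final String snapshotName;
        final Region region;
        Split(String snapshotName, Region region) {
            this.snapshotName = snapshotName; this.region = region;
        }
    }

    /** True if the region's range intersects the scan range [scanStart, scanStop). */
    static boolean overlaps(Region r, String scanStart, String scanStop) {
        boolean endsBeforeScan = !r.endKey.isEmpty() && r.endKey.compareTo(scanStart) <= 0;
        boolean startsAfterScan = !scanStop.isEmpty() && r.startKey.compareTo(scanStop) >= 0;
        return !endsBeforeScan && !startsAfterScan;
    }

    /** Emits one split per region; pruning by scan range is an assumption of this sketch. */
    static List<Split> planSplits(String snapshotName, List<Region> regions,
                                  String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            if (overlaps(r, scanStart, scanStop)) {
                splits.add(new Split(snapshotName, r));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("r1", "", "g"),
            new Region("r2", "g", "p"),
            new Region("r3", "p", ""));
        // Each split would be handed to a record reader that opens the region
        // from the snapshot files and scans it locally.
        for (Split s : planSplits("snap-example", regions, "h", "k")) {
            System.out.println(s.snapshotName + " -> " + s.region.name);
        }
    }
}
```

In a real job each split would be consumed by a RecordReader that opens the 
region from the snapshot's files and drives a RegionScanner, as the 
description says; the sketch stops at planning because that part needs no 
HBase dependency to illustrate.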



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
