[ 
https://issues.apache.org/jira/browse/CRUNCH-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477362#comment-15477362
 ] 

Tom White commented on CRUNCH-619:
----------------------------------

Thanks for taking a look, [~jmhsieh].

Some APIs seem to exist in only one of HBase 1 and 2, e.g.
CellUtil#createFirstOnRow and CellComparator#COMPARATOR. Are these going to be
backported to HBase 1 to make the transition smoother?

There's a comment in HFileOutputFormatForCrunch that explains why the HBase 
equivalent is not used. I guess that still applies.

{quote}
HBase's official HFileOutputFormat is not used, because it shuffles on row-key
only and does in-memory sort at reducer side (so the size of output HFile is
limited to reducer's memory). As crunch supports more complex and flexible
MapReduce pipeline, we would prefer thin and pure OutputFormat here.
{quote}

No reviewboard for Crunch, I'm afraid :(

> Run on HBase 2
> --------------
>
>                 Key: CRUNCH-619
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-619
>             Project: Crunch
>          Issue Type: Improvement
>    Affects Versions: 0.14.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: CRUNCH-619.patch
>
>




