[ https://issues.apache.org/jira/browse/CRUNCH-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477362#comment-15477362 ]
Tom White commented on CRUNCH-619:
----------------------------------

Thanks for taking a look, [~jmhsieh].

There seem to be some APIs in HBase 2 that don't exist in HBase 1, e.g. CellUtil#createFirstOnRow and CellComparator#COMPARATOR. Are these going to be backported to HBase 1 to make the transition smoother?

There's a comment in HFileOutputFormatForCrunch that explains why the HBase equivalent is not used. I guess that still applies.

{quote}
HBase's official HFileOutputFormat is not used, because it shuffles on row-key only and does in-memory sort at reducer side (so the size of output HFile is limited to reducer's memory). As crunch supports more complex and flexible MapReduce pipeline, we would prefer thin and pure OutputFormat here.
{quote}

No Review Board for Crunch, I'm afraid :(

> Run on HBase 2
> --------------
>
>                 Key: CRUNCH-619
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-619
>             Project: Crunch
>          Issue Type: Improvement
>    Affects Versions: 0.14.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: CRUNCH-619.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)