[ 
https://issues.apache.org/jira/browse/CRUNCH-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated CRUNCH-614:
------------------------------
    Attachment: CRUNCH-614-1.patch

Attaching the fairly obvious patch to use the KeyValue(byte[], int, int) 
constructor in the KeyValueComparator.compare() methods.

The integration tests pass and the change has the expected effect of 
dramatically speeding up the job that originally caused me to look into this 
issue.  The task that was taking hours before completes in under a minute.
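
For readers without the patch open, here is a minimal sketch of why the two approaches differ in cost. It does not use HBase at all: Slice, wrap(), copy() and compare() are hypothetical stand-ins, with wrap() playing the role of the KeyValue(byte[], int, int) constructor and copy() playing the role of the copying path the issue attributes to KeyValue.create(). The point is only that a comparator is called many times per sort, so a per-call array copy adds up.

```java
import java.util.Arrays;

public class KeyValueCopySketch {
    /** Hypothetical stand-in for a KeyValue: a window onto a backing byte array. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Wraps the caller's array without copying (the cheap path for comparisons). */
    static Slice wrap(byte[] bytes, int offset, int length) {
        return new Slice(bytes, offset, length);
    }

    /** Copies the requested range into a fresh array before wrapping it. */
    static Slice copy(byte[] bytes, int offset, int length) {
        byte[] copied = Arrays.copyOfRange(bytes, offset, offset + length);
        return new Slice(copied, 0, length);
    }

    /** Unsigned lexicographic comparison of two slices. */
    static int compare(Slice a, Slice b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a.bytes[a.offset + i] & 0xff;
            int y = b.bytes[b.offset + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] buffer = {9, 1, 2, 3, 9, 1, 2, 4};
        Slice left = wrap(buffer, 1, 3);
        Slice right = wrap(buffer, 5, 3);
        System.out.println(compare(left, right) < 0);           // true
        System.out.println(left.bytes == buffer);               // true: no copy made
        System.out.println(copy(buffer, 1, 3).bytes == buffer); // false: new array allocated
    }
}
```

Both paths compare identically; the wrapped version simply avoids allocating and filling a new array on every call.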

I'm not sure whether I should also have changed the implementation of 
HBaseTypes.bytesToKeyValue(), or any of the other places that use that 
method.

On the mailing list [~joshwills] said the following:
{quote}
...it looks like we were consolidating some common patterns in the code that 
had different use cases (i.e., defensive copies on reads vs. not doing that on 
sorts.)...
{quote}

If defensive copies on reads are desired or required then perhaps the other 
code shouldn't be touched.  The other uses of bytesToKeyValue() are in the 
PTypes defined by HBaseTypes.cells() and HBaseTypes.keyValues().
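
As a rough illustration of why a defensive copy can matter on reads, the sketch below assumes the underlying buffer may be overwritten after the value is handed out. That assumption is mine and is not confirmed in this thread. The class and method names are hypothetical and nothing here is Crunch or HBase API.

```java
import java.util.Arrays;

public class DefensiveCopySketch {
    /** Returns the caller's array itself; later writes to it remain visible. */
    static byte[] readWithoutCopy(byte[] buffer) {
        return buffer;
    }

    /** Returns a private snapshot that later writes to the buffer cannot change. */
    static byte[] readWithCopy(byte[] buffer) {
        return Arrays.copyOf(buffer, buffer.length);
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3};
        byte[] shared = readWithoutCopy(buffer);
        byte[] snapshot = readWithCopy(buffer);

        // Simulate the buffer being refilled with the next record (assumed behavior).
        buffer[0] = 7;

        System.out.println(shared[0]);   // 7: the uncopied value changed underneath us
        System.out.println(snapshot[0]); // 1: the copy kept the original contents
    }
}
```

In a comparator the value is discarded as soon as compare() returns, so this hazard would not apply there, which fits the distinction Josh drew between reads and sorts.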

Josh (or others) - do you have any more feedback?

> HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying 
> KeyValue byte array
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-614
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-614
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.14.0
>            Reporter: Ben Roling
>         Attachments: CRUNCH-614-1.patch
>
>
> I raised this issue on the mailing list:
> http://mail-archives.apache.org/mod_mbox/crunch-user/201607.mbox/%3CCANBdsh01qaQRCNdQdtqytP%2BWAhT_NVGHyQAdDS8H%2BPPMfi9bkw%40mail.gmail.com%3E
> HFileUtils was changed in such a way that it makes a copy of the KeyValue 
> byte array in the compare() method of the KeyValueComparator.  The change was 
> made with the following commit:
> https://github.com/apache/crunch/commit/a959ee6c7fc400d1f455b0742641c54de1dec0bf#diff-bc76ce0b41704c9c4efbfa1aab53588d
> The change causes HFileUtils.writeToHFilesForIncrementalLoad to be 
> dramatically slower in at least some cases.
> The code changed from using the KeyValue(byte[], int, int) constructor to 
> using KeyValue.create().  KeyValue.create() does a byte array copy.  The fix 
> is likely as simple as changing the code back to using the KeyValue 
> constructor.
> I will do some testing and attach a PR for the fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
