[ https://issues.apache.org/jira/browse/CRUNCH-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Roling updated CRUNCH-614:
------------------------------
    Attachment: CRUNCH-614-1.patch

Attaching the fairly obvious patch to use the KeyValue(byte[], int, int) constructor in the KeyValueComparator.compare() methods. The integration tests pass and the change has the expected effect of dramatically speeding up the job that originally caused me to look into this issue. The task that was taking hours before completes in under a minute.

I'm not sure if I should have changed the implementation of HBaseTypes.bytesToKeyValue() or changed any of the other places that use that method? On the mailing list [~joshwills] said the following:
{quote}
...it looks like we were consolidating some common patterns in the code that had different use cases (i.e., defensive copies on reads vs. not doing that on sorts.)...
{quote}
If defensive copies on reads are desired or required then perhaps the other code shouldn't be touched. The other uses of bytesToKeyValue() are in the PTypes defined by HBaseTypes.cells() and HBaseTypes.keyValues().

Josh (or others) - do you have any more feedback?

> HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying KeyValue byte array
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-614
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-614
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.14.0
>            Reporter: Ben Roling
>         Attachments: CRUNCH-614-1.patch
>
>
> I raised this issue on the mailing list:
> http://mail-archives.apache.org/mod_mbox/crunch-user/201607.mbox/%3CCANBdsh01qaQRCNdQdtqytP%2BWAhT_NVGHyQAdDS8H%2BPPMfi9bkw%40mail.gmail.com%3E
> HFileUtils was changed in such a way that it makes a copy of the KeyValue
> byte array in the compare() method of the KeyValueComparator.
> The change was made with the following commit:
> https://github.com/apache/crunch/commit/a959ee6c7fc400d1f455b0742641c54de1dec0bf#diff-bc76ce0b41704c9c4efbfa1aab53588d
> The change causes HFileUtils.writeToHFilesForIncrementalLoad to be
> dramatically slower in at least some cases.
> The code changed from using the KeyValue(byte[], int, int) constructor to
> using KeyValue.create(). KeyValue.create() does a byte array copy. The fix
> is likely as simple as changing the code back to using the KeyValue
> constructor.
> I will do some testing and attach a PR for the fix.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
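
For readers without the patch at hand, the wrap-versus-copy distinction the issue describes can be sketched roughly as follows. This is not the Crunch or HBase code: ByteView, wrap(), copyOf() and compare() are hypothetical names made up for illustration. The sketch only assumes what the issue states, that one construction path wraps the existing byte array while the other copies it, and shows why the copying path costs O(length) on every comparison during a sort.

```java
import java.util.Arrays;

// Hypothetical stand-in for a KeyValue-style view over a serialized buffer.
// This is NOT HBase's KeyValue class; it only mimics wrap-vs-copy construction.
final class ByteView {
    final byte[] bytes;
    final int offset;
    final int length;

    private ByteView(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Wraps the caller's array: constant time, no byte array allocation.
    static ByteView wrap(byte[] bytes, int offset, int length) {
        return new ByteView(bytes, offset, length);
    }

    // Defensive copy: allocates and copies `length` bytes on every call.
    static ByteView copyOf(byte[] bytes, int offset, int length) {
        return new ByteView(Arrays.copyOfRange(bytes, offset, offset + length), 0, length);
    }

    // Unsigned lexicographic comparison of the viewed bytes.
    static int compare(ByteView a, ByteView b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a.bytes[a.offset + i] & 0xff;
            int y = b.bytes[b.offset + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

Both construction paths give the same comparison result; the difference is only the per-call allocation and copy. A comparator is invoked many times per record during a sort, so the copying variant multiplies that cost, whereas a defensive copy made once on read is paid once per record.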