[ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721803#action_12721803 ]
Owen O'Malley commented on HADOOP-5598:
---------------------------------------

Our problem with JNI mostly happens when you have a large byte[] that you are using for your input. However, it depends a lot on the fragmentation of the heap and thus is not easy to benchmark; it came up in the context of the terabyte sort. The problem with JNI is that to give the C code access to a byte[], the runtime may need to copy the array in and out. If the array is 100 MB, that copy takes a lot of time.

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, TestCrc32Performance.java, TestCrc32Performance.java, TestCrc32Performance.java, TestPureJavaCrc32.java
>
> We've seen a reducer writing 200 MB to HDFS with replication = 1 spending a long time in CRC calculation. In particular, it was spending 5 seconds in CRC calculation out of a total of 6 seconds for the write. I suspect that it is the Java/JNI border that is causing us grief.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
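For reference, below is a minimal sketch of the table-driven, pure-Java approach this issue proposes: a CRC-32 over the standard reflected polynomial 0xEDB88320 (the same checksum java.util.zip.CRC32 computes), implemented entirely in Java so the byte[] never has to cross the JNI boundary and nothing needs to be copied into C code. The class name SimplePureJavaCrc32 is illustrative only; this is not the attached PureJavaCrc32.java and says nothing about its performance.

```java
import java.util.zip.Checksum;

// Minimal sketch of a table-driven, pure-Java CRC-32 (reflected polynomial
// 0xEDB88320). Illustration only; not the attached PureJavaCrc32.java.
public class SimplePureJavaCrc32 implements Checksum {

  // Precomputed lookup table: CRC contribution of each possible byte value.
  private static final int[] TABLE = new int[256];
  static {
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : c >>> 1;
      }
      TABLE[n] = c;
    }
  }

  // Running CRC register (pre-inverted; finalized in getValue()).
  private int crc = 0xFFFFFFFF;

  @Override
  public void update(int b) {
    crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
  }

  @Override
  public void update(byte[] b, int off, int len) {
    // One table lookup per byte; runs entirely in Java, so no JNI copy.
    for (int i = off; i < off + len; i++) {
      crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
    }
  }

  @Override
  public long getValue() {
    // Final XOR, returned as an unsigned 32-bit value.
    return (~crc) & 0xFFFFFFFFL;
  }

  @Override
  public void reset() {
    crc = 0xFFFFFFFF;
  }
}
```

One way to sanity-check an implementation like this is to feed the same byte[] to it and to java.util.zip.CRC32 and compare the getValue() results.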