[ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721803#action_12721803 ]
Owen O'Malley commented on HADOOP-5598:
---------------------------------------

Our problem with JNI mostly happens when you have a large byte[] that you are using for your input. However, it depends a lot on the fragmentation of the heap and thus is not easy to benchmark; it came up in the context of the terabyte sort. The problem with JNI is that to give the C code access to a byte[], the runtime may need to copy the array in and out. If the array is 100 MB, that copy takes a lot of time.

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, TestCrc32Performance.java, TestCrc32Performance.java, TestCrc32Performance.java, TestPureJavaCrc32.java
>
> We've seen a reducer writing 200 MB to HDFS with replication = 1 spending a long time in CRC calculation. In particular, it was spending 5 seconds in CRC calculation out of a total of 6 seconds for the write. I suspect that it is the Java/JNI border that is causing us grief.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
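For reference, below is a minimal sketch of the table-driven, pure-Java approach this issue proposes: a CRC-32 over the standard reflected polynomial 0xEDB88320 (the same checksum java.util.zip.CRC32 computes), implemented entirely in Java so the byte[] never has to cross the JNI boundary and nothing needs to be copied into C code. The class name SimplePureJavaCrc32 is illustrative only; this is not the attached PureJavaCrc32.java and says nothing about its performance.

```java
import java.util.zip.Checksum;

// Minimal sketch of a table-driven, pure-Java CRC-32 (reflected polynomial
// 0xEDB88320). Illustration only; not the attached PureJavaCrc32.java.
public class SimplePureJavaCrc32 implements Checksum {

  // Precomputed lookup table: CRC contribution of each possible byte value.
  private static final int[] TABLE = new int[256];
  static {
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : c >>> 1;
      }
      TABLE[n] = c;
    }
  }

  // Running CRC register (pre-inverted; finalized in getValue()).
  private int crc = 0xFFFFFFFF;

  @Override
  public void update(int b) {
    crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
  }

  @Override
  public void update(byte[] b, int off, int len) {
    // One table lookup per byte; runs entirely in Java, so no JNI copy.
    for (int i = off; i < off + len; i++) {
      crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
    }
  }

  @Override
  public long getValue() {
    // Final XOR, returned as an unsigned 32-bit value.
    return (~crc) & 0xFFFFFFFFL;
  }

  @Override
  public void reset() {
    crc = 0xFFFFFFFF;
  }
}
```

One way to sanity-check an implementation like this is to feed the same byte[] to it and to java.util.zip.CRC32 and compare the getValue() results.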