[ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721804#action_12721804 ]

Owen O'Malley commented on HADOOP-5598:
---------------------------------------

I should have commented earlier on this. I think the right solution is to use a 
pure Java implementation if we can get the performance comparable in the "normal" 
case. If we use a C implementation in libhadoop, it should use DirectByteBuffers 
and pool those buffers. Furthermore, that should be a separate JIRA, since there 
are a lot more issues there.
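
For reference, a minimal table-driven CRC-32 in pure Java looks roughly like the 
sketch below. This is illustrative only (it is not the attached PureJavaCrc32.java); 
it implements java.util.zip.Checksum with the same reflected polynomial that 
java.util.zip.CRC32 uses:

    import java.util.zip.Checksum;

    // Illustrative table-driven CRC-32; not the attachment from this issue.
    public class SimpleJavaCrc32 implements Checksum {
      private static final int[] TABLE = new int[256];
      static {
        for (int i = 0; i < 256; i++) {
          int c = i;
          for (int k = 0; k < 8; k++) {
            c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
          }
          TABLE[i] = c;
        }
      }

      private int crc = 0xFFFFFFFF;  // working value kept inverted, per convention

      public void update(int b) {
        crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
      }

      public void update(byte[] b, int off, int len) {
        for (int i = off; i < off + len; i++) {
          crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
        }
      }

      public long getValue() {
        return (~crc) & 0xFFFFFFFFL;  // final XOR, widened to an unsigned long
      }

      public void reset() {
        crc = 0xFFFFFFFF;
      }
    }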

I'd also veto any code that dynamically switches implementations based on 
anything other than whether libhadoop is present (i.e. switching based on the 
size of the input is going to be unmaintainable).
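
For illustration, a sketch of that selection policy: decide once, based only on 
whether the native library loaded, with java.util.zip.CRC32 standing in for 
whatever native-backed implementation libhadoop would provide (the class and 
library names here are assumptions, not code from this issue):

    import java.util.zip.CRC32;
    import java.util.zip.Checksum;

    // Picks the checksum implementation exactly once, at class-load time,
    // based only on native-library availability -- never on input size.
    public class Crc32Factory {
      private static final boolean NATIVE_AVAILABLE = tryLoadNative();

      private static boolean tryLoadNative() {
        try {
          System.loadLibrary("hadoop");  // libhadoop, if it is on java.library.path
          return true;
        } catch (UnsatisfiedLinkError e) {
          return false;
        }
      }

      public static Checksum newCrc32() {
        // CRC32 stands in for a native-backed implementation;
        // SimpleJavaCrc32 is the pure-Java fallback sketched above.
        return NATIVE_AVAILABLE ? new CRC32() : new SimpleJavaCrc32();
      }
    }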

I can upload the code that I wrote for the pure Java version, if you want to see 
a third implementation.

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, 
> hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, PureJavaCrc32.java, 
> PureJavaCrc32.java, PureJavaCrc32.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a 
> long time in CRC calculation. In particular, it spent 5 seconds in CRC 
> calculation out of a total of 6 for the write. I suspect that the Java-JNI 
> boundary is causing us grief.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.