Make the Reader for sampling TeraSort input multithreaded
---------------------------------------------------------
Key: HADOOP-4946
URL: https://issues.apache.org/jira/browse/HADOOP-4946
Project: Hadoop Core
Issue Type: Improvement
Components: examples
Affects Versions: 0.19.0
Reporter: Devaraj Das
Assignee: Devaraj Das
Fix For: 0.21.0
The TeraSort sampler reads from multiple splits to compute the partition
information. It can be made multithreaded, so that several threads read from
different splits concurrently. That should improve performance, and it would
also let us sample more records to arrive at better partition information.
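
As an illustration of the idea only, here is a minimal sketch in plain Java. It
is not the TeraSort code and does not use any Hadoop API: splits are modeled as
in-memory lists of string keys, and the names ParallelSamplerSketch, sampleKeys,
cutPoints, recordsPerSplit and threads are all hypothetical. Each split is read
by its own task on a fixed-size thread pool, the per-split samples are merged,
and evenly spaced cut points are taken from the sorted sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: splits are plain lists, not Hadoop InputSplits.
class ParallelSamplerSketch {

    // Read up to recordsPerSplit keys from each split, one task per split,
    // and merge the per-split samples in split order.
    public static List<String> sampleKeys(List<List<String>> splits,
                                          int recordsPerSplit, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (List<String> split : splits) {
                Callable<List<String>> task = () -> {
                    int n = Math.min(recordsPerSplit, split.size());
                    return new ArrayList<>(split.subList(0, n));
                };
                pending.add(pool.submit(task));
            }
            List<String> samples = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                samples.addAll(f.get()); // blocks until that split is read
            }
            return samples;
        } finally {
            pool.shutdown();
        }
    }

    // Sort the samples and pick (partitions - 1) evenly spaced cut points.
    public static List<String> cutPoints(List<String> samples, int partitions) {
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        if (sorted.isEmpty()) {
            return cuts; // nothing sampled, so no cut points
        }
        for (int i = 1; i < partitions; i++) {
            cuts.add(sorted.get(i * sorted.size() / partitions));
        }
        return cuts;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("d", "a"),
                Arrays.asList("c", "b"),
                Arrays.asList("f", "e"));
        List<String> samples = sampleKeys(splits, 2, 3);
        System.out.println(cutPoints(samples, 3)); // prints [c, e]
    }
}
```

Because each task returns its own list and the merge happens on the calling
thread, the sketch needs no shared mutable state between reader threads. With
reads running in parallel, recordsPerSplit could be raised to sample more
records, at the cost of a larger sort when computing the cut points.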