Improve Scalability of the XMLLoader for large datasets such as Wikipedia
-------------------------------------------------------------------------
Key: PIG-1842
URL: https://issues.apache.org/jira/browse/PIG-1842
Project: Pig
Issue Type: Improvement
Reporter: Viraj Bhat
Assignee: Vivek Padmanabhan
The current XMLLoader for Pig does not scale to large datasets such as the
Wikipedia dataset. Each mapper reads in the entire XML file instead of only its
own input split, resulting in extremely slow run times.
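One common remedy for record formats that are not line-based is a split-aware record reader. Each mapper scans only its own byte range [start, end) for the start tag, and a record belongs to the split that contains the first byte of its start tag. The reader may run past `end` to finish the last record, but it never reads records that belong to other splits. The sketch below demonstrates this rule on an in-memory byte array. It assumes several things: `SplitXmlReader` and `readRecords` are hypothetical names, not Pig's API; start tags carry no attributes (as with `<page>` in the Wikipedia dump); and records do not nest. A real loader would instead seek an HDFS input stream to the split's start offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of split-aware XML record extraction (not Pig's API).
 * A record belongs to the split whose byte range [start, end) contains the
 * first byte of its start tag; the reader may read past `end` to finish it.
 * Assumes attribute-free, non-nested <tag>...</tag> records.
 */
class SplitXmlReader {

    /** Returns the first index >= from at which `pattern` occurs in `data`, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** Extracts every <tag>...</tag> record whose start tag begins in [start, end). */
    static List<String> readRecords(byte[] data, int start, int end, String tag) {
        byte[] open = ("<" + tag + ">").getBytes(StandardCharsets.UTF_8);
        byte[] close = ("</" + tag + ">").getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = start;
        while (true) {
            int s = indexOf(data, open, pos);
            // No further start tag, or it begins in a later split: stop here.
            if (s < 0 || s >= end) break;
            int e = indexOf(data, close, s + open.length);
            // Truncated record with no end tag: stop rather than emit garbage.
            if (e < 0) break;
            int recordEnd = e + close.length;
            records.add(new String(data, s, recordEnd - s, StandardCharsets.UTF_8));
            pos = recordEnd; // resume scanning after this record
        }
        return records;
    }
}
```

Because ownership is decided by where the start tag begins, any cut point yields every record exactly once across adjacent splits. A split that starts in the middle of a record skips ahead to the next start tag. The split that owns that record reads past its own end to complete it.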
Viraj
--
This message is automatically generated by JIRA.