Improve Scalability of the XMLLoader for large datasets such as wikipedia
-------------------------------------------------------------------------

                 Key: PIG-1842
                 URL: https://issues.apache.org/jira/browse/PIG-1842
             Project: Pig
          Issue Type: Improvement
            Reporter: Viraj Bhat
            Assignee: Vivek Padmanabhan


The current XMLLoader for Pig does not work well for large datasets such as 
the Wikipedia dataset. Each mapper reads in the entire XML file, resulting in 
extremely slow run times.
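
To illustrate the behaviour one would hope for instead, here is a minimal
sketch of split-aware record extraction. It is not the XMLLoader code and not
a proposed patch. The function name records_in_split, the <page> tag, and the
use of an in-memory byte string in place of a file stream are all assumptions
made for illustration. The idea is that each mapper emits only the records
whose opening tag begins inside its own byte range, reading past the end of
the range if needed to finish the last record.

```python
from typing import Iterator


def records_in_split(data: bytes, start: int, end: int,
                     open_tag: bytes = b"<page>",
                     close_tag: bytes = b"</page>") -> Iterator[bytes]:
    """Yield each record whose opening tag begins in [start, end).

    Hypothetical helper: `data` stands in for the whole input file, and
    `start`/`end` for the byte range assigned to one mapper.
    """
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        # Stop when no opening tag remains or it belongs to a later split.
        if begin == -1 or begin >= end:
            return
        finish = data.find(close_tag, begin + len(open_tag))
        if finish == -1:
            return  # truncated record: nothing complete to emit
        finish += len(close_tag)
        # The record may extend past `end`; the split that owns the
        # opening tag is the one that reads it to completion.
        yield data[begin:finish]
        pos = finish


if __name__ == "__main__":
    doc = b"<doc><page>a</page><page>bb</page><page>ccc</page></doc>"
    mid = len(doc) // 2
    print(list(records_in_split(doc, 0, mid)))
    print(list(records_in_split(doc, mid, len(doc))))
```

With this rule every record is produced by exactly one split, wherever the
split boundary falls. The sketch assumes the record tag carries no attributes
and is never nested inside itself.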

Viraj


        
