Hello, I have a large XML file (over 10 GB) with a simple format like:
<book> <title></title> <author></author> ... </book>
I parse the XML and convert it into another format, such as CSV. Currently the parsing runs on a single server and is slow (it takes a few hours). Is Hadoop a good solution for splitting the XML file and spreading the parsing across a cluster of several servers? Thanks for any comments.
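
For reference, a single-server streaming conversion of the kind described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `<book>` elements are assumed to sit inside one wrapper root element (called `<library>` in the usage note, a made-up name), only the `title` and `author` fields are exported, and `books_to_csv` is a hypothetical helper name, not part of any library.

```python
import csv
import io
import xml.etree.ElementTree as ET


def books_to_csv(xml_source, csv_out, fields=("title", "author")):
    """Stream <book> elements from xml_source and write one CSV row per book.

    xml_source: a file path or a binary file-like object.
    csv_out: a text file-like object opened for writing.
    Returns the number of data rows written.
    """
    writer = csv.writer(csv_out)
    writer.writerow(fields)  # header row
    count = 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "book":
            # Assumption: each field is a direct child of <book>; missing ones become "".
            writer.writerow([(elem.findtext(name) or "").strip() for name in fields])
            count += 1
            root.clear()  # drop finished <book> elements so memory stays flat
    return count
```

A call such as `books_to_csv("books.xml", open("books.csv", "w", newline=""))` (file names are placeholders) would process the file in one pass without loading it into memory. With a wrapper root like `<library>...</library>`, the loop itself is per-record work, which is the part that would be distributed if the file were split at `<book>` boundaries.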