Hi all,

I'm working on a small project for university and I have some questions about how to implement it. Maybe you could give me some hints.

I have a directory that contains around 1 million HTML files. Basically, I just want to read each file entirely into a String and parse it with JSoup in a Mapper. Is there an existing InputFormat that can be used for this use case, or do I have to implement my own FileInputFormat? :/ More generally: do you think creating InputSplits for the directory will work properly with 1 million FileStatus objects?
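
To make the per-file step concrete, here is a minimal sketch in plain Java, without Hadoop or JSoup, of what each map call would need to do: read one file fully into a String and hand it to a parser. The names WholeFileSketch, readWhole and processDirectory are hypothetical, UTF-8 is an assumption, and the handler is only a placeholder for where the JSoup parsing would go. It does not answer the InputFormat question; it only illustrates the unit of work.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.BiConsumer;

class WholeFileSketch {

    // Reads one file entirely into a String.
    // UTF-8 is an assumption; real HTML may declare another charset.
    static String readWhole(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Visits every regular *.html file directly inside dir and passes
    // (file name, file content) to the handler. DirectoryStream iterates
    // lazily, so the full listing is never held in memory at once.
    static long processDirectory(Path dir, BiConsumer<String, String> handler)
            throws IOException {
        long count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.html")) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip directories that happen to match the glob
                }
                handler.accept(file.getFileName().toString(), readWhole(file));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: WholeFileSketch <directory>");
            return;
        }
        // Placeholder handler: a real one would parse the HTML here.
        long n = processDirectory(Paths.get(args[0]),
                (name, html) -> System.out.println(name + ": " + html.length() + " chars"));
        System.out.println(n + " files read");
    }
}
```

In a MapReduce job the same idea would mean one record per file, with the file name as key and the content as value.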


Regards,
Timo
