Hi, I am going to feed the nutch-0.8-dev crawler with seeds in XML format. I have read Nutch's TextInputFormat/InputFormatBase. It seems that Nutch currently breaks plain text files into characters and parses those. My question is how to support an XmlInputFormat: as I see it, XML is not character-based but block-based.
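To show what I mean by block-based, here is a rough sketch in plain Java. It is not Nutch code: XmlBlockReader and the "url" element name are hypothetical, and it assumes the seed elements carry no attributes and are not nested. It reads whole start-tag to end-tag blocks from a stream instead of handing out text piece by piece.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not part of Nutch: pulls complete <element>...</element>
// blocks out of a character stream, one block per call.
// Assumes the element has no attributes and is not nested inside itself.
public class XmlBlockReader {
    private final Reader in;
    private final String startTag;
    private final String endTag;

    public XmlBlockReader(Reader in, String element) {
        this.in = in;
        this.startTag = "<" + element + ">";
        this.endTag = "</" + element + ">";
    }

    /** Returns the next complete block, tags included, or null when no complete block is left. */
    public String next() throws IOException {
        if (!readPast(startTag, null)) {
            return null; // no further start tag
        }
        StringBuilder block = new StringBuilder(startTag);
        if (!readPast(endTag, block)) {
            return null; // input ended in the middle of a block
        }
        return block.toString();
    }

    // Consumes characters until `tag` has just been read. Every consumed
    // character is copied into `sink` when it is non-null.
    // The simple reset on mismatch is enough here because '<' only ever
    // occurs as the first character of the tags built above.
    private boolean readPast(String tag, StringBuilder sink) throws IOException {
        int matched = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (sink != null) {
                sink.append((char) c);
            }
            if (c == tag.charAt(matched)) {
                matched++;
                if (matched == tag.length()) {
                    return true;
                }
            } else {
                matched = (c == tag.charAt(0)) ? 1 : 0;
            }
        }
        return false;
    }
}
```

I imagine something like this would have to sit behind whatever record reader an XmlInputFormat returns, with each block becoming one record, but I do not know how that would interact with the way input files are split.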
Thanks
/Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
