Hi Jun,

Which version of Nutch are you using, and which parser: parse-html or parse-tika?

Julien

On 8 February 2011 08:16, Jun Yang wrote:
> Hi there,
>
> I am working on a plugin to extract some structured information (e.g.,
> product price) from web pages, and I had a problem parsing the following
> simple node:
>
> <span class="product-price-amount">
>
> $27.00</span>
>
> The parser first got the Node for "span", which has only one child node, a
> text Node. I would assume this text Node has the value "$27.00", but when I
> called getNodeValue() the return value was empty. I cast this child node to
> Text and called getWholeText(), but still got an empty return value.
>
> Does anyone know what's going on? The text "$27.00" seems to be missing
> from the whole hierarchy.
>
> Jun

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
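For comparison, here is a minimal sketch of what a standards-conforming DOM should contain for that span. It assumes the snippet is parsed with the JDK's own XML DOM parser (javax.xml.parsers), not with Nutch's parse-html or parse-tika, so it only shows the expected node structure; the class and the helpers parseRoot and collectText are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SpanTextCheck {

    // Hypothetical helper: parses a well-formed snippet with the JDK's XML DOM
    // parser (NOT Nutch's parse-html or parse-tika) and returns the root element.
    static Element parseRoot(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse snippet", e);
        }
    }

    // Hypothetical helper: concatenates the values of all text and CDATA
    // descendants, visiting each child explicitly so every node can be inspected.
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                sb.append(collectText(child));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element span = parseRoot("<span class=\"product-price-amount\">\n\n$27.00</span>");
        Node text = span.getFirstChild();
        // The raw value starts with the two newlines from the markup, so trim it.
        System.out.println("type:  " + text.getNodeType()); // Node.TEXT_NODE is 3
        System.out.println("value: [" + text.getNodeValue().trim() + "]");
        System.out.println("all:   [" + collectText(span).trim() + "]");
    }
}
```

If the same calls return empty strings inside the plugin, that would suggest the text is being lost by whichever parser builds the DOM there, rather than by getNodeValue() itself, which is why the Nutch version and parser matter.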

