> From: Linh Tang
> Sent: November 3, 2014 2:30:46pm PST
> To: [email protected]
> Subject: Parse Html with Tika
>
> Dear All,
>
> I am Phuong Linh,
> I am using Tika to extract content from Html file to search. But HtmlParser
> cannot parse all tag of Html.

I'm not sure what you mean by "cannot parse all tag of Html". Do you have an example of an HTML page, and text that isn't being extracted?

-- Ken

> ( I get Html page by Nutch, then use Tika to
> extract the important information, after then use Solr to search.)
> Can you tell me what i can do to parse all tag of html.
>
> Thanks advance!
>
> Regards,
> Tang Thi Phuong Linh.
> --
> P.Linh

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
