Hi,

Could you post more details from the logs? You can also run this command to check the parser first [0]:

bin/nutch plugin Parser org.apache.nutch.parse.ParserChecker www.epingsoft.com/epub/examples/AChristmasCarol.epub

[0] http://wiki.apache.org/nutch/bin/nutch%20plugin

On Tue, May 14, 2013 at 1:14 PM, mahodaya <[email protected]> wrote:

> Hi
>
> my requirement is to extract the contents of epub files using apache nutch
> and solr. In my nutch-site.xml file i have included "epub" format in
> plugin.includes property and in regex-urlfilter.txt accepted everything with
> this syntax ".+" and i have included parse-tika plugin in
> parse-plugins.xml.
>
> I am giving this url www.epingsoft.com/epub/examples/AChristmasCarol.epub in
> seed.txt of url directory.
>
> I am using following commands to get the contents
>
> bin/nutch crawl urls -solr http://localhost:8983/solr/
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> crawl/linkdb crawl/segments/*
>
> but when i try to view the result using solr it display only url of the
> file as follows
>
> www.epingsoft.com/epub/examples/AChristmasCarol.epub/AChristmasCarol
> AChristmasCarol AChristmasCarol
> www.epingsoft.com/epub/examples/AChristmasCarol.epub AChristmasCarol
> www.epingsoft.com/epub/examples/AChristmasCarol.epub
>
> please help me how can i get the actual contents of the epub file
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-parse-epub-files-using-plugin-parse-tika-tp4063137.html
> Sent from the Nutch - Dev mailing list archive at Nabble.com.

--
Don't Grow Old, Grow Up... :-)

