I think you are doing well so far. Nutch crawls the URLs you give it and fetches the content of all the files in the specified directory (HTML, PDF, etc.), storing that content in binary form. To get or save the files in their actual format, in your case .flv or .epub files, you will have to write an additional program (for example, in Java).
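Here is a minimal sketch of the "save it in its actual format" step. It assumes you already have the raw bytes Nutch fetched for a URL. How you read those bytes out of Nutch (for example, from the content records stored in each segment) is not shown and is an assumption. The class and the `saveRaw` helper are hypothetical names for illustration. The helper names the output file after the last path segment of the URL so the original extension (.flv, .epub) is kept.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveFetchedFile {

    // Hypothetical helper: writes the raw bytes fetched for `url` into `outDir`,
    // using the last path segment of the URL as the file name so the original
    // extension is preserved. Returns the path of the written file.
    static Path saveRaw(String url, byte[] content, Path outDir) throws IOException {
        String path = URI.create(url).getPath();
        if (path == null) {
            path = ""; // opaque URIs (e.g. mailto:) have no path component
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            name = "unnamed"; // assumed fallback when the URL has no usable file name
        }
        Files.createDirectories(outDir);
        Path target = outDir.resolve(name);
        Files.write(target, content); // bytes are written unchanged
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in practice these would be the content Nutch fetched.
        byte[] sample = "not really a video".getBytes(StandardCharsets.UTF_8);
        Path saved = saveRaw("file:///data/videos/demo.flv", sample, Paths.get("raw-files"));
        System.out.println("Saved " + saved);
    }
}
```

Writing the bytes untouched is the point here: the file on disk is identical to what was fetched, so any FLV or EPUB tool can open it afterwards.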
Hope this helps.

With regards,
Pankaj Kumar

On Mon, May 13, 2013 at 6:35 AM, vicky4751 <[email protected]> wrote:
> Hi,
>
> I am working with Apache Nutch and Solr. My requirement is to parse the
> contents of FLV and EPUB files. I am using the command below to parse the files:
>
> bin/nutch crawl urls -solr http://localhost:8983/solr/
>
> I have kept the file URLs in the urls folder of Nutch. The above command is
> working, but when I try to view the parsed content in Solr with the
> following command, it just displays the URLs of the files instead of
> their contents.
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
>
> Please advise.
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-parse-flv-and-epub-file-contents-using-nutch-tp4062927.html
> Sent from the Nutch - Dev mailing list archive at Nabble.com.

