Re: Unable to parse flv and epub file contents using nutch

Pankaj Kumar Mon, 13 May 2013 14:17:50 -0700

I think you are doing fine so far.
Nutch crawls the data and fetches all the files (HTML, PDF, etc.) at the
URLs you specify, storing their content in binary form under the crawl
directory.
To get or save the files in their original format (in your case, .flv or
.epub files), you will have to write an additional program, for example in
Java.
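Below is a minimal sketch of the Java side of such an additional program. It assumes you have already extracted each file's URL and raw bytes from the crawl segments. That extraction step depends on your Nutch version and is not shown. The class SaveRawContent and its method saveAs are hypothetical names for illustration. The program takes the file name from the last part of the URL path and writes the bytes unchanged, so an .flv or .epub file comes out in its original format.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: saves raw fetched bytes as a file in its original format.
// Assumes the URL and bytes were already read from the crawl segments elsewhere.
public class SaveRawContent {

    // Writes `content` into `outDir`, named after the last segment of the URL path
    // (e.g. "http://host/videos/movie.flv" -> "movie.flv").
    public static Path saveAs(String url, byte[] content, Path outDir) throws IOException {
        String path = URI.create(url).getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = "index.bin"; // fallback when the URL path is empty or ends with "/"
        }
        Files.createDirectories(outDir);
        Path target = outDir.resolve(name);
        Files.write(target, content); // bytes are written as-is, no parsing or conversion
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for the fetched content of an .flv file.
        byte[] fake = {'F', 'L', 'V', 1};
        Path out = saveAs("http://example.com/videos/movie.flv", fake, Paths.get("out"));
        System.out.println(out.getFileName());
    }
}
```

Running main prints movie.flv and leaves out/movie.flv on disk. Because the bytes are written untouched, the saved file can be opened by any normal FLV or EPUB reader.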


Hope this helps.

With Regards,
Pankaj Kumar



On Mon, May 13, 2013 at 6:35 AM, vicky4751 <[email protected]> wrote:

> Hi,
>
> I am working with Apache Nutch and Solr. My requirement is to parse the
> contents of FLV and EPUB files. I am using the command below to parse the files:
>
> bin/nutch crawl urls -solr http://localhost:8983/solr/
>
> I have kept the file URLs in the urls folder of Nutch. The above command
> works, but when I try to view the parsed content in Solr after running the
> following command, it just displays the URLs of the files instead of their
> contents.
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
>
> Please suggest a solution.
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-parse-flv-and-epub-file-contents-using-nutch-tp4062927.html
> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>
