Hi,
I get the following error message when I try to parse a CSV file: Can't
retrieve Tika parser for mime-type text/csv...
I use Nutch 1.4 and Solr 3.6...
The parsechecker gives the same message:
bin/nutch parsechecker http://dsiwikis/documents/forms/open_source_decls.csv
fetching:
Hello,
I want to use Nutch for website mirroring, importing a site starting from a
remote URL.
I have already managed to create a program that fetches, then merges segments
and reads the content of the segments.
What I want to do next is:
- create a local directory structure which resembles the remote
Hi,
I'm excited to upgrade to Nutch 1.5, but something seems fundamentally
different about the binaries generated in runtime/deploy.
With Nutch 1.4, after downloading the source and running ant, the binary and
job file in the runtime/deploy folder were ready to work on Hadoop and
worked seamlessly.
With
Hello,
It seems to me that all the options the updatedb command has in Nutch 1.4
have been removed in Nutch 2.0. I would like to know whether this was done on
purpose or whether they will be added later. Also, how can I create multiple
docs using the parse command? It seems there are not sufficient arguments to
This turns out to be a genuine bug with an easy fix.
build.xml is configured to generate a job file named apache-nutch-1.5.job,
but the deploy binary still looks for nutch-1.5.job.
Renaming apache-nutch-1.5.job to nutch-1.5.job fixes this bug in deploy
mode.
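
For anyone who wants to script the rename, here is a minimal Python sketch. It assumes the job file sits in runtime/deploy as described earlier in the thread; the function name rename_job_file and its parameters are made up for illustration and are not part of Nutch.

```python
from pathlib import Path


def rename_job_file(deploy_dir,
                    built_name="apache-nutch-1.5.job",
                    expected_name="nutch-1.5.job"):
    """Rename the job file produced by the build to the name the deploy
    script looks for. Hypothetical helper, not part of Nutch itself."""
    deploy = Path(deploy_dir)
    built = deploy / built_name
    expected = deploy / expected_name

    # Nothing to do if the expected name is already present.
    if expected.exists():
        return expected

    # Fail loudly if the build output is not where we assumed it would be.
    if not built.exists():
        raise FileNotFoundError(f"{built} not found; has the build been run?")

    built.rename(expected)
    return expected


if __name__ == "__main__":
    # Assumed location of the deploy output, relative to the source root.
    print(rename_job_file("runtime/deploy"))
```

Run it from the Nutch source root after the build has finished. Calling it a second time is harmless, since it returns early when nutch-1.5.job already exists.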