Can't retrieve Tika parser for mime-type text/csv

2012-06-18 Thread Olivier LEVILLAIN
Hi, I get the following error message when I try to parse a CSV file: Can't retrieve Tika parser for mime-type text/csv... I use Nutch 1.4 and Solr 3.6... The parsechecker gives the same message: bin/nutch parsechecker http://dsiwikis/documents/forms/open_source_decls.csv fetching:

Nutch as mirroring tool

2012-06-18 Thread Vlad Paunescu
Hello, I want to use Nutch for website mirroring, importing content starting from a remote URL. I have already managed to create a program that fetches, then merges segments, and reads the content of the segments. What I want to do next is: - create a local directory structure which resembles the remote

Nutch 1.5 Deploy Mode Doesn't Work like Nutch 1.4 Deploy Mode

2012-06-18 Thread sidbatra
Hi, I'm excited to upgrade to Nutch 1.5, but something seems fundamentally different about the binaries generated in runtime/deploy. With Nutch 1.4, after downloading the source and running ant, the binary and job file in the runtime/deploy folder were ready to work on Hadoop and worked seamlessly. With

nutch-2.0 updatedb and parse commands

2012-06-18 Thread alxsss
Hello, it seems to me that all the options the updatedb command has in Nutch 1.4 have been removed in nutch-2.0. I would like to know whether this was done on purpose or whether they will be added later. Also, how can I create multiple docs using the parse command? It seems there are not sufficient arguments to

Re: Nutch 1.5 Deploy Mode Doesn't Work like Nutch 1.4 Deploy Mode

2012-06-18 Thread sidbatra
This turns out to be a genuine bug with an easy fix. build.xml is configured to generate a job file titled apache-nutch-1.5.job, but the deploy binary is still looking for nutch-1.5.job. Renaming apache-nutch-1.5.job to nutch-1.5.job fixes this bug in deploy mode.
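
The rename described above could be scripted roughly as follows. This is only a sketch: the two file names come from the message, while the helper name `rename_job_file` and the assumption that the job file sits directly in the deploy directory are illustrative and not taken from Nutch itself.

```python
from pathlib import Path


def rename_job_file(deploy_dir,
                    built_name="apache-nutch-1.5.job",
                    expected_name="nutch-1.5.job"):
    """Rename the built job file to the name the deploy script looks for.

    Hypothetical helper: assumes the job file lives directly inside
    deploy_dir. Returns the new path, or None if the built file is absent.
    """
    deploy = Path(deploy_dir)
    built = deploy / built_name
    if not built.is_file():
        # Nothing to rename (already renamed, or the build has not run yet).
        return None
    target = deploy / expected_name
    built.rename(target)
    return target
```

Calling `rename_job_file("runtime/deploy")` from the Nutch source root after running ant would apply the workaround, assuming that directory layout holds for your build.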