Timothy Potter created SOLR-7057:
------------------------------------
Summary: SimplePostTool curbside appeal
Key: SOLR-7057
URL: https://issues.apache.org/jira/browse/SOLR-7057
Project: Solr
Issue Type: Improvement
Components: SimplePostTool
Reporter: Timothy Potter
Priority: Minor
When trying to index some Freebase articles, such as:
http://maven.tamingtext.com/freebase-wex-2011-01-18-articles-first10k.tsv
using the SimplePostTool (bin/post), I ran into a few minor issues that, if
fixed, would help new users trying to get their content indexed.
First, I tried the naive approach:
{code}
$ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.tsv
{code}
Didn't work ... here's the output:
{code}
SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
1 files indexed.
{code}
Ummm ... no, that 1 file was not indexed ;-) Instead, the output should be
something like:
{code}
SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
0 of 1 files indexed.
{code}
Besides the misleading output, shouldn't tsv be a supported file type for
auto-mode? It's a common enough format ...
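A minimal sketch of the two suggestions above, written as hypothetical shell helpers rather than the actual SimplePostTool code (which is Java). The function names `guess_type` and `summarize` and the extension-to-type mapping are assumptions for illustration only: `.tsv` is treated like `.csv`, and the summary reports how many files were actually accepted out of the total.

```shell
# Hypothetical helper: guess a content type from the file name.
# Treats .tsv like .csv; this is NOT SimplePostTool's real mapping.
guess_type() {
  case "$1" in
    *.csv|*.tsv) echo "text/csv" ;;
    *.json)      echo "application/json" ;;
    *.xml)       echo "application/xml" ;;
    *)           return 1 ;;   # unsupported in this sketch
  esac
}

# Hypothetical helper: count accepted files separately from the total,
# so that skipped files are not reported as indexed.
summarize() {
  total=0
  indexed=0
  for f in "$@"; do
    total=$((total + 1))
    if guess_type "$f" >/dev/null; then
      indexed=$((indexed + 1))
    else
      echo "WARNING: Skipping $f. Unsupported file type for auto mode."
    fi
  done
  echo "$indexed of $total files indexed."
}
```

The point of keeping two counters is simply that the final line can no longer claim success for a file that was skipped.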
So I renamed the file to .csv instead and re-ran ... this time I got:
{code}
$ mv freebase-wex-2011-01-18-articles-first10k.tsv \
    freebase-wex-2011-01-18-articles-first10k.csv
$ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.csv
ERROR - 2015-01-28 16:24:16.074; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: CSVLoader: input=null, line=1,expected
108 values but got 4
{code}
Hmmm ... OK ... a little Googling revealed that I needed to set the separator
to %09, the URL-encoded tab character (again, the tool should just recognize
TSV as a supported format):
{code}
$ bin/post -c freebase -params "separator=%09&escape=\\" \
    ./freebase-wex-2011-01-18-articles-first10k.csv
{code}
Success! (Of course, I had to add a header line to the file too, but there's
little we can do about that.)
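For anyone repeating the manual workaround, here is a rough sketch of the two preparation steps: prepending a header line and deriving the %09 separator value. The helper names `add_header` and `encode_char` are made up for this example, and the column names in the usage comment are placeholders, not the real Freebase schema.

```shell
# Hypothetical helper: print a header line, then the original file.
# Usage: add_header "<header line>" input-file > output-file
add_header() {
  printf '%s\n' "$1"
  cat "$2"
}

# Hypothetical helper: percent-encode one ASCII character.
# The leading quote makes printf use the character's numeric code.
encode_char() {
  printf '%%%02X\n' "'$1"
}

# Example usage (column names below are placeholders):
#   header=$(printf 'col1\tcol2\tcol3')
#   add_header "$header" articles.csv > articles-with-header.csv
#   encode_char "$(printf '\t')"    # value to pass as separator=
```

Note that inside double quotes the shell turns `\\` into a single backslash, so the `escape` parameter in the command above reaches the tool as `escape=\`.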