Timothy Potter created SOLR-7057:
------------------------------------

             Summary: SimplePostTool curbside appeal
                 Key: SOLR-7057
                 URL: https://issues.apache.org/jira/browse/SOLR-7057
             Project: Solr
          Issue Type: Improvement
          Components: SimplePostTool
            Reporter: Timothy Potter
            Priority: Minor


When trying to index some Freebase articles, such as:

http://maven.tamingtext.com/freebase-wex-2011-01-18-articles-first10k.tsv

using the SimplePostTool (bin/post), I ran into a few minor issues. Fixing 
them would help new users trying to get their content indexed.

First, I tried the naive approach:
{code}
$ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.tsv 
{code}

Didn't work ... here's the output:

{code}
SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
1 files indexed.
{code}

Ummm ... no, that 1 file was not indexed ;-) Instead the output should be 
something like:

{code}
SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
0 of 1 files indexed.
{code}

Besides the misleading output, shouldn't .tsv be a supported file type for 
auto mode? It's a common enough format ...

So I renamed the file to .csv instead and re-ran ... this time I get:

{code}
$ mv freebase-wex-2011-01-18-articles-first10k.tsv freebase-wex-2011-01-18-articles-first10k.csv
$ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.csv

ERROR - 2015-01-28 16:24:16.074; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: CSVLoader: input=null, line=1,expected 108 values but got 4
{code}
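
That error is consistent with tab-separated data being split on commas: the number of values then varies from line to line. A quick way to check is to count fields per line under each candidate delimiter. This is only a minimal sketch, assuming a POSIX shell with awk; the `count_fields` helper and the sample rows are made up for illustration and are not part of bin/post.

```shell
# Hypothetical helper: print the number of fields on each line of a file
# when the line is split on the given single-character delimiter.
count_fields() {
  # $1 = delimiter, $2 = file
  awk -F"$1" '{ print NF }' "$2"
}

# Tiny made-up sample: two tab-separated rows, one with commas in the text.
sample=$(mktemp)
printf 'id1\tTitle one\tSome text, with commas, inside\n' >  "$sample"
printf 'id2\tTitle two\tShort text\n'                     >> "$sample"

tab=$(printf '\t')

echo "fields per line when split on tab:"
count_fields "$tab" "$sample"

echo "fields per line when split on comma:"
count_fields "," "$sample"
```

If the tab counts are the same on every line and the comma counts are not, the file is almost certainly TSV and needs an explicit separator.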

Hmmm ... OK ... did a little Googling and discovered I needed to set the 
separator to %09, the URL-encoded tab character (again, the tool should just 
recognize TSV as a supported format):

{code}
bin/post -c freebase -params "separator=%09&escape=\\" ./freebase-wex-2011-01-18-articles-first10k.csv
{code}
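
Until auto mode recognizes .tsv, a small wrapper could pick the params from the file extension. The following is a rough sketch, not a definitive fix: `post_params_for` is a hypothetical helper, and it assumes that passing `-type text/csv` lets bin/post skip auto-mode detection so the file does not need renaming (check `bin/post -h` on your version). It only prints the command as a dry run.

```shell
# Hypothetical helper (not part of bin/post): choose CSV-loader params
# from the file extension, so .tsv files get a tab separator.
post_params_for() {
  case "$1" in
    *.tsv) printf '%s\n' 'separator=%09&escape=\' ;;
    *.csv) printf '%s\n' '' ;;   # rely on the CSV loader defaults
    *)     return 1 ;;           # unknown extension: caller decides
  esac
}

f=./freebase-wex-2011-01-18-articles-first10k.tsv

# Dry run: print the command instead of executing it.
# Assumption: -type text/csv bypasses auto-mode file type detection.
printf '%s\n' "bin/post -c freebase -type text/csv -params '$(post_params_for "$f")' $f"
```

The params are wrapped in single quotes in the printed command so the trailing backslash reaches bin/post unchanged.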

Success! (of course I had to add a header line to the file too, but there's 
little we can do about that)
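
Adding the header line can at least be scripted. A minimal sketch: the column names `id`, `title`, and `body` and the sample rows are placeholders chosen for illustration, not the real Freebase WEX columns.

```shell
# Made-up stand-in for the real TSV file, which has no header row.
data=$(mktemp)
printf 'id1\tTitle one\tText one\n' >  "$data"
printf 'id2\tTitle two\tText two\n' >> "$data"

# Write the header first, then append the original rows unchanged.
# The column names are placeholders, not the real WEX schema.
with_header=$(mktemp)
{ printf 'id\ttitle\tbody\n'; cat "$data"; } > "$with_header"

# Show the first line to confirm the header is in place.
head -n 1 "$with_header"
```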



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
