[ 
https://issues.apache.org/jira/browse/SOLR-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397328#comment-15397328
 ] 

Jan Høydahl commented on SOLR-9347:
-----------------------------------

You are passing the tool a directory as its argument, so by default it scans 
that directory and all subdirectories for files matching the filetypes pattern.

I assume you want to force the tool to treat every file it finds as TSV, even 
if the file name has no suffix. The problem is that some users will always run 
such a command on a folder containing lots of other files, causing unexpected 
behavior. The tool does not try to guess file types from file content, so file 
endings are the only thing it can go by.

For now I think your best bet is to call the tool once per file, and use 
some bash scripting to select which files you need.
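
A minimal sketch of that workaround, assuming bin/post applies -type to a file 
that is passed to it directly (the options are the ones from your command, minus 
-filetypes). The names POST_CMD, COLLECTION and post_suffixless_files are made up 
for illustration and are not part of Solr:

```sh
# Sketch only: post every file without a suffix as TSV, one bin/post call per file.
# POST_CMD and COLLECTION are illustrative variables, not Solr settings.
POST_CMD=${POST_CMD:-./bin/post}
COLLECTION=${COLLECTION:-mycol1}

post_suffixless_files() {
  # -type f          : regular files only (find descends into subdirectories)
  # ! -name '*.*'    : keep only file names that contain no dot
  # -exec ... {} \;  : run the post command once for each matching file
  find "$1" -type f ! -name '*.*' \
    -exec "$POST_CMD" -c "$COLLECTION" -params "separator=%09" -type text/tsv {} \;
}
```

You would then run something like 
post_suffixless_files /dev/datascience/pod1/population/baseline/ from the Solr 
install directory. One call per file is slow for large folders, but it keeps the 
file selection entirely under your control.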

I guess what could be done is a new option to tell the tool what type it should 
assume for files without a suffix, e.g. {{-nosuffix=tsv}}. The tool would then 
include files without a suffix in the file filter, and map them to that default 
type. Would that cover your use case?
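
To make the idea concrete, here is a rough shell sketch of the mapping such an 
option might perform. This is not SimplePostTool code; the function name 
resolve_file_type is hypothetical, and the option itself does not exist yet:

```sh
# Hypothetical illustration of the proposed -nosuffix behavior.
# Prints the suffix of a file name, or the given default when there is none.
resolve_file_type() {
  name=${1##*/}                          # strip any leading directories
  case "$name" in
    *.*) printf '%s\n' "${name##*.}" ;;  # has a dot: use the text after the last dot
    *)   printf '%s\n' "$2" ;;           # no dot: fall back to the default type
  esac
}
```

With a default of tsv, a name such as part-00000 would resolve to tsv, while 
report.csv would still resolve to csv.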

> Solr post tool - ignore file name extension when -type is provided
> ------------------------------------------------------------------
>
>                 Key: SOLR-9347
>                 URL: https://issues.apache.org/jira/browse/SOLR-9347
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.1
>            Reporter: nirav patel
>
> I found that the post tool does not load files from a directory if the files 
> have no extension, even if you specify "-params "separator=%09" -type text/tsv 
> -filetypes tsv" in the arguments. I think that if any of the above parameters 
> is used, there is no need to enter auto mode. 
> Also, there is no -verbose or -debug option that would indicate a potential 
> problem.
> ./bin/post -c mycol1  -params "separator=%09" -type text/tsv -filetypes tsv  
> /dev/datascience/pod1/population/baseline/
> /usr/java/jdk1.8.0_102//bin/java -classpath 
> /home/xactly/solr-6.1.0/dist/solr-core-6.1.0.jar -Dauto=yes -Dc=bonusOrder 
> -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool 
> /mapr/insights/datascience/rulec/prdx/bonusOrderType/baseline/
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/mycol1/update...
> Entering auto mode. File endings considered are 
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory /dev/datascience/pod1/population/baseline/ (0 files, 
> depth=0)
> 0 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/mycol1/update...
> Time spent: 0:00:00.056



