[ 
https://issues.apache.org/jira/browse/CONNECTORS-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556145#comment-13556145
 ] 

Karl Wright commented on CONNECTORS-613:
----------------------------------------

r1434653 adds the infrastructure, and hooks it up into the Solr connector, Web 
connector, and RSS connector.

For the file connector and JCIFS connector, the MIME type will have to be 
determined by mapping the file extension to a MIME type.  One obvious mapping:

.txt -> text/plain

But I need more mappings, e.g.:

.doc -> ???
.docx -> ???
.ppt -> ???
.pptx -> ???
.xls -> ???
.xlsx -> ???

etc.

Abe-san, if you can put together a table, I can add this to the necessary 
connectors quickly.
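
For illustration only, here is a minimal sketch of what such an extension-to-MIME-type
lookup could look like. The class and method names (ExtensionMimeMap, mimeTypeFor) are
hypothetical and not part of ManifoldCF or of r1434653. The MIME types shown are the
IANA-registered ones for these Office formats, and a real table would need more entries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not ManifoldCF code): looks up a MIME type by file extension.
final class ExtensionMimeMap {
    private static final Map<String, String> MIME_BY_EXTENSION = new HashMap<String, String>();

    static {
        MIME_BY_EXTENSION.put("txt", "text/plain");
        MIME_BY_EXTENSION.put("doc", "application/msword");
        MIME_BY_EXTENSION.put("docx",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        MIME_BY_EXTENSION.put("ppt", "application/vnd.ms-powerpoint");
        MIME_BY_EXTENSION.put("pptx",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        MIME_BY_EXTENSION.put("xls", "application/vnd.ms-excel");
        MIME_BY_EXTENSION.put("xlsx",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    }

    private ExtensionMimeMap() {
    }

    // Returns the MIME type for the file name's extension, or null when the
    // name has no extension or the extension is not in the table.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME_BY_EXTENSION.get(extension);
    }
}
```

Returning null for an unknown extension leaves it to the caller to decide whether to
omit the MIME type or fall back to a default.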


                
> The content of sjis file can't be extracted
> -------------------------------------------
>
>                 Key: CONNECTORS-613
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-613
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: File system connector, Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.0.1, ManifoldCF 1.1
>         Environment: Solr 4.x (not Solr 3.x)
>            Reporter: Shinichiro Abe
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.1
>
>         Attachments: files.zip
>
>
> When a Shift_JIS (sjis) text file is posted with curl, the content can be extracted.
> {noformat}
> curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F 
> "[email protected]"
> {noformat} 
> But when the same file is posted through the File system connector, the 
> content cannot be extracted; the result is empty.
> The content of a UTF-8 text file, by contrast, does seem to be extracted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
