[ 
https://issues.apache.org/jira/browse/CONNECTORS-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556206#comment-13556206
 ] 

Shinichiro Abe commented on CONNECTORS-613:
-------------------------------------------

.pdf -> application/pdf
.doc -> application/msword
.docx -> application/vnd.openxmlformats-officedocument.wordprocessingml.document
.ppt -> application/vnd.ms-powerpoint
.pptx -> application/vnd.openxmlformats-officedocument.presentationml.presentation
.xls -> application/vnd.ms-excel
.xlsx -> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
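
The mapping above could be expressed as a simple lookup table. This is only a minimal Python sketch for illustration, not the connector's actual code; the helper name `mime_type_for` and the `application/octet-stream` fallback for unlisted extensions are assumptions that do not come from this issue.

```python
import os

# Extension -> MIME type, copied from the list above.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

# Assumption: extensions not listed above fall back to a generic binary type.
DEFAULT_MIME_TYPE = "application/octet-stream"


def mime_type_for(filename):
    """Return the MIME type for a file name (hypothetical helper).

    The extension is matched case-insensitively, so "A.PDF" and "a.pdf"
    resolve to the same type.
    """
    _, ext = os.path.splitext(filename)
    return MIME_TYPES.get(ext.lower(), DEFAULT_MIME_TYPE)
```

For example, `mime_type_for("slides.pptx")` looks up the `.pptx` entry, while a file such as `notes.txt` falls through to the assumed default.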

I referred to tika-mimetypes.xml:
http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml?view=markup

Thanks.
                
> The content of sjis file can't be extracted
> -------------------------------------------
>
>                 Key: CONNECTORS-613
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-613
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: File system connector, Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.0.1, ManifoldCF 1.1
>         Environment: Solr 4.x (not Solr 3.x)
>            Reporter: Shinichiro Abe
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.1
>
>         Attachments: files.zip
>
>
> When a sjis text file is posted with curl, the content can be extracted.
> {noformat}
> curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "[email protected]"
> {noformat} 
> But when the same file is posted by the File system connector, the content
> can't be extracted; the result is empty.
> The content of a utf-8 text file does seem to be extracted.
