[ https://issues.apache.org/jira/browse/CONNECTORS-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556206#comment-13556206 ]
Shinichiro Abe commented on CONNECTORS-613:
-------------------------------------------
.pdf -> application/pdf
.doc -> application/msword
.docx -> application/vnd.openxmlformats-officedocument.wordprocessingml.document
.ppt -> application/vnd.ms-powerpoint
.pptx -> application/vnd.openxmlformats-officedocument.presentationml.presentation
.xls -> application/vnd.ms-excel
.xlsx -> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
I referred to tika-mimetypes.xml:
http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml?view=markup
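As a rough illustration of the mapping above, here is a minimal Python sketch of an extension-based lookup. It is not ManifoldCF or Tika code: the names EXTENSION_TO_MIME, DEFAULT_MIME and guess_mime_type are hypothetical, and the application/octet-stream fallback for unlisted extensions is an assumption.

```python
import os

# Extension-to-MIME-type table, copied from the list in this comment.
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

# Assumed fallback for extensions that are not in the table (hypothetical choice).
DEFAULT_MIME = "application/octet-stream"


def guess_mime_type(filename):
    """Return a MIME type for filename based on its extension, ignoring case."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_MIME.get(ext.lower(), DEFAULT_MIME)
```

With this sketch, guess_mime_type("slides.PPTX") would return the presentationml type, and a name with an unlisted extension would fall back to DEFAULT_MIME.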
Thanks.
> The content of sjis file can't be extracted
> -------------------------------------------
>
> Key: CONNECTORS-613
> URL: https://issues.apache.org/jira/browse/CONNECTORS-613
> Project: ManifoldCF
> Issue Type: Bug
> Components: File system connector, Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.0.1, ManifoldCF 1.1
> Environment: Solr 4.x (not Solr 3.x)
> Reporter: Shinichiro Abe
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.1
>
> Attachments: files.zip
>
>
> When an sjis (Shift_JIS) text file is posted with curl, its content can be extracted.
> {noformat}
> curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F
> "[email protected]"
> {noformat}
> But when the same file is posted by the File system connector, its content can't be extracted;
> the result is empty.
> The content of a utf-8 text file, however, does seem to be extracted.