JIRA: https://issues.apache.org/jira/projects/TIKA/issues

Sorry, example file_s_  :D

Thank you!
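To see which inserted objects Tika actually reaches in each example file, one option is Tika's RecursiveParserWrapper. It returns one Metadata entry per parsed document: the container file first, then each embedded object. The following is a minimal sketch. It assumes the Tika 2.x API (tika-core plus the standard parser package) is on the classpath, and the 1.x API may differ in details. The class name EmbeddedListingSketch and its method are hypothetical. If an inserted object is missing from the output, it was never reached. If it appears with a text length of 0, it was reached but yielded no text.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;

// Hypothetical helper class: lists every document Tika parsed inside a file.
public class EmbeddedListingSketch {

    /** Returns one "path<TAB>Content-Type<TAB>text length" row per parsed document. */
    static List<String> listEmbedded(InputStream input) throws Exception {
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(new AutoDetectParser());
        // Collect plain text for every document; -1 disables the write limit.
        RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(
                new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        try (InputStream stream = TikaInputStream.get(input)) {
            wrapper.parse(stream, handler, new Metadata(), new ParseContext());
        }
        List<String> rows = new ArrayList<>();
        // The container file comes first; embedded objects follow.
        for (Metadata m : handler.getMetadataList()) {
            String path = m.get("X-TIKA:embedded_resource_path"); // null for the container
            String text = m.get("X-TIKA:content");
            rows.add((path == null ? "(container)" : path) + "\t"
                    + m.get("Content-Type") + "\t"
                    + (text == null ? 0 : text.length()));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            listEmbedded(in).forEach(System.out::println);
        }
    }
}
```

Running it once per example file (for instance `java EmbeddedListingSketch example.docx` with the Tika jars on the classpath) would show, for each cell marked N, whether the object was skipped entirely or parsed without producing text.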

On Wed, Aug 4, 2021 at 7:32 AM Tim Allison <[email protected]> wrote:

> Can you open an issue on our JIRA and attach an example file?
>
> Thank you!
>
> On Wed, Aug 4, 2021 at 7:31 AM sanliang_fighting <
> [email protected]> wrote:
>
>> Hello,
>>
>> Thank you very much for providing such a powerful and practical product.
>> While using it, we found a problem in the current version. We hope your
>> team can help us solve it.
>>
>> 1: If we are using Tika incorrectly, please show us the correct way:
>>
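>> File file = new File("XX");
>> Parser parser = new OfficeParser();
>> ParseContext context = new ParseContext();
>> Metadata metadata = new Metadata();
>> // Collects the extracted body text; -1 disables the default write limit
>> BodyContentHandler handler = new BodyContentHandler(-1);
>>
>> metadata.set(HttpHeaders.CONTENT_ENCODING, "GB18030");
>> metadata.set(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
>> // Open the file and close the stream once parsing is done
>> try (InputStream inputStream = new FileInputStream(file)) {
>>     parser.parse(inputStream, handler, metadata, context);
>> }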
>>
>> 2: If this is indeed a gap in the current version, please help us
>> address it in a future release.
>>
>> 3: We are using Tika 1.20. We have also tried upgrading to the latest
>> version, 2.0, and the problem still exists.
>>
>> In short, when objects created in Office 2013 or later are inserted into
>> Office 2007 documents, Tika skips the inserted objects in some cases during
>> extraction, so their contents are not extracted. The table below shows,
>> for each combination of host document and inserted object, whether the
>> content was extracted correctly:
>>
>> (Y = content extracted, N = content not extracted)
>>
>>                                File (host document)
>> Attachment (inserted object)   doc  docx  xls  xlsx  ppt  pptx
>> txt                            Y    Y     Y    Y     N    Y
>> pdf                            Y    Y     Y    Y     N    Y
>> xml                            Y    Y     Y    Y     N    Y
>> doc                            Y    Y     Y    Y     N    Y
>> docx                           N    Y     Y    Y     N    Y
>> xls                            Y    Y     Y    Y     N    Y
>> xlsx                           Y    Y     N    N     N    N
>> ppt                            Y    Y     Y    Y     N    Y
>> pptx                           Y    Y     Y    Y     N    Y
>>
>> *We look forward to receiving your reply. Thank you.*
>>
>> 2021-08-04
>> ------------------------------
>> sanliang_fighting
>>
>
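Regarding question 1 in the message above: when a specific parser such as OfficeParser is called directly, Tika's default embedded-document handling parses an inserted object only if the ParseContext supplies a Parser for it. Recent AutoDetectParser versions may register themselves automatically, but setting the Parser explicitly is harmless. OfficeParser on its own also covers only the OLE2 formats (doc, xls, ppt), not docx, xlsx or pptx. If the real code matches the snippet above, a configuration along the following lines may be worth trying. This is a minimal sketch, assuming the Tika 2.x API is on the classpath. It has not been tried against the example files, so whether it changes any N in the table is unconfirmed. The class and method names are hypothetical; "XX" is the placeholder file name from the original snippet.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class: extracts text, including that of embedded objects.
public class EmbeddedExtractionSketch {

    /** Builds a ParseContext whose embedded-document handling can recurse. */
    static ParseContext recursiveContext(Parser parser) {
        ParseContext context = new ParseContext();
        // The default embedded-document extractor looks up Parser.class here;
        // without it, a directly called parser skips inserted objects.
        context.set(Parser.class, parser);
        return context;
    }

    static String extractText(Path path) throws Exception {
        // AutoDetectParser dispatches to the matching parser for both OLE2
        // (doc/xls/ppt) and OOXML (docx/xlsx/pptx) containers and objects.
        Parser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        // -1 disables BodyContentHandler's default write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        // TikaInputStream.get(Path, Metadata) also records the file name in metadata.
        try (InputStream stream = TikaInputStream.get(path, metadata)) {
            parser.parse(stream, handler, metadata, recursiveContext(parser));
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // "XX" is the placeholder file name from the original message.
        String name = args.length > 0 ? args[0] : "XX";
        System.out.println(extractText(Paths.get(name)));
    }
}
```

AutoDetectParser is used here because the host files include both OLE2 and OOXML formats. OfficeParser alone would not handle docx, xlsx or pptx containers.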
