JIRA: https://issues.apache.org/jira/projects/TIKA/issues
Sorry, example file_s_ :D

Thank you!

On Wed, Aug 4, 2021 at 7:32 AM Tim Allison <[email protected]> wrote:

> Can you open an issue on our JIRA and attach an example file?
>
> Thank you!
>
> On Wed, Aug 4, 2021 at 7:31 AM sanliang_fighting <[email protected]> wrote:
>
>> Hello:
>>
>> Thank you very much for providing such a powerful and practical product.
>> While using it, we found a problem in the current version. We hope your
>> team can help us solve it.
>>
>> 1: If our usage is wrong, please show us the correct way:
>>
>> File file = new File("XX");
>> Parser parser = new OfficeParser();
>> ParseContext context = new ParseContext();
>> Metadata metadata = new Metadata();
>>
>> metadata.set(HttpHeaders.CONTENT_ENCODING, "GB18030");
>> metadata.set(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
>> parser.parse(inputStream, handler, metadata, context);
>>
>> 2: If this really is an omission in the current version, please help us
>> by fixing it in a subsequent version.
>>
>> 3: We use Tika version 1.20. We have also tried the latest version, 2.0,
>> and the problem still exists.
>>
>> In general, the problem is that when objects of version 2013 and above are
>> inserted into Office 2007 series documents, Tika automatically ignores the
>> inserted objects when it extracts content. As a result, the contents of
>> the objects cannot be extracted.
>> We compiled the following table showing whether the content can be
>> extracted normally:
>>
>> File / Attachment (inserted object) | doc | docx | xls | xlsx | ppt | pptx
>> ------------------------------------+-----+------+-----+------+-----+-----
>> txt                                 | Y   | Y    | Y   | Y    | N   | Y
>> pdf                                 | Y   | Y    | Y   | Y    | N   | Y
>> xml                                 | Y   | Y    | Y   | Y    | N   | Y
>> doc                                 | Y   | Y    | Y   | Y    | N   | Y
>> docx                                | N   | Y    | Y   | Y    | N   | Y
>> xls                                 | Y   | Y    | Y   | Y    | N   | Y
>> xlsx                                | Y   | Y    | N   | N    | N   | N
>> ppt                                 | Y   | Y    | Y   | Y    | N   | Y
>> pptx                                | Y   | Y    | Y   | Y    | N   | Y
>>
>> *We look forward to receiving your reply. Thank you.*
>>
>> 2021-08-04
>> ------------------------------
>> sanliang_fighting
>>
>
