[ https://issues.apache.org/jira/browse/TIKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616777#comment-14616777 ]
Tim Allison commented on TIKA-1674:
-----------------------------------

Help! With r1689690, I added an example of how to pull out embedded documents. However, I can't figure out how to get the EmbeddedDocumentExtractor to work recursively beyond the children of the initial document.

I confirmed that tika-server has similar behavior:
{noformat}
    @Test
    public void testFullRecursion() throws Exception {
        Response response = WebClient.create(endPoint + UNPACKER_PATH)
                .accept("application/zip")
                .put(ClassLoader.getSystemResourceAsStream("test_recursive_embedded.docx"));

        Map<String, String> data = readZipArchive((InputStream) response.getEntity());
        assertEquals(2, data.size());
    }
{noformat}

Is this expected behavior, or am I not setting up the parser/embedded document extractor properly? Do I need to reset the stream at some point?

Thank you.

> Add example to show how to extract embedded files
> -------------------------------------------------
>
>                 Key: TIKA-1674
>                 URL: https://issues.apache.org/jira/browse/TIKA-1674
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.10
>
>
> On tika-user, we received a question on how to extract embedded files. Let's add an example.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)