[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281086#comment-14281086 ]
Luis Filipe Nassif commented on TIKA-1511:
------------------------------------------

If the inputStream (pseudoInputStream) received by the EmbeddedDocExtractor cannot be read, I do not think using the EDE is useful. How would this approach work with the TikaCli --extract option? My original idea was to support a use case like TikaCli --extract...

Now I think the extraction of tables to files can be done by handling the db as one big doc and using a ContentHandlerDecorator that splits the xhtml output at table boundaries. Each xhtml segment can be converted to a byte[] (if small) and then to a ByteArrayInputStream that can be passed to an EmbeddedDocDecorator, if one is set on the parseContext. If none is set, the ContentHandlerDecorator does not need to split tables and can fall back to the default behavior. A custom EDE can then extract tables to files if desired.

So now I think we could go with the big doc approach. What do you think?

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
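
For illustration only, here is a minimal sketch of the table-splitting idea described in the comment. It is not taken from any of the attached patches. It uses only the JDK's SAX classes: `TableSplittingHandler` stands in for the proposed ContentHandlerDecorator, and `TableSink` is a hypothetical placeholder for whatever extractor would be looked up on the parseContext. A real decorator would presumably also forward every event to the wrapped handler, which this sketch omits.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the extractor that would be set on the parse context. */
interface TableSink {
    void accept(InputStream tableXhtml) throws IOException;
}

/** Buffers each top-level table element and hands it to the sink as a stream. */
class TableSplittingHandler extends DefaultHandler {
    private final TableSink sink;   // null means no extractor was set: do not split
    private StringBuilder buffer;   // non-null only while inside a table
    private int depth;              // nesting depth of table elements

    TableSplittingHandler(TableSink sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (sink == null) {
            return;                 // fall back to default behavior
        }
        if ("table".equals(qName)) {
            if (depth == 0) {
                buffer = new StringBuilder();
            }
            depth++;
        }
        if (buffer == null) {
            return;                 // outside any table: nothing to capture
        }
        buffer.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            buffer.append(' ').append(atts.getQName(i))
                  .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        buffer.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer != null) {
            buffer.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (buffer == null) {
            return;
        }
        buffer.append("</").append(qName).append('>');
        if ("table".equals(qName) && --depth == 0) {
            // Table boundary reached: convert the segment to a byte[] and pass it on.
            byte[] bytes = buffer.toString().getBytes(StandardCharsets.UTF_8);
            buffer = null;
            try {
                sink.accept(new ByteArrayInputStream(bytes));
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Demo helper: parses an XHTML string and returns one string per table found. */
    static List<String> split(String xhtml) {
        List<String> tables = new ArrayList<>();
        TableSink sink = in -> tables.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), new TableSplittingHandler(sink));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return tables;
    }
}
```

In this sketch a sink that writes each stream to its own file would play the role of the custom EDE mentioned above. Buffering a whole table in memory only suits small tables, which matches the "(if small)" caveat in the comment.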