[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281086#comment-14281086 ]
Luis Filipe Nassif edited comment on TIKA-1511 at 1/19/15 12:01 PM:
--------------------------------------------------------------------

If the inputStream (pseudoInputStream) received by EmbeddedDocExtractor cannot be read, I think using EDE is not useful. How will this approach work with the TikaCli --extract option?

My original idea was to support a use case of extracting each table to one file. Now I think this extraction of tables to files can be done by handling the db as one big doc and using a ContentHandlerDecorator that splits the xhtml output at table boundaries. Each xhtml segment can be converted to a byte[] (if small) and then to a ByteArrayInputStream that can be handled by an EmbeddedDocExtractor, if one is set in the parseContext. If none is set, the ContentHandlerDecorator does not need to split the xhtml output and can fall back to the default behavior. A custom EDE can then extract tables to files if desired.

So now I think the big doc approach is not bad. What do you think?


was (Author: lfcnassif):

If the inputStream (pseudoInputStream) received by EmbeddedDocExtractor cannot be read, I think using EDE is not useful. How will this approach work with the TikaCli --extract option?

My original idea was to support a use case like TikaCli --extract. Now I think this extraction of tables to files can be done by handling the db as one big doc and using a ContentHandlerDecorator that splits the xhtml output at table boundaries. Each xhtml segment can be converted to a byte[] (if small) and then to a ByteArrayInputStream that can be handled by an EmbeddedDocDecorator, if one is set in the parseContext. If none is set, the ContentHandlerDecorator does not need to split tables and can fall back to the default behavior. A custom EDE can then extract tables to files if desired.

So now I think we could go with the big doc approach. What do you think?

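The splitting idea described in the comment can be sketched roughly as follows. This is a minimal, JDK-only illustration and not Tika code: `TableSplittingHandler` is a hypothetical stand-in for a ContentHandlerDecorator, and the `Consumer<InputStream>` sink is a hypothetical stand-in for an EmbeddedDocExtractor taken from the parse context. It assumes each table is small enough to buffer in memory, and it drops element attributes for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TableSplitSketch {

    /** Hypothetical handler: buffers each top-level <table> and hands it to a sink. */
    static class TableSplittingHandler extends DefaultHandler {
        private final Consumer<InputStream> sink; // stand-in for an extractor; null means "do not split"
        private StringBuilder buf;                // non-null only while inside a table
        private int depth;                        // nesting depth of <table> elements

        TableSplittingHandler(Consumer<InputStream> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (sink == null) {
                return; // nothing registered: no splitting at all
            }
            if ("table".equals(qName)) {
                if (depth == 0) {
                    buf = new StringBuilder(); // a new segment starts here
                }
                depth++;
            }
            if (buf != null) {
                buf.append('<').append(qName).append('>'); // attributes omitted for brevity
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (buf != null) {
                buf.append(escape(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (buf == null) {
                return;
            }
            buf.append("</").append(qName).append('>');
            if ("table".equals(qName) && --depth == 0) {
                // Segment complete: byte[] -> ByteArrayInputStream -> sink
                byte[] bytes = buf.toString().getBytes(StandardCharsets.UTF_8);
                sink.accept(new ByteArrayInputStream(bytes));
                buf = null;
            }
        }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;");
        }
    }

    /** Parses well-formed XHTML and returns one byte[] per top-level table. */
    static List<byte[]> split(String xhtml) throws Exception {
        List<byte[]> out = new ArrayList<>();
        TableSplittingHandler handler = new TableSplittingHandler(in -> {
            try {
                out.add(in.readAllBytes()); // a custom sink could write a file instead
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><table><tr><td>1</td></tr></table>"
                + "<table><tr><td>2</td></tr></table></body></html>";
        for (byte[] segment : split(xhtml)) {
            System.out.println(new String(segment, StandardCharsets.UTF_8));
        }
    }
}
```

In a real parser the handler would also forward every event to the wrapped handler, so the "one big doc" output stays intact whether or not a sink is registered.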
> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)