[
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281086#comment-14281086
]
Luis Filipe Nassif edited comment on TIKA-1511 at 1/18/15 2:09 PM:
-------------------------------------------------------------------
If the inputStream (pseudoInputStream) received by EmbeddedDocExtractor cannot
be read, I think using EDE is not useful. How will this approach work with the
TikaCli --extract option? My original idea was to support a use case like
TikaCli --extract...
Now I think this extraction of tables to files can be done by handling the db
as one big doc and using a ContentHandlerDecorator that splits the xhtml
output at table boundaries. Each xhtml segment can be converted to a byte[] (if
small) and then to a ByteArrayInputStream that can be handled by an
EmbeddedDocDecorator, if one is set in the parseContext. If none is set, the
ContentHandlerDecorator does not need to split tables and can fall back to the
default behavior. A custom EDE can then extract tables to files if desired.
So now I think we could go with the big doc approach. What do you think?
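To make the idea concrete, here is a rough sketch of the splitting handler, not
a patch. It uses only JDK classes so it can be run standalone: XMLFilterImpl
stands in for Tika's ContentHandlerDecorator, and TableSink is a hypothetical
interface standing in for whatever extractor is found in the parseContext. The
"table-N" names and the choice to keep forwarding table events to the main
output are also assumptions of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in for the extractor that would be looked up in the parseContext. */
interface TableSink {
    void handleTable(String name, InputStream xhtmlSegment) throws SAXException;
}

/**
 * Forwards every SAX event downstream. If a sink is set, it also buffers each
 * top-level table element and hands it to the sink as its own xhtml segment.
 */
class TableSplittingHandler extends XMLFilterImpl {
    private final TableSink sink;   // null: no splitting, default behavior
    private StringBuilder segment;  // non-null only while inside a table
    private int depth;              // open elements inside the current segment
    private int tableCount;

    TableSplittingHandler(ContentHandler downstream, TableSink sink) {
        setContentHandler(downstream);
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = localName.isEmpty() ? qName : localName;
        if (sink != null && segment == null && "table".equals(name)) {
            segment = new StringBuilder(); // a new table starts here
        }
        if (segment != null) {
            depth++;
            segment.append('<').append(name);
            for (int i = 0; i < atts.getLength(); i++) {
                String attName = atts.getLocalName(i).isEmpty()
                        ? atts.getQName(i) : atts.getLocalName(i);
                segment.append(' ').append(attName).append("=\"")
                       .append(escape(atts.getValue(i))).append('"');
            }
            segment.append('>');
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (segment != null) {
            segment.append(escape(new String(ch, start, length)));
        }
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (segment != null) {
            String name = localName.isEmpty() ? qName : localName;
            segment.append("</").append(name).append('>');
            if (--depth == 0) {
                // Table closed: convert the segment to byte[] and pass it on as a stream.
                byte[] bytes = segment.toString().getBytes(StandardCharsets.UTF_8);
                segment = null;
                sink.handleTable("table-" + (tableCount++), new ByteArrayInputStream(bytes));
            }
        }
        super.endElement(uri, localName, qName);
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With a null sink the handler is a plain pass-through, which matches the
fallback described above. Buffering a whole table in memory is only reasonable
for small tables; large ones would need a temp file or a streaming hand-off.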
> Create a parser for SQLite3
> ---------------------------
>
> Key: TIKA-1511
> URL: https://issues.apache.org/jira/browse/TIKA-1511
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.6
> Reporter: Luis Filipe Nassif
> Fix For: 1.8
>
> Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.