[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281086#comment-14281086 ]

Luis Filipe Nassif edited comment on TIKA-1511 at 1/19/15 12:01 PM:
--------------------------------------------------------------------

If the InputStream (pseudoInputStream) received by the EmbeddedDocExtractor cannot
be read, I don't think using an EDE is useful. How would this approach work with
the TikaCli --extract option? My original idea was to support a use case of
extracting each table to its own file...

Now I think this extraction of tables to files can be done by handling the db as
one big doc and using a ContentHandlerDecorator that splits the XHTML
output at table boundaries. Each XHTML segment can be converted to a byte[] (if
small) and then to a ByteArrayInputStream that can be handled by an
EmbeddedDocExtractor, if one is set in the parseContext. If none is set, the
ContentHandlerDecorator does not need to split the XHTML output and can fall back
to the default behavior. A custom EDE can then extract tables to files if desired.
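The splitting decorator described above can be sketched with plain JDK SAX classes. This is a minimal, hypothetical illustration, not Tika code: `TableSplittingHandler` and its `tableSink` callback are invented names, `XMLFilterImpl` plays the role of a ContentHandlerDecorator, and the sink stands in for the EDE that would be looked up in the parseContext. It assumes each table is small enough to buffer fully in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch: a SAX filter that splits XHTML output at <table> boundaries.
 * Events outside tables go to the wrapped ContentHandler unchanged. Each top-level
 * table is serialized to an in-memory byte[] (only sensible for small tables) and
 * handed to tableSink as a ByteArrayInputStream. tableSink stands in for an EDE taken
 * from the parse context; when it is null nothing is split (default behavior).
 */
class TableSplittingHandler extends XMLFilterImpl {
    private final Consumer<InputStream> tableSink; // may be null
    private StringBuilder segment;                 // non-null only while inside a table
    private int depth;                             // <table> nesting depth within segment

    TableSplittingHandler(Consumer<InputStream> tableSink) {
        this.tableSink = tableSink;
    }

    // Prefer the local name; fall back to the qualified name for non-namespace-aware parsers.
    private static String nameOf(String localName, String qName) {
        return localName == null || localName.isEmpty() ? qName : localName;
    }

    // Minimal XML escaping for text and attribute values ('&' must be replaced first).
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = nameOf(localName, qName);
        if (tableSink != null && "table".equals(name)) {
            if (segment == null) {
                segment = new StringBuilder(); // entering a top-level table
            }
            depth++;
        }
        if (segment == null) {
            super.startElement(uri, localName, qName, atts); // forward unchanged
            return;
        }
        segment.append('<').append(name);
        for (int i = 0; i < atts.getLength(); i++) {
            segment.append(' ').append(nameOf(atts.getLocalName(i), atts.getQName(i)))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        segment.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (segment == null) {
            super.endElement(uri, localName, qName);
            return;
        }
        String name = nameOf(localName, qName);
        segment.append("</").append(name).append('>');
        if ("table".equals(name) && --depth == 0) {
            // Top-level table closed: hand its serialized bytes to the sink.
            byte[] bytes = segment.toString().getBytes(StandardCharsets.UTF_8);
            segment = null;
            tableSink.accept(new ByteArrayInputStream(bytes));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (segment == null) {
            super.characters(ch, start, length);
        } else {
            segment.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (segment == null) {
            super.ignorableWhitespace(ch, start, length);
        } else {
            segment.append(ch, start, length);
        }
    }
}
```

A custom sink could, for example, write each stream to its own file with `java.nio.file.Files.copy(in, path)`, which roughly covers the table-per-file use case. The emitted fragments carry no XHTML namespace or `<html>`/`<body>` wrapper; a real implementation would probably want to add them so each table is a standalone document.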

So now I think the big doc approach is not bad. What do you think?


was (Author: lfcnassif):
If the InputStream (pseudoInputStream) received by the EmbeddedDocExtractor cannot
be read, I don't think using an EDE is useful. How would this approach work with
the TikaCli --extract option? My original idea was to support a use case like
TikaCli --extract...

Now I think this extraction of tables to files can be done by handling the db as
one big doc and using a ContentHandlerDecorator that splits the XHTML
output at table boundaries. Each XHTML segment can be converted to a byte[] (if
small) and then to a ByteArrayInputStream that can be handled by an
EmbeddedDocDecorator, if one is set in the parseContext. If none is set, the
ContentHandlerDecorator does not need to split tables and can fall back to the default
behavior. A custom EDE can then extract tables to files if desired.

So now I think we could go with the big doc approach. What do you think?

> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as SQLite is used as data storage by a wide
> range of applications. Opening this ticket to track it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
