[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285568#comment-14285568 ]
Tim Allison commented on TIKA-1511:
-----------------------------------
{quote}
A) I think it will work, as the patch works now. But I think an inputStream
that can not be read is a bit strange.
{quote}
Agreed. The new proposal is to make the InputStream readable, but in the
regular use case an AutoDetectParser sent in via ParseContext won't bother to
read the InputStream; rather, it will "read" the table object and use the
user-supplied ContentHandler.
{quote}
B) Could it be better to send a xHTML inputStream with markup to client instead
of simple UTF-8 encoded CSV?
{quote}
We could, but there are other ways of getting that: RecursiveParserWrapper, a
custom recursive embedded parser handler, or even just sending in the plain
AutoDetectParser as the EmbeddedDocumentExtractor/Parser in ParseContext. The
idea behind this is to support a ParserContainerExtractor, which would normally
pull just the bytes from embedded documents. Because there are no bytes for a
table object (i.e. it never exists as an actual standalone file), I propose a
CSV proxy.
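To illustrate the CSV proxy idea, here is a minimal sketch that renders a table's header and rows as UTF-8 encoded CSV and exposes the bytes as a readable stream. The class and method names (CsvProxySketch, toCsv, toCsvStream) are hypothetical and are not taken from the attached patches or from Tika; the quoting rule is an assumption (quote a cell only when it contains a comma, a double quote, or a line break).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tika or the patch.
final class CsvProxySketch {

    // Quote a cell only if it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled. A null cell becomes an empty field.
    static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.indexOf(',') >= 0 || cell.indexOf('"') >= 0
                || cell.indexOf('\n') >= 0 || cell.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return cell;
        }
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    // Render the header row followed by the data rows, one line per row.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    // Stand-in for the bytes a table never has as a standalone file:
    // the CSV text encoded as UTF-8 and wrapped in an in-memory stream.
    static ByteArrayInputStream toCsvStream(List<String> header, List<List<String>> rows) {
        byte[] bytes = toCsv(header, rows).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes);
    }

    private static void appendRow(StringBuilder sb, List<String> cells) {
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(cells.get(i)));
        }
        sb.append('\n');
    }
}
```

A byte-pulling extractor would just copy this stream, while a parser that understands the table object could ignore it. Because the sketch buffers the whole table in memory, it only suits small tables; a real implementation would presumably stream rows from the JDBC ResultSet.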
{quote}
C) I agree, but it will work only if he adds the correct parser (eg TableParser
or CompositeParser) to ParseContext, right?
{quote}
The user will have to add an AutoDetectParser to the ParseContext, and we will
need to add org.apache.tika.parser.jdbc.SQLite3Parser and
org.apache.tika.parser.jdbc.JDBCTableParser
to the parser services file.
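As a sketch of that second step, and assuming the "parser services file" is the standard Java ServiceLoader resource that Tika uses for parser discovery (META-INF/services/org.apache.tika.parser.Parser), the two additions would be one fully qualified class name per line. The comment line is only there to show the assumed file path.

```
# META-INF/services/org.apache.tika.parser.Parser
org.apache.tika.parser.jdbc.SQLite3Parser
org.apache.tika.parser.jdbc.JDBCTableParser
```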
I have a draft of this proposal working. The current downside is that if the
client resets and rereads the InputStream, the blobs/clobs are processed twice
via the EmbeddedDocumentExtractor.
Any problems with the above? Recommendations for an alternate design?
> Create a parser for SQLite3
> ---------------------------
>
> Key: TIKA-1511
> URL: https://issues.apache.org/jira/browse/TIKA-1511
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.6
> Reporter: Luis Filipe Nassif
> Fix For: 1.8
>
> Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.