[
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280345#comment-14280345
]
Tim Allison edited comment on TIKA-1511 at 1/16/15 3:08 PM:
------------------------------------------------------------
First draft of patch attached. I still need to build out tests, obviously, and
I'll fix the spelling of "SQLLite" in the class names! :)
For the design, I created a public parser that hands off to a new *DBParser
instance for each call to parse (like many other parsers) to avoid
thread-safety issues.
The *DBParser, in turn, calls the EmbeddedDocumentExtractor for each table and
specifies, via a special mime type, which *TableParser will be called.
The *TableParser ignores the empty InputStream and grabs the
StatementTablePair from the ParseContext to parse each table.
Also, as part of the design, the EmbeddedDocumentExtractor is called for each
BLOB and each CLOB.
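To make that flow concrete, here is a minimal, self-contained sketch of the pattern. It is not the code in the patch and it does not use the Tika API: Context, TableHandle, EmbeddedExtractor and the mime type string are hypothetical stand-ins for ParseContext, StatementTablePair, EmbeddedDocumentExtractor and the real special mime type, and the "database" is just a map of table names to rows.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerCallDelegateSketch {

    // Hypothetical mime type used only to pick the table parser.
    static final String TABLE_MIME = "application/x-hypothetical-db-table";

    /** Stand-in for a ParseContext: a bag of objects keyed by type. */
    static final class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();
        <T> void set(Class<T> key, T value) { values.put(key, value); }
        <T> T get(Class<T> key) { return key.cast(values.get(key)); }
    }

    /** Stand-in for StatementTablePair; a real one would carry a JDBC Statement. */
    static final class TableHandle {
        final String tableName;
        final List<String> rows;
        TableHandle(String tableName, List<String> rows) {
            this.tableName = tableName;
            this.rows = rows;
        }
    }

    /** Stand-in for a *TableParser: ignores the stream, reads the handle from the context. */
    static final class TableParser {
        void parse(InputStream ignored, Context context, List<String> out) {
            TableHandle handle = context.get(TableHandle.class);
            out.add("table:" + handle.tableName);
            for (String row : handle.rows) {
                out.add("row:" + row);
            }
        }
    }

    /** Stand-in for the embedded-document extractor: dispatches on mime type. */
    static final class EmbeddedExtractor {
        private final Map<String, TableParser> byMimeType = new HashMap<>();
        EmbeddedExtractor() { byMimeType.put(TABLE_MIME, new TableParser()); }
        void parseEmbedded(InputStream stream, String mimeType, Context context, List<String> out) {
            TableParser parser = byMimeType.get(mimeType);
            if (parser == null) {
                throw new IllegalArgumentException("No parser for " + mimeType);
            }
            parser.parse(stream, context, out);
        }
    }

    /** Stand-in for a *DBParser: holds mutable state, so one instance serves one parse call. */
    static final class DBParser {
        private final List<String> out = new ArrayList<>();
        private final EmbeddedExtractor extractor = new EmbeddedExtractor();

        List<String> parse(Map<String, List<String>> db) {
            for (Map.Entry<String, List<String>> table : db.entrySet()) {
                Context context = new Context();
                context.set(TableHandle.class, new TableHandle(table.getKey(), table.getValue()));
                // The stream is empty on purpose; the table data travels in the context.
                extractor.parseEmbedded(new ByteArrayInputStream(new byte[0]), TABLE_MIME, context, out);
            }
            return out;
        }
    }

    /** Public entry point: keeps no state of its own and creates a fresh DBParser per call. */
    static List<String> parse(Map<String, List<String>> db) {
        return new DBParser().parse(db);
    }

    public static void main(String[] args) {
        Map<String, List<String>> db = new LinkedHashMap<>();
        db.put("users", Arrays.asList("1|alice", "2|bob"));
        for (String line : parse(db)) {
            System.out.println(line);
        }
    }
}
```

Because the public entry point keeps no fields and each call builds its own DBParser, two threads calling parse at the same time never share the mutable output list. That is the thread-safety property the design is after.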
The JDBC wrapper around SQLite is apparently not able to read CLOBs, although
I could write them without an exception (which doesn't mean they were actually
written). It also does some other things that are not standard JDBC, but that
is all handled in SQLiteTableParser, a subclass of AbstractTableParser.
Any and all feedback is welcome. This is still drafty.
was (Author: [email protected]):
First draft of patch attached. Need to build out tests, obviously, and I'll
fix spelling of SQLLite in the class names! :)
For the design, I had to create a public parser that called a new *DBParser
class for each call to parse (like many other parsers) to avoid thread safety
issues.
The *DBParser, in turn, calls the EmbeddedDocumentParser for each table, and it
specifies via special mime-type, which *TableParser will be called.
The *TableParser ignores the InputStream, and grabs the StatementTablePair from
the ParseContext to parse each table.
The jdbc wrapper around sqlite is not able to read CLOBs (apparently?),
although I could write them without exception (doesn't mean they were actually
written), and it does some other stuff that is not standard JDBC, but that is
all handled in SQLiteTableParser, a subclass of AbstractTableParser.
Any and all feedback is welcomed. This is still drafty.
> Create a parser for SQLite3
> ---------------------------
>
> Key: TIKA-1511
> URL: https://issues.apache.org/jira/browse/TIKA-1511
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.6
> Reporter: Luis Filipe Nassif
> Fix For: 1.8
>
> Attachments: TIKA-1511v1.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide
> range of applications. Opening the ticket to track it.