[ 
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1511:
------------------------------
    Attachment: TIKA-1511v1.patch
                testSQLLite3b.db

First draft of the patch is attached.  I need to build out the tests, obviously, 
and I'll fix the spelling of SQLLite (should be SQLite) in the class names! :)

For the design, I had to create a public parser that instantiates a new 
*DBParser for each call to parse (as many other parsers do) to avoid 
thread-safety issues. 
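
To illustrate the per-call delegate idea, here is a minimal sketch in plain Java. The class names (PublicParserSketch, DbWorkerSketch) are hypothetical stand-ins rather than the classes in the patch, and the "parse" just records table names. It only shows why a stateless public parser that builds a fresh worker per call can be shared across threads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a *DBParser: it keeps mutable per-document state,
// so a single instance must not be shared between concurrent parse calls.
final class DbWorkerSketch {
    private final List<String> visitedTables = new ArrayList<>();

    List<String> parse(List<String> tableNames) {
        for (String name : tableNames) {
            visitedTables.add(name); // per-document state lives on the worker
        }
        return visitedTables;
    }
}

// Hypothetical stand-in for the public parser: it holds no mutable fields and
// builds a fresh worker on every call, so one instance can be shared by threads.
final class PublicParserSketch {
    List<String> parse(List<String> tableNames) {
        return new DbWorkerSketch().parse(tableNames);
    }
}
```

Because each call gets its own worker, state from one document cannot leak into the next.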

The *DBParser, in turn, calls the EmbeddedDocumentParser for each table and 
specifies, via a special mime type, which *TableParser will be called. 

The *TableParser ignores the InputStream, and grabs the StatementTablePair from 
the ParseContext to parse each table.
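
The hand-off described above (select a table parser by mime type, pass the real input through a type-keyed context, ignore the stream) can be sketched roughly as follows. Everything here is a hypothetical stand-in written in plain Java: ContextSketch only imitates the idea of a ParseContext, TableRefSketch stands in for StatementTablePair, and the mime type string is invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical type-keyed context: values are stored and looked up by class.
final class ContextSketch {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) { entries.put(key, value); }

    <T> T get(Class<T> key) { return key.cast(entries.get(key)); }
}

// Hypothetical payload standing in for StatementTablePair; it carries only a
// table name here instead of a live JDBC Statement.
final class TableRefSketch {
    final String tableName;

    TableRefSketch(String tableName) { this.tableName = tableName; }
}

interface TableParserSketch {
    String parse(InputStream ignored, ContextSketch context);
}

final class DispatchSketch {
    // Assumed mime type, made up for this example.
    static final String SQLITE_TABLE_TYPE = "application/x-example-sqlite-table";

    private final Map<String, TableParserSketch> byMimeType = new HashMap<>();

    DispatchSketch() {
        // The table parser never reads the stream; it pulls its input from the context.
        byMimeType.put(SQLITE_TABLE_TYPE, (ignored, context) -> {
            TableRefSketch ref = context.get(TableRefSketch.class);
            return "parsed table " + ref.tableName;
        });
    }

    String parseTable(String mimeType, String tableName) {
        ContextSketch context = new ContextSketch();
        context.set(TableRefSketch.class, new TableRefSketch(tableName));
        TableParserSketch parser = byMimeType.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("no table parser for " + mimeType);
        }
        // An empty stream is passed only to satisfy the parse signature.
        return parser.parse(new ByteArrayInputStream(new byte[0]), context);
    }
}
```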

The JDBC wrapper around SQLite is not able to read CLOBs (apparently?), 
although I could write them without an exception (which doesn't mean they were 
actually written). It also does some other things that are not standard JDBC, 
but all of that is handled in SQLiteTableParser, a subclass of AbstractTableParser.
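
One generic way to cope with a driver that rejects CLOB reads is to try the standard accessor and fall back to getString. The sketch below shows that pattern only; it is an assumption for illustration, not necessarily what SQLiteTableParser does, and ClobFallbackSketch and readText are hypothetical names. It uses only standard java.sql calls.

```java
import java.sql.Clob;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ClobFallbackSketch {
    // Tries the standard JDBC CLOB accessor first and falls back to getString
    // when the driver throws an SQLException (for example, feature not supported).
    static String readText(ResultSet rs, int column) throws SQLException {
        try {
            Clob clob = rs.getClob(column);
            if (clob == null) {
                return null; // SQL NULL
            }
            long length = clob.length();
            // Clob positions are 1-based; this sketch assumes the value fits in an int.
            return length == 0 ? "" : clob.getSubString(1, (int) length);
        } catch (SQLException e) {
            return rs.getString(column);
        }
    }
}
```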

Any and all feedback is welcomed.  This is still drafty.


> Create a parser for SQLite3
> ---------------------------
>
>                 Key: TIKA-1511
>                 URL: https://issues.apache.org/jira/browse/TIKA-1511
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>             Fix For: 1.8
>
>         Attachments: TIKA-1511v1.patch, testSQLLite3b.db
>
>
> I think it would be very useful, as sqlite is used as data storage by a wide 
> range of applications. Opening the ticket to track it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
