[ https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308429#comment-16308429 ]
Nick Burch commented on TIKA-2462:
----------------------------------
While we wait for the re-license to go through, I've had a look at writing a
parser. Outputting as CSV is very easy, as they've got a great class to do all
the work. Emitting SAX events for an HTML table will be trickier, as the logic
to format a raw value in a given column as "a string of how it looks in SAS" is
currently in a private method. I've raised
[#24|https://github.com/epam/parso/issues/24] to see if that can be refactored
out, so that we don't need to duplicate lots of their code.
The Tika questions on column metadata, test files, etc. still remain for us, though!
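
For anyone following along, here is a rough sketch of what "SAX events for an HTML table" could look like. It uses only the JDK's SAX and transformer classes, not Parso or Tika. The class name TableSaxSketch, the formatCell helper and the sample data are all made up for illustration; formatCell is just a placeholder for the SAS-style formatting that is currently private in Parso, and is not their API.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class TableSaxSketch {

    // Hypothetical placeholder: the real "how it looks in SAS" formatting
    // lives in Parso and is not reproduced here.
    static String formatCell(Object raw) {
        return raw == null ? "" : raw.toString();
    }

    // Emit one element containing only text.
    static void textElement(ContentHandler handler, String name, String text)
            throws SAXException {
        handler.startElement("", name, name, new AttributesImpl());
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", name, name);
    }

    // Emit a <table> with a header row of column names, then one <tr> per data row.
    static void emitTable(ContentHandler handler, List<String> columns,
                          List<Object[]> rows) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startElement("", "table", "table", noAttrs);
        handler.startElement("", "tr", "tr", noAttrs);
        for (String column : columns) {
            textElement(handler, "th", column);
        }
        handler.endElement("", "tr", "tr");
        for (Object[] row : rows) {
            handler.startElement("", "tr", "tr", noAttrs);
            for (Object cell : row) {
                textElement(handler, "td", formatCell(cell));
            }
            handler.endElement("", "tr", "tr");
        }
        handler.endElement("", "table", "table");
    }

    // Serialize the SAX events to a string so the result can be inspected.
    static String render(List<String> columns, List<Object[]> rows) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        handler.startDocument();
        emitTable(handler, columns, rows);
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> columns = Arrays.asList("id", "name");
        List<Object[]> rows = Collections.singletonList(new Object[] {1, "Alice"});
        System.out.println(render(columns, rows));
    }
}
```

In a real parser the ContentHandler would be the one handed to the parser rather than a TransformerHandler, and the rows would come from the SAS file reader instead of a hard-coded list.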
> Add a parser for sas7bdat
> -------------------------
>
> Key: TIKA-2462
> URL: https://issues.apache.org/jira/browse/TIKA-2462
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
>
> EPAM recently agreed to migrate to Apache 2.0 so that we can incorporate
> parso into Tika for sas7bdat files: https://github.com/epam/parso/issues/19
> !!!
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)