[ 
https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737990#comment-14737990
 ] 

mungeol heo commented on TIKA-1731:
-----------------------------------

{quote}did hwp ever go the ooxml route after its OLE phase{quote}

After a quick search, I think it did.

{quote}does it diverge from standard ooxml at all{quote}

The HWP editor supports Microsoft OOXML (Office Open XML): you can open an 
OOXML document in it, or save a document in OOXML format, for instance 
opening or saving an MS Word document. (I am not sure whether this 
information helps.)

{quote}can Tika+POI as they are handle it{quote}

I think so(?), since the author of java-hwp says he used Apache POI's POIFS 
filesystem to handle the compound file format of HWP 5.0.
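
One rough way to check which container a given file uses is to look at its 
leading bytes. Below is a minimal sketch in plain Java (no Tika or POI 
dependency). The class name ContainerSniffer and its methods are made up for 
illustration and are not part of either library. It only tells an OLE2 
compound file (the container POIFS reads) apart from a ZIP archive (the 
container OOXML files use). It says nothing about whether the contents are 
actually HWP or OOXML.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, for illustration only: classifies a file by its magic bytes.
final class ContainerSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // Signature at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature ("PK\3\4").
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classify from the first bytes of a file (8 bytes are enough).
    static Container classify(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    // Read up to 8 bytes from the stream and classify them.
    static Container sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) break; // stream ended before 8 bytes were available
            read += n;
        }
        return classify(Arrays.copyOf(head, read));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling ContainerSniffer.sniff(...) on a FileInputStream for an HWP 5.0 file 
should report OLE2 if the file really is a compound file, as the java-hwp 
author describes.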


> Try to integrate java-hwp into Tika
> -----------------------------------
>
>                 Key: TIKA-1731
>                 URL: https://issues.apache.org/jira/browse/TIKA-1731
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Minor
>
> Now that we have detection working for hwp files, it would be great to add a 
> parser.
> [java-hwp|https://github.com/ddoleye/java-hwp] looks like a promising 
> candidate.  We'd need to ask ddoleye about a potential change in license and 
> then interest in maintenance + pushing to maven.
> Any other candidates?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
